IR Tools

  1. Scrapy
  2. Apache Lucene
  3. Apache Solr
  4. Elasticsearch
  5. NLTK
  6. DARPA’s Memex Tools
  7. Scrapy Cluster
  8. Portia
  9. slate
  10. pdfminer
  11. pdfrw
  12. OpenRefine
  13. News Corpus Builder
  14. Google All Pair Similarity Search
  15. Wukong - Chinese Search Engine in Go
  16. YAGO
  17. NELL
  18. Tweet NLP
  19. GATE
  20. CRF Suite
  21. jusText
  22. A curated list of speech and natural language processing resources
  23. entity-recognition
  24. The CMU-Cambridge Statistical Language Modeling Toolkit v2


Courses

  1. Stanford Course
  2. Information Integration
  3. Information Extraction
  4. Natural Language Processing
  5. Stanford: Natural Language Processing
  6. Machine Translation Class
  7. Natural Language Generation
  8. Data Mining and Text Mining
  9. Networks
  10. Social Media Analytics
  11. Natural Language Understanding
  12. MOOC NLP Slides
  13. Advanced NLP

Talks & Papers

  1. Challenges in Building Large-Scale Information Retrieval Systems
  2. Tutorial: Web Information Retrieval
  3. Learning to Rank
  4. Solution for the Search Result Relevance Challenge
  5. Algorithms for Duplicate Documents
  6. HyperLogLog and MinHash
  7. Detecting Near-Duplicates for Web Crawling
  8. Bag of Words Meets Bags of Popcorn
  9. A Statistical MT Tutorial Workbook
  10. Exploiting Wikipedia for IR Tasks
  11. Boilerplate detection using shallow features
  12. Bridging the structured and unstructured gap: Searching Annotated Web
  13. Building blocks for semantic search engine
  14. Sentiment Symposium Tutorial
  15. Fuzzy string matching using cosine similarity
  16. Learning by Example: Training Users with High-quality Query Suggestions
  17. Analyzing User’s Sequential Behavior in Query Auto-Completion via Markov Processes
  18. Optimal insertion in deterministic DAWGs
  19. Introduction to Probabilistic Topic Models
  20. How Google Set Works
  21. LDA Beginners Tutorial
  22. Galene - LinkedIn’s Search Architecture: Presented by Diego Buthay & Sriram Sankar, LinkedIn
  23. Did you mean Galene?
  24. The Many Facets of Faceted Search
  25. Under The Hood Building Graph Search
  26. Under the Hood: Building out the infrastructure for Graph Search
  27. Under the Hood: Indexing and ranking in Graph Search
  28. SIGIR 2014 Tutorials
  29. SoMeRA 2014: International Workshop on Social Media Retrieval and Analysis
  30. Semantic Matching and Information Retrieval
  31. Gathering Assessment of Relevance
  32. Entity Recognition and Disambiguation Challenge
  33. Web Page Sectioning Using Regex-based Templates
  34. Learning and Matching Human Activities using Regular Expressions
  35. RoadRunner
  36. Efficient Query Evaluation using a Two-Level Retrieval Process
  37. String Algorithms
  38. Building, Maintaining, and Using Knowledge Bases: A Report from the Trenches
  39. RoboBrain
  40. Knowledge Base Completion via Search-Based Question Answering
  41. Inside YAGO
  42. Notes on Knowledge Representation
  43. Ontology-Based Semantic Search on the Web
  44. Large Scale Named Entity Disambiguation Based on Wikipedia Data
  45. Parsing with Word Vectors
  46. Building Expert Systems in Prolog
  47. Using Encyclopedic Knowledge for Named Entity Disambiguation
  48. Building Taxonomy of Web Search Intents for Name Entity Queries
  49. Classifying Web Queries by Topic and User Intent
  50. Clustering Query Refinements by User Intent
  51. Object Level Vertical Resource
  52. Query Classification using Wikipedia’s Graph
  53. Active Objects: Actions for Entity-Centric Search
  54. Query Classification KDDCUP 2005
  55. Substring Search Algorithm
  56. Web Query Classification
  57. Query Enrichment for Web-query Classification
  58. Teaching Machines to Read and Comprehend
  59. Statistical Machine Translation: the basic, the novel, and the speculative
  60. Machine Translation for all European Languages
  61. GraphChi - Large-Scale Graph Computation on Just a PC
  62. Similarity Evaluation on Tree-structured Data
  63. Using Bloom Filters to Refine Web Search Results
  64. Extracting social networks and contact information from email and the Web
  65. Pizza Ontology
  66. Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval
  67. Visual Rank: Applying PageRank to Large-Scale Image Search
  68. Image Retrieval using Scene Graphs
  69. Towards Total Scene Understanding
  70. Neural Machine Translation
  71. Traversing Knowledge Graphs in Vector Space
  72. A large annotated corpus for learning natural language inference
  73. Forum77: An Analysis of an Online Health Forum Dedicated to Addiction Recovery
  74. Disaster Monitoring with Wikipedia and Online Social Networking Sites
  75. Deep Learning for NLP
  76. Making Watson Fast
  77. Navigating themes in restaurant reviews with Word Mover’s Distance
  78. A Word is Worth a Thousand Vectors
  79. FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs
  80. Scale-up Graph Processing in Single Computer
  81. Creating your own programming Language with ANTLR
  82. Short Text Similarity with Word Embeddings
  83. An Inside View of Language Technologies at Google
  84. Computer, respond to this email.
  85. Sequence to Sequence Learning with Neural Networks
  86. Thought Vectors
  87. Thought Vectors, Deep Learning & Future of AI
  88. Skip Thought Vectors
  89. How Google Converted Language Translation Into a Problem of Vector Space Mathematics
  90. Google Is Working On A New Type Of Algorithm Called “Thought Vectors”
  91. Representing Text for Joint Embedding of Text and Knowledge Bases


Books & Articles

  1. IR Book
  2. Search Engines - Information Retrieval
  3. Natural Language Processing: What are the most important research papers which all NLP students should definitely read?
  4. An Introduction to Bioinformatics Algorithms
  5. How many processors does a Google search query touch?
  6. Fast approximate string matching with large edit distances in Big Data
  7. Search User Interfaces
  8. Speech and Language Processing - DRAFT
  9. A Primer on Neural Network Models for Natural Language Processing


Blogs

  1. Salmon Run
  2. Natural Language Processing Blog
  3. Text Mining, Analytics & More
  4. LingPipe Blog
  5. Decontextualize