IR
IR Tools
- Scrapy
- Apache Lucene
- Apache Solr
- Elasticsearch
- NLTK
- DARPA’s Memex Tools
- Scrapy Cluster
- Portia
- slate
- pdfminer
- pdfrw
- OpenRefine
- News Corpus Builder
- Google All Pair Similarity Search
- Wukong - Chinese Search Engine in Go
- YAGO
- [NELL](http://rtw.ml.cmu.edu/rtw/kbbrowser/)
- Tweet NLP
- GATE
- CRF Suite
- jusText
- A curated list of speech and natural language processing resources
- entity-recognition
- The CMU-Cambridge Statistical Language Modeling Toolkit v2
Courses
- Stanford Course
- Information Integration
- Information Extraction
- Natural Language Processing
- Stanford: Natural Language Processing
- Machine Translation Class
- Natural Language Generation
- Data Mining and Text Mining
- Networks
- Social Media Analytics
- Natural Language Understanding
- MOOC NLP Slides
- Advanced NLP
Talks & Papers
- Challenges in Building Large-Scale Information Retrieval Systems
- Tutorial: Web Information Retrieval
- Learning to Rank
- Solution for the Search Result Relevance Challenge
- Algorithms for Duplicate Documents
- HyperLogLog and MinHash
- Detecting Near-Duplicates for Web Crawling
- Bag of Words Meets Bags of Popcorn
- A Statistical MT Tutorial Workbook
- Exploiting Wikipedia for IR Tasks
- Boilerplate detection using shallow features
- Bridging the structured and unstructured gap: Searching Annotated Web
- Building blocks for semantic search engine
- Sentiment Symposium Tutorial
- Fuzzy string matching using cosine similarity
- Learning by Example: Training Users with High-quality Query Suggestions
- Analyzing User’s Sequential Behavior in Query Auto-Completion via Markov Processes
- Optimal insertion in deterministic DAWGs
- Introduction to Probabilistic Topic Models
- How Google Set Works
- LDA Beginners Tutorial
- Galene - LinkedIn’s Search Architecture: Presented by Diego Buthay & Sriram Sankar, LinkedIn
- Did you mean Galene?
- The Many Facets of Faceted Search
- Under The Hood Building Graph Search
- Under the Hood: Building out the infrastructure for Graph Search
- Under the Hood: Indexing and ranking in Graph Search
- SIGIR 2014 Tutorials
- SoMeRA 2014: International Workshop on Social Media Retrieval and Analysis
- Semantic Matching and Information Retrieval
- Gathering Assessment of Relevance
- Entity Recognition and Disambiguation Challenge
- Web Page Sectioning Using Regex-based Templates
- Learning and Matching Human Activities using Regular Expression
- RoadRunner
- Efficient Query Evaluation using a Two-Level Retrieval Process
- String Algorithms
- Building, Maintaining, and Using Knowledge Bases: A Report from the Trenches
- RoboBrain
- Knowledge Base Completion via Search-Based Question Answering
- Inside YAGO
- Notes on Knowledge Representation
- Ontology-Based Semantic Search on the Web
- Large-Scale Named Entity Disambiguation Based on Wikipedia Data
- Parsing with Word Vectors
- Building Expert Systems in Prolog
- Using Encyclopedic Knowledge for Named Entity Disambiguation
- Building Taxonomy of Web Search Intents for Name Entity Queries
- Classifying Web Queries by Topic and User Intent
- Clustering Query Refinements by User Intent
- Object Level Vertical Resource
- Query Classification using Wikipedia’s Graph
- Active Objects: Actions for Entity-Centric Search
- Query Classification KDDCUP 2005
- Substring Search Algorithm
- Web Query Classification
- Query Enrichment for Web-query Classification
- Teaching Machines to Read and Comprehend
- Statistical Machine Translation: the basic, the novel, and the speculative
- Machine Translation for all European Languages
- GraphChi - Large-Scale Graph Computation on Just a PC
- Similarity Evaluation on Tree-structured Data
- Using Bloom Filters to Refine Web Search Results
- Extracting social networks and contact information from email and the Web
- Pizza Ontology
- Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval
- Visual Rank: Applying PageRank to Large-Scale Image Search
- Image Retrieval using Scene Graphs
- Towards Total Scene Understanding
- Neural Machine Translation
- Traversing Knowledge Graphs in Vector Space
- A large annotated corpus for learning natural language inference
- Forum77: An Analysis of an Online Health Forum Dedicated to Addiction Recovery
- Disaster Monitoring with Wikipedia and Online Social Networking Sites
- Deep Learning for NLP
- Making Watson Fast
- Navigating themes in restaurant reviews with Word Mover’s Distance
- A Word is Worth a Thousand Vectors
- FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs
- Scale-up Graph Processing in Single Computer
- Creating your own programming Language with ANTLR
- Short Text Similarity with Word Embeddings
- An Inside View of Language Technologies at Google
- Computer, respond to this email.
- Sequence to Sequence Learning with Neural Networks
- Thought Vectors
- Thought Vectors, Deep Learning & Future of AI
- Skip Thought Vectors
- How Google Converted Language Translation Into a Problem of Vector Space Mathematics
- Google Is Working On A New Type Of Algorithm Called “Thought Vectors”
- Representing Text for Joint Embedding of Text and Knowledge Bases
Resources
- IR Book
- Search Engines - Information Retrieval
- Natural Language Processing: What are the most important research papers which all NLP students should definitely read?
- An Introduction to Bioinformatics Algorithms
- How many processors does a Google search query touch?
- Fast approximate string matching with large edit distances in Big Data
- Search User Interfaces
- Speech and Language Processing - DRAFT
- A Primer on Neural Network Models for Natural Language Processing