Course Description
Automatic methods of Information Retrieval (IR) have gained significance in recent years due to the dramatic increase in the amount of data available on the Web. The data is often present in multiple forms (such as text, image, and video), so IR techniques deployed on the Web must support operations such as search and retrieval across all of these formats. In this course, the study of IR focuses on the methodologies for indexing, processing, and querying primarily textual data, and extends to image and video data in the latter part of the course. The primary learning objectives of the course are to: i) gain knowledge of the basic concepts and techniques of IR; ii) understand the basic functionality and underlying algorithms of an IR system; iii) understand the modern neural network and deep learning-based techniques used in today’s IR systems; iv) learn about several applications of IR, e.g., question answering, image retrieval, and video retrieval; v) learn how to develop a basic IR system from scratch and evaluate it; and vi) learn classification, clustering, and topic modeling, which are core modules of an IR system.
Prerequisites
- Prior knowledge of elementary linear algebra would be helpful but is not required for this course.
- Programming knowledge, preferably in Python.
- Object-oriented programming.
- Data structures.
Learning Objectives
- Gain knowledge about the basic concepts and techniques of IR.
- Understand the basic functionality and underlying algorithms of an IR system.
- Understand modern neural networks and deep learning-based techniques that are used in today’s IR systems.
- Learn about several applications of IR, e.g., question answering, image retrieval, and video retrieval.
- Learn how to develop a basic IR system from scratch and evaluate the system.
- Learn classification, clustering, and topic modeling, which are core modules of an IR system.
Measurable Outcomes
- Identify important concepts of Information Retrieval.
- Learn vector space modeling, modern deep learning techniques for IR, and evaluation methods, and apply this knowledge to complete the project.
- Evaluate the performance of different IR models using empirical benchmarks.
- Implement different IR applications, such as question answering, image retrieval, and video retrieval systems.
- Use libraries such as scikit-learn and Keras for data processing and IR model creation.
- Mathematically explain common neural network-based models and the word2vec and GloVe distributed word representations used in building IR systems.
Topics Covered
- Boolean retrieval and VSM
- Word Embeddings
- Probabilistic IR, Relevance Feedback, and BM25
- Introduction to Neural Networks
- Text Processing and Classification using Neural Networks
- Language Modeling: Transformers, BERT, RoBERTa
- IR using Language Modeling
- Question Answering
- Personalization
- Image Retrieval
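To give a flavor of the first topic, the following is a minimal sketch of Boolean retrieval over an inverted index. It is illustrative only and not taken from the course materials: the toy documents and the names `DOCS`, `build_inverted_index`, `boolean_and`, and `boolean_or` are assumptions made for this example, and tokenization is simplified to lowercasing and whitespace splitting.

```python
from collections import defaultdict

# Hypothetical toy collection; the documents are illustrative only.
DOCS = {
    1: "information retrieval on the web",
    2: "image and video retrieval",
    3: "question answering with neural networks",
}


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return the sorted IDs of documents containing every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def boolean_or(index, *terms):
    """Return the sorted IDs of documents containing at least one query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set().union(*postings))


INDEX = build_inverted_index(DOCS)
```

For example, `boolean_and(INDEX, "image", "retrieval")` intersects the posting sets of the two terms, while `boolean_or(INDEX, "web", "neural")` takes their union. Ranked models such as the VSM and BM25 build on the same index structure but score documents instead of returning an unordered match set.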
Required Texts and Reading
Introduction to Information Retrieval, by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze.
http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html