INF 384H: Concepts in Information Retrieval and Filtering

 

Syllabus

Course Information

Grading

Detailed Outline

Administrativa

 


Course Information

Class Meeting Time and Place: Monday 9:00-12:00, SZB 464

Unique ID: 27440

Instructor: Miles Efron
Office: SZB 562E

Office Hours: Tues. 10:00-11:00
Email: miles@ischool.utexas.edu
Web: http://www.ischool.utexas.edu/~miles



Course Description: An introductory survey of information filtering and retrieval, with an emphasis on developing the student's understanding of the relationships among the algorithms used by search engines, queries and documents, and system performance. This is an information science course, not an information technology course. The course will emphasize basic knowledge useful for those who will hold leadership positions in the information professions.

 

Purpose:  The main purpose of this course is to provide a foundation for understanding the models, assumptions, and motivations that underpin contemporary information retrieval and filtering systems:

In all of this work, we will inform our discussion with important articles and readings from the fields of Information and Library Science; another course goal is to ground students in relevant literatures.

 

Prerequisites: There are no official prerequisites for this course. However, most of the reading and much of the graded work require facility with statistics, probability, and mathematics (e.g., basic matrix operations). All required background will be reviewed during class, but at a necessarily quick pace. Students with an aversion to quantitative reasoning may find this course frustrating.

 


Class Email List

Email will be the primary method of communication outside of class.  Thus it is crucial that all enrolled students subscribe to our class listserv. Instructions for joining the list are at

 

            https://utlists.utexas.edu/sympa/

 

The name of the list is inf384hs09.

 

Graded Assignments

Assignment                               Weight
Homework                                 25%
IR Innovation Presentation               15%
TREC Presentation                        15%
Final Project Presentation (Group)       10%
Final Project Write-up (Group)           25%
Class engagement                         10%

 

 

Grading Details:

I will use the following schedule in calculating final grades:

 

A+ = 100
A  = 95-99
A- = 90-94
B+ = 85-89
B  = 80-84
B- = 75-79
C+ = 70-74
C  = 65-69
C- = 60-64
F  = <60
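The schedule above and the weights in the Graded Assignments table combine into one calculation. The Python sketch below illustrates it under stated assumptions: each component is scored on a 0-100 scale, and the weighted total is rounded half up to a whole number before lookup, because the schedule lists only integer ranges and does not say how fractional totals are treated. The names weighted_total and letter_grade are hypothetical; this is not the instructor's actual grading tool.

```python
from fractions import Fraction
import math

# Weights copied from the Graded Assignments table (percent; they sum to 100).
WEIGHTS = {
    "Homework": 25,
    "IR Innovation Presentation": 15,
    "TREC Presentation": 15,
    "Final Project Presentation (Group)": 10,
    "Final Project Write-up (Group)": 25,
    "Class engagement": 10,
}
assert sum(WEIGHTS.values()) == 100

# Lower bound of each letter band in the grading schedule, highest first.
CUTOFFS = [
    (100, "A+"), (95, "A"), (90, "A-"),
    (85, "B+"), (80, "B"), (75, "B-"),
    (70, "C+"), (65, "C"), (60, "C-"),
]

def weighted_total(scores):
    """Weighted average of component scores (each assumed to be 0-100), computed exactly."""
    total = sum(Fraction(scores[name]) * weight for name, weight in WEIGHTS.items())
    return total / 100

def letter_grade(scores):
    """Map the weighted total to a letter; rounding half up is an assumption."""
    rounded = math.floor(weighted_total(scores) + Fraction(1, 2))
    for lower, letter in CUTOFFS:
        if rounded >= lower:
            return letter
    return "F"

if __name__ == "__main__":
    example_scores = {
        "Homework": 90,
        "IR Innovation Presentation": 88,
        "TREC Presentation": 92,
        "Final Project Presentation (Group)": 85,
        "Final Project Write-up (Group)": 87,
        "Class engagement": 95,
    }
    print(letter_grade(example_scores))  # prints B+ (weighted total 89.25)
```

Exact fractions are used so that a total such as 94.5 is not nudged across a band boundary by floating-point error.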

 

 

A few notes on grading:

  1. Late work is not fair to your fellow students. Therefore, late assignments will be penalized 1/3 of a letter grade per day, beginning at the time the assignment is due. Thus an A paper turned in two hours late becomes an A-. The next day it becomes a B+. After three days a late paper receives no credit.

  2. Grading is inherently subjective. I promise to think seriously about your grades. Likewise, I expect you to take me at my word. Please don’t ask me to change a grade unless you truly think I’ve made a mistake.

  3. As the name implies, “class engagement” reflects not only the degree to which you evince understanding of course material, but also the extent to which you help the class move forward. Throughout the semester you should be asking questions (not just of me, but of the class) and expressing opinions about the issues we are covering.

  4. Finally, not everybody in this class will get an A. Please keep in mind that a B is a very good grade; I reserve anything higher for truly outstanding work.
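The late-work rule in the first note above amounts to stepping down the letter scale once per day. A minimal Python sketch, assuming that any part of a day counts as a full day (which matches the two-hours-late example), that a penalty below C- lands on F (the schedule has no D grades), and that "after three days" means more than 72 hours past the deadline; late_grade is a hypothetical name.

```python
import math

# Letter grades from the schedule, best first. Each step down is treated as
# "1/3 of a letter grade"; stepping below C- is assumed to land on F.
LADDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "F"]

def late_grade(letter, hours_late):
    """Return the penalized letter, or None when the work earns no credit."""
    if hours_late <= 0:
        return letter                              # on time: no penalty
    if hours_late > 72:
        return None                                # more than three days late: no credit
    days_started = math.ceil(hours_late / 24)      # any part of a day counts as a full day
    index = min(LADDER.index(letter) + days_started, len(LADDER) - 1)
    return LADDER[index]

if __name__ == "__main__":
    print(late_grade("A", 2))    # prints A- (the syllabus example)
    print(late_grade("A", 26))   # prints B+ (the next day)
    print(late_grade("A", 80))   # prints None (no credit)
```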

 

 


Textbook

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008.

 

The authors of this text have kindly made the text available online at:

 

            http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html

 

If using the online version, please download a complete copy of the book (once). Each week I will expect you to have read the relevant portions of the text and to bring them to class (in printed or electronic form).

 

In the syllabus that follows, I refer to this book as IIR.

 

All other readings for this course are available electronically. Where possible, I have linked directly to them in the syllabus below. Where this was not possible, you can find readings in one of several places, as noted beside each entry (for example, Blackboard or e-journals).

 


 

Detailed Outline

Date

Due Today

Topics and Readings

Jan. 26

 

Preliminaries: Modern Information Retrieval

 

Discuss access to iSchool computing resources

Feb. 2

IR innovation

IR innovation presentations

 

Getting our bearings

 

Reading:

1.    Belkin, N. (2008).  Some(what) grand challenges for information retrieval.  SIGIR Forum. [available online].

2.    Saracevic, T. (1997).  Users lost: reflections on the past, future, and limits of information science.  SIGIR Forum. 31(2).  [available online]

3.    Blog post (and the post it links to): How Google Measures Search Quality.  From The Noisy Channel (Daniel Tunkelang).

4.    Linux Documentation Project. Introduction to Linux. Chs. 1, 2, 3, 5, 6. [available online]

 

 

Lab: Unix overview / review

 

Feb. 9

 

Boolean IR  

 

Boolean retrieval slides

 

Reading:

  1. IIR Ch. 1
  2. Voorhees, E. (2007).  TREC: Continuing information retrieval’s tradition of experimentation.  Communications of the ACM. 51-54.  [e-journals].

 

 

 

Feb. 16

IIR 1.2, 1.3, 1.7, 1.8

Text Processing; Mathematical/Statistical Review

 

Math review slides

Text Processing Slides

 

 

Reading:

  1. Lial et al. Finite Mathematics. 2.3, 2.4, 9.1-9.3 (Blackboard).
  2. IIR Ch. 2.2-2.4, Ch. 6.

 

Lab: class datasets

 

Feb. 23

Lial et al. 2.3 exs. 21, 28

Lial et al. 2.4 exs. 8, 15, 26, 31

 

Term weighting; the vector space model of IR

 

Term weighting slides

Vector space IR slides

 

Reading:

  1. IIR Ch. 6

 

 

Mar. 2

 

The vector space model of IR

 

Vector space IR slides

 

Reading:

  1. IIR Ch. 6

 

Lab: indexing with Lemur

 

Mar. 9

TREC Presentation

 

IIR 6.9, 6.10, 6.15, 6.20, 6.21

TREC Presentations

·      Million Query (Eric)

·      TREC-6 Cross-Language (Nicholas)

·      TREC-5 Interactive

·      TREC-2001 Web

 

IR evaluation

 

IR Evaluation Slides

 

Reading:

  1. IIR Ch. 8 (skip p. 149)

 

Lab: Lemur’s RetEval (parameterization, viewing results)

 

Mar. 16

 

 

Spring break - no class meeting

 

Mar. 23

IIR 8.1, 8.4, 8.8

Query expansion and relevance feedback

 

Class slides

 

Reading:

  1. IIR Ch. 9

 

Lab: Lemur’s RetEval (parameterization, viewing results)

 

Mar. 30

TREC Presentation

 

IIR 9.1, 9.2, 9.3, 9.5

 

 

 

Final project test topics released

 

TREC Presentations

·      TREC-2001 Video (Julia)

·      TREC-2002 Arabic/English CLIR

·      TREC-2004 HARD

·      TREC-2004 Terabyte (Daniel)

 

Probabilistic IR

 

Reading:

  1. [optional] Sparck Jones, K., Walker, S., and Robertson, S. E. (2000). A probabilistic model of information retrieval: development and comparative experiments. Information Processing and Management. 36(6). pp. 779-808. (e-journals)
  2. IIR Ch. 11

 

 

Apr. 6

 

 

ECIR – NO CLASS MEETING

Meet in groups to finalize your approach to the final project. The lab below should help with this; complete it with your final project partner(s).

 

Lab: Continued work with RetEval.  Relevance feedback and probabilistic retrieval.

 

Apr. 13

IIR 11.2

 

 

Language modeling approach to IR

 

Class slides

 

Reading:

  1. IIR Ch. 12
  2. Zhai, C. and Lafferty, J. (2004). A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems. 22(2). pp. 179-214. [e-journals]

 

Lab: making and running queries with Lemur (topic types, manual relevance feedback, custom stoplists)

Apr. 20

TREC Presentation

Final Project

 

IIR 12.8, 12.9

Final project test runs due

 

TREC Presentations

·      TREC-2007 Blog (Amy)

·      TREC-2007 Enterprise (Sarah)

·      TREC-2007 HARD (Archana)

·      TREC-2007 Genomics (Nusrat)

 

Hyperlink analysis

 

Class slides

 

Reading:

  1. Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms. (Available online)
  2. IIR Ch. 21

 

Apr. 27

IIR 21.2, 21.3, 21.20, 21.22

 

Hyperlink analysis continued; course wrap-up

 

May 4

Final Project

In-class student presentations

 

 

 

 


Administrativa

The University of Texas Honor Code

The core values of the University of Texas at Austin are learning, discovery, freedom, leadership, individual opportunity, and responsibility. Each member of the University is expected to uphold these values through integrity, honesty, trust, fairness, and respect toward peers and community.

Electronic Mail Notification Policy

All students should become familiar with the University's official e-mail student notification policy. It is the student's responsibility to keep the University informed as to changes in his or her e-mail address. Students are expected to check e-mail on a frequent and regular basis in order to stay current with University-related communications, recognizing that certain communications may be time-critical. It is recommended that e-mail be checked daily, but at a minimum, twice per week. The complete text of the policy is available at http://www.utexas.edu/its/policies/emailnotify.html.

In this course e-mail will be used as a means of communication with students. You will be responsible for checking your e-mail regularly for class work and announcements. Note: if you are an employee of the University, your e-mail address in Blackboard is your employee address.

Contacting and Meeting with the Instructor

All students are encouraged to meet with the instructor regarding questions about the course, or simply to discuss course materials. While students are welcome to contact me by email, I will be much more receptive if you take the trouble to visit during office hours.

Students with disabilities

Any student with a documented disability (physical or cognitive) who requires academic accommodations should contact the Services for Students with Disabilities area of the Office of the Dean of Students at 471.6259 (voice) or 471.4641 (TTY for users who are deaf or hard of hearing) as soon as possible to request an official letter outlining authorized accommodations.

 
