Class Meeting Time and Place: Monday 9:00-12:00, SZB 464
Unique ID: 27440
Instructor: Miles Efron
Office: SZB 562E
Office Hours: Tues. 10:00-11:00
Email: miles@ischool.utexas.edu
Web: http://www.ischool.utexas.edu/~miles
Course Description: An introductory survey of information filtering and
retrieval, with an emphasis on developing the student's understanding of the
relationship between the algorithms used by search engines, the query and
document, and system performance. This is an information science course,
not an information technology course. The course will emphasize basic
knowledge useful for those who will be in leadership positions in the
information professions.
Purpose:
The main purpose of this course is to provide a foundation for understanding
the models, assumptions, and motivations that underpin contemporary
information retrieval and filtering systems. In all of this work, we will
inform our discussion with important articles and readings from the fields of
Information and Library Science; another course goal is to ground students in
the relevant literatures.
Prerequisites:
There are no official prerequisites for this course. However, most of the
reading and much of the graded work require facility with statistics,
probability, and mathematics (i.e. basic matrix operations). All required
background will be reviewed during class, but at a necessarily quick pace.
Students with aversions to quantitative reasoning may find this course
frustrating.
Email will be the primary method of communication outside of class. Thus it is
crucial that all enrolled students subscribe to our class listserv.
Instructions for joining the list are at
https://utlists.utexas.edu/sympa/
The name of the list is inf384hs09.
| Assignment | Weight |
| | 25% |
| | 15% |
| | 15% |
| | 10% |
| | 25% |
| Class engagement | 10% |
Grading Details:
I will use the following schedule in calculating final grades:
| A+ = 100 | A = 95-99 | A- = 90-94 |
| B+ = 85-89 | B = 80-84 | B- = 75-79 |
| C+ = 70-74 | C = 65-69 | C- = 60-64 |
| | F = <60 | |
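As a rough illustration of how the assignment weights and the grading schedule
above combine, here is a minimal Python sketch. It is not an official grade
calculator. The function names and the example scores are hypothetical, and the
handling of fractional scores (a score falls into the highest band whose lower
bound it reaches) is an assumption, since the schedule lists whole-number bands
only.

```python
# Letter-grade bands copied from the schedule above, as (lower bound, letter).
# Assumption: a score belongs to the highest band whose lower bound it reaches,
# so a fractional score such as 94.5 falls into the band below (A-).
GRADE_BANDS = [
    (100, "A+"), (95, "A"), (90, "A-"),
    (85, "B+"), (80, "B"), (75, "B-"),
    (70, "C+"), (65, "C"), (60, "C-"),
]

# Weights from the assignment table, in the order listed (they sum to 100).
WEIGHTS = [25, 15, 15, 10, 25, 10]


def final_score(scores):
    """Weighted average of per-assignment scores, each on a 0-100 scale."""
    if len(scores) != len(WEIGHTS):
        raise ValueError("expected one score per weighted assignment")
    return sum(w * s for w, s in zip(WEIGHTS, scores)) / sum(WEIGHTS)


def letter_grade(score):
    """Map a 0-100 final score to a letter using the schedule above."""
    for lower_bound, letter in GRADE_BANDS:
        if score >= lower_bound:
            return letter
    return "F"  # anything below 60


if __name__ == "__main__":
    example = [90, 80, 85, 100, 70, 95]  # hypothetical scores, not real data
    total = final_score(example)
    print(total, letter_grade(total))
```

With the hypothetical scores shown, the weighted total is 84.25, which falls in
the B band.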
A few notes on grading:
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction
to Information Retrieval, Cambridge University Press, 2008.
The authors of this text have kindly made the text available online at:
http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html
If using the online version, please download a complete copy of the book
(once). Each week I will expect you to have read the relevant portions of the
text and to have brought them to class (in printed or electronic form).
In the syllabus that follows, I refer to this book as IIR.
All other readings for this course are available electronically. Where
possible, I have linked directly to them in the syllabus below. Where this was
not possible, you can find readings in one of several places.
| Date | Due Today | Topics and Readings |
| Jan. 26 | | Preliminaries: Modern Information Retrieval. Discuss access to iSchool computing resources. |
| Feb. 2 | | IR innovation presentations. Getting our bearings. Reading: 1. Belkin, N. (2008). Some(what) grand challenges for information retrieval. SIGIR Forum. [available online] 2. Saracevic, T. (1997). Users lost: reflections on the past, future, and limits of information science. SIGIR Forum. 31(2). [available online] 3. Blog post (and the post it links to): How Google Measures Search Quality. From The Noisy Channel (Daniel Tunkelang). 4. Linux Documentation Project. Introduction to Linux. Chs. 1, 2, 3, 5, 6. [available online] |
| Feb. 9 | | Boolean IR. Reading: |
| Feb. 16 | IIR 1.2, 1.3, 1.7, 1.8 | Text Processing; Mathematical/Statistical Review. Reading: Lab: class datasets |
| Feb. 23 | Lial et al. 2.3 exs. 21, 28; Lial et al. 2.4 exs. 8, 15, 26, 31 | Term Weighting; The vector space model of IR. Reading: |
| Mar. 2 | | The vector space model of IR. Reading: Lab: indexing with lemur |
| Mar. 9 | IIR 6.9, 6.10, 6.15, 6.20, 6.21 | TREC Presentations: Million Query (Eric); TREC-6 Cross-Language (Nicholas); TREC-5 Interactive; TREC-2001 Web. IR evaluation. Reading: Lab: lemur’s RetEval (parameterization, viewing results) |
| Mar. 16 | | Spring break - no class meeting |
| Mar. 23 | IIR 8.1, 8.4, 8.8 | Query expansion and relevance feedback. Class slides. Reading: Lab: lemur’s RetEval (parameterization, viewing results) |
| Mar. 30 | IIR 9.1, 9.2, 9.3, 9.5 | Final project test topics released. TREC Presentations: TREC-2001 Video (Julia); TREC-2002 Arabic/English CLIR; TREC-2004 HARD; TREC-2004 Terabyte (Daniel). Probabilistic IR. Reading: |
| Apr. 6 | | ECIR – NO CLASS MEETING. Meet in groups to finalize approach to final project. The lab below should help with this; complete the lab with your final project partner(s). Lab: Continued work with RetEval. Relevance feedback and probabilistic retrieval. |
| Apr. 13 | IIR 11.2 | Language modeling approach to IR. Class slides. Reading: Lab: making and running queries with lemur (topic types, manual relevance feedback, custom stoplists) |
| Apr. 20 | IIR 12.8, 12.9 | Final project test runs due. TREC Presentations: TREC-2007 Blog (Amy); TREC-2007 Enterprise (Sarah); TREC-2007 HARD (Archana); TREC-2007 Genomics (Nusrat). Hyperlink analysis. Class slides. Reading: |
| Apr. 27 | | Hyperlink analysis continued; course wrap-up |
| May 4 | | In-class student presentations |
The core values of the
University of Texas at Austin are learning, discovery, freedom, leadership,
individual opportunity, and responsibility. Each member of the University is
expected to uphold these values through integrity, honesty, trust, fairness,
and respect toward peers and community.
All students should become
familiar with the University's official e-mail student notification policy. It
is the student's responsibility to keep the University informed as to changes
in his or her e-mail address. Students are expected to check e-mail on a
frequent and regular basis in order to stay current with University-related
communications, recognizing that certain communications may be time-critical.
It is recommended that e-mail be checked daily, but at a minimum, twice per
week. The complete text of the policy is available at http://www.utexas.edu/its/policies/emailnotify.html.
In this course e-mail will be used as a
means of communication with students. You will be responsible for checking your
e-mail regularly for class work and announcements. Note: if you are an employee
of the University, your e-mail address in Blackboard is your employee address.
All students are encouraged
to meet with the instructor regarding questions about the course, or simply to
discuss course materials. While students are welcome to contact me by email, I
will be much more receptive if you take the trouble to visit during office
hours.
Any student with a
documented disability (physical or cognitive) who requires academic
accommodations should contact the Services for Students with Disabilities area
of the Office of the Dean of Students at 471.6259 (voice) or 471.4641 (TTY for
users who are deaf or hard of hearing) as soon as possible to request an
official letter outlining authorized accommodations.