Listed below are some of the resources I have come across while trying to educate myself on the basics of KDD/IR (especially the description and discovery of Web resources). Some of the tutorials might prove helpful when tackling the assigned class readings on KDD/IR (eons from now).
KDD Glossaries
Machine Learning Glossary of Terms
from the Special Issue on Applications of Machine Learning and the Knowledge Discovery Process
http://robotics.stanford.edu/~ronnyk/glossary.html
Machine Discovery Terminology
compiled by W. Kloesgen and J. Zytkow
http://orgwis.gmd.de/projects/explora/terms.html
Data Mining Glossary from Two Crows
http://www.twocrows.com/glossary.htm
Datawarehouse Terminology
by Creative Data
http://www.credata.com/research/terminology.html
Introductory material:
KDD -
Knowledge Discovery In Databases: Tools and Techniques
by Peggy Wright
http://www.acm.org/crossroads/xrds5-2/kdd.html?ROLES=0PSA0STA0EMA&DOMAIN=.acm.org
Data Mining -
Introduction to Data Mining and Knowledge Discovery, 3rd Ed., published by Two Crows Corporation
http://www.twocrows.com/intro-dm.pdf
Web Resources IR -
Practical Issues for Automated Categorization of Web Sites
by John M. Pierre, Metacode Technologies, Inc.
September 2000
http://www.ics.forth.gr/isl/SemWeb/proceedings/session3-3/html_version/semanticweb.html
Info on DAML+OIL from daml.org:
Tutorials on DAML+OIL from xml.com:
http://www.xml.com/pub/a/2002/01/30/daml1.html
http://www.xml.com/pub/a/2002/03/13/daml.html
The basics of the Ontology Inference Layer (OIL):
http://www.ontoknowledge.org/oil/
And, of course, more on the Web Ontology Language (OWL):
http://www.w3.org/TR/2002/WD-owl-guide-20021104/#Abstract
Current work on Web Resource representation and IR:
For an overview of clickstream analysis of Web activity:
INFORMATIONWEEK.com News, March 12, 2001
Pan For Gold In The Clickstream
http://www.informationweek.com/828/prmining.htm
Using Topic Maps for Web resource description and IR:
http://www.xml.com/pub/a/2002/09/11/topicmaps.html?page=1
Project Aristotle(sm): Automated Categorization of Web Resources is a clearinghouse of projects, research, products and services that investigate or demonstrate the automated categorization, classification or organization of Web resources. A working bibliography of key reports, papers and articles is also provided. Projects and associated publications are arranged by the name of the university, corporation, or other organization with which the principal investigator of a project is affiliated.
http://www.public.iastate.edu/~CYBERSTACKS/Aristotle.htm
An online textbook for those who REALLY want to get into the nitty-gritty of Information Retrieval:
INFORMATION RETRIEVAL, 2nd Ed. (1979), by C.J. van Rijsbergen
Department of Computing Science, University of Glasgow:
http://www.dcs.gla.ac.uk/~iain/keith/