References to the Early Years of Automatic Indexing and Information Retrieval

Organizing and Providing Access to Information -- LIS 391D.2 -- Spring 1998

Doyle, L. B. (1975). Information retrieval and processing. Los Angeles: Melville.

Lauren B. Doyle, an early researcher in the field of information retrieval, prepared this volume for the firm of Becker and Hayes. It draws in part on an earlier title by Joseph Becker and Robert M. Hayes, Information Storage and Retrieval: Tools, Elements, Theories (1963). Chapter ten, on the processing of language data, gives a comprehensive overview of automatic indexing and of language processing by means of semantic or conceptual schemes. Chapter eleven surveys the evaluation techniques developed up to that time. The references at the end of each chapter list documents and reports that can be considered foundational writings on these topics.

Keenan, S. (1973). Progress in automatic indexing and prognosis for the future. In J. A. Clifton, & D. Helgeson (Eds.), Computers in information data centers (pp. 97-104). Montvale, NJ: AFIPS Press.
This short paper reviews basic readings in automated indexing and attempts to predict future trends in the field. The author calls for the fields of automated language processing and linguistics to collaborate, to their mutual benefit, in studying how natural language is handled by such systems. At the time of writing, automatic indexing techniques were already being used with large information stores, and journal production was increasingly moving to machine-readable form.

Licklider, J. C. R. (1965). Libraries of the future. Cambridge, MA: M.I.T. Press.
This book, prepared under the sponsorship of the Council on Library Resources, is dedicated to Dr. Vannevar Bush. His pioneering article "As We May Think" (Atlantic Monthly, July 1945) inspired the author to undertake the work. At that time, a major concern was the exploding size of the corpus of knowledge relative to the capacity of computer memories and the speed of computer processors. The author introduces the term "procognitive systems" for the system of the future that he predicts will bring many disciplines together. It would blend computer science, the behavioral and social sciences, library science, and information storage and retrieval studies into a system that benefits mankind. Topics discussed include random access memory, parallel processing, cathode ray oscilloscope displays, light pens, list structures, xerographic output units, and time-sharing computer systems with remote user stations.

Luhn, H. P. (1959). Auto-encoding of documents for information retrieval systems. In M. Boaz (Ed.), Modern trends in documentation (pp. 45-58). London: Pergamon Press.
Luhn believed that the growing rate of information and document production made it necessary to invent methods for retrieving data from stores of documents without expensive human intervention. This paper discusses auto-encoding: statistical procedures performed by a machine on the original text of a document that is already in machine-readable form. At the time, machine-readable text was held primarily on punched cards or paper tape, and less frequently on magnetic tape. The auto-encoding method used three elements: word frequency rates, a special thesaurus, and multi-dimensional patterns built from word proximity. Application of the method was then limited to articles of 500 to 5000 words. Even so, Luhn was confident that the logical capabilities of electronic machines, statistical methods, and "further research into the characteristics of human behavior as manifested in writing" would lead to better information dissemination and retrieval. Earlier articles by this author discuss the automatic creation of abstracts and the development of thesauri.
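The frequency-and-proximity idea in this annotation can be illustrated with a short Python sketch. This is not Luhn's procedure. The stop list, the frequency cutoffs (`min_count`, `max_share`), the proximity `window`, and the function names are all assumptions chosen for illustration, and the thesaurus step is omitted entirely. The sketch keeps words whose counts fall between a lower and an upper cutoff, then counts how often pairs of those words appear near each other.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop list standing in for the common words a frequency method excludes.
STOP_WORDS = {"a", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}

SAMPLE = ("Retrieval systems index documents. Automatic indexing of documents "
          "supports retrieval. Indexing by machine makes retrieval of documents cheaper.")


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def significant_words(text, min_count=2, max_share=0.5):
    """Keep non-stop words whose counts fall between a lower and an upper cutoff.

    Both cutoffs are illustrative assumptions, not values taken from Luhn's paper.
    """
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    counts = Counter(tokens)
    upper = max_share * len(tokens)
    return {word: n for word, n in counts.items() if min_count <= n <= upper}


def proximity_pairs(text, keywords, window=4):
    """Count pairs of distinct significant words that occur within `window` tokens."""
    positions = [(i, t) for i, t in enumerate(tokenize(text)) if t in keywords]
    pairs = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        if a != b and j - i <= window:
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


if __name__ == "__main__":
    keywords = significant_words(SAMPLE)
    print(keywords)
    print(proximity_pairs(SAMPLE, keywords))
```

On the sample text, the sketch reports retrieval, documents, and indexing as significant words. Each pair of these words co-occurs three times within the window.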

Luhn, H. P. (1961). Automated intelligence systems: Some basic problems and prerequisites for their solution. In E. Tomeski, R. W. Westcott, & M. Covington (Eds.), The clarification, unification & integration of information storage & retrieval proceedings of February 23rd 1961 symposium (pp. 3-20). New York: Management Dynamics.
In this article Luhn discusses the swelling flood of new information and the need for a comprehensive intelligence system for the selective dissemination of new information. The system he outlines keeps a profile reflecting each user's current interests. It stores and retrieves information related to those interests by comparing the profile against the stored library of documents and retrieving the documents with matching similarity. The system was also meant to locate other people interested in similar topics. It would also match each new incoming interest profile against those already stored, to facilitate communication among people in an organization. Users interacted with the system through an intermediary, the information specialist. Luhn proposes centralized services that could make machine-readable texts available promptly. He predicts direct, entirely automatic communication between electronic information processing machines and these text centers. He also calls for work on the automatic recognition and characterization of pictorial material and urges that his approach be implemented without delay.
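Luhn's profile matching can be sketched in Python as follows. The cosine measure over term counts is a later technique, swapped in here only for convenience. Luhn's own matching rules, the 0.3 `threshold`, and the sample users and documents are not taken from the article and are hypothetical. `disseminate` routes each new document to the users whose profiles it resembles. `similar_users` pairs up users with overlapping interests. Together they mirror the two functions described above.

```python
import math
from collections import Counter


def term_vector(terms):
    """Represent an interest profile or a document as a bag of lower-cased terms."""
    return Counter(t.lower() for t in terms)


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disseminate(profiles, documents, threshold=0.3):
    """For each user, list the incoming documents whose similarity meets the threshold."""
    return {user: [doc_id for doc_id, doc in documents.items() if cosine(profile, doc) >= threshold]
            for user, profile in profiles.items()}


def similar_users(profiles, threshold=0.3):
    """Pair up users whose interest profiles resemble each other."""
    users = sorted(profiles)
    return [(u, v) for i, u in enumerate(users) for v in users[i + 1:]
            if cosine(profiles[u], profiles[v]) >= threshold]


# Hypothetical profiles and incoming documents, for illustration only.
PROFILES = {
    "alice": term_vector(["indexing", "thesaurus", "retrieval"]),
    "bob": term_vector(["retrieval", "evaluation"]),
    "carol": term_vector(["chemistry", "patents"]),
}
DOCUMENTS = {
    "d1": term_vector(["automatic", "indexing", "retrieval"]),
    "d2": term_vector(["patents", "chemistry", "search"]),
}

if __name__ == "__main__":
    print(disseminate(PROFILES, DOCUMENTS))
    print(similar_users(PROFILES))
```

With the sample data, d1 goes to alice and bob, and d2 goes to carol. Alice and bob are paired because both profiles include retrieval.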

Salton, G. (1968). Automated language processing. In C. A. Cuadra (Ed.), Annual review of information science and technology (pp. 169-199). Chicago: Encyclopaedia Britannica.
This review article describes the state of the art in syntactic and semantic theories of language. It emphasizes that interpreting the meanings of words in text is a formidable task. At that point, the traits of an ideal natural language processor had been defined, but no working model possessed all of them. The article notes examples of user involvement in iterative search query formulation. Statistical processing for document retrieval is described as existing in a variety of semi-operational settings; however, Salton notes the need to evaluate these methods. Work under way to support evaluation includes the SMART project and the ASLIB Cranfield work. Salton comments: "It is this writer’s guess that the ideas of Luhn, far from being abortive, may really come into their own within the next few years. The simple language processing methods, including small stored dictionaries, suffix recognition procedures, and word token statistics, appear to be far more powerful than was originally thought possible; such methods will likely be used for most of the language processing tasks actually implemented on computers, including applications in library science and information retrieval." (p. 191)

Schultz, C. K. (Ed.). (1968). H. P. Luhn: Pioneer of information science; selected works. New York: Spartan.
Hans Peter Luhn died in 1964 during his term as President of the American Documentation Institute. In his honor, a scholarship fund for information science was established in his name. This book was also produced to present biographical information, list the eighty patents Luhn held, and compile in one place a considerable number of his writings and speeches. Luhn was an inventor, a long-term employee of IBM, and a person somewhat ahead of his time. He became interested in information retrieval in the 1950s. He is credited with creating machine-produced Keyword-in-Context (KWIC) indexing and with coining the term selective dissemination of information. He believed in providing practical solutions to problems. He produced two experimental demonstrations using the papers of the 1958 International Conference on Scientific Information (ICSI). The first applied his auto-abstracting technique to the papers of one session, and the second produced a KWIC index to all of the papers presented. These demonstrations were introduced at the conference along with two new Luhn inventions, the 9900 Index Analyzer and the Universal Card Scanner. Following the conference, newspapers across the country carried stories about the auto-abstracting and auto-indexing system. He described it as the machine-generated equivalent of a completely intellectual task in the field of literature evaluation.
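A Keyword-in-Context index can be sketched in Python as below. This is a minimal modern illustration, not Luhn's program. The stop list, the reference codes, the 30-character column width, and the function names are assumptions. Each significant word in a title becomes an entry, and entries are alphabetized on that word. The preceding words are right-aligned so that the keywords line up in a single column, with the rest of the title reading on to the right.

```python
# Hypothetical stop list: words that never become index entries.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}

# Sample titles keyed by an illustrative reference code.
TITLES = {
    "LUHN59": "Auto-encoding of documents for information retrieval systems",
    "SALT68": "Automated language processing",
}


def kwic_entries(titles):
    """Return (keyword, left context, right context, code) for every significant word."""
    entries = []
    for code, title in titles.items():
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            entries.append((word.lower(), " ".join(words[:i]), " ".join(words[i:]), code))
    # Alphabetize on the keyword so each title is filed under every significant word.
    entries.sort(key=lambda e: (e[0], e[3]))
    return entries


def format_kwic(entries, width=30):
    """Right-align (and truncate) the left context so the keywords line up in one column."""
    lines = []
    for _, left, right, code in entries:
        tail = left[-width:]  # keep only the part of the left context that fits the column
        lines.append(tail.rjust(width) + "  " + right + "  [" + code + "]")
    return lines


if __name__ == "__main__":
    for line in format_kwic(kwic_entries(TITLES)):
        print(line)
```

The fixed keyword column is what lets a reader scan down one alphabetized column of words and still see each title's context on both sides.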

Stevens, M. E. (1970). Automatic indexing: A state-of-the-art report (National Bureau of Standards Monograph 91, reissued with additions and corrections; SD Catalog No. C13.44:91). Washington, DC: U.S. Government Printing Office.
This formidable piece of work was initiated by the National Science Foundation and jointly funded with the National Bureau of Standards. The original survey was current through February 1964. The February 1970 reissue adds material that brings the literature references up to August 1969. The text of this volume is a who’s who of thought and writing on automatic indexing, from the earliest recorded instance of the concept through 1969. Each reference in the report is given a full citation. The report as a whole is comprehensive and thoroughly researched.

Taube, M., & Wooster, H. (Eds.). (1958). Information storage and retrieval: Theory, systems, and devices. New York: Columbia University Press.
This book is the record of an Air Force symposium held forty years ago in Washington, D.C. The list of participants in the panel discussions was impressive for the time. It included representatives from Magnavox, Documentation Incorporated, Zator Company, Dow Chemical Company, International Business Machines, Eastman Kodak Company, and many others conducting advanced research and development in information storage and retrieval. H. P. Luhn presented a paper titled "Indexing, Language, and Meaning." In it he distinguished among three kinds of classification systems for collections:

- "adopted" systems, borrowed from a pre-established categorical scheme;
- "synthetic" systems, developed by subject matter experts for a given field;
- "native" systems, derived by statistical analysis of the collection itself.

He considered the native category the most effective for retrieval systems. At the same event, Calvin N. Mooers predicted that two current barriers would be crossed, giving rise to unimaginable advances. The first was the barrier of language symbols (control over the meaning of language). The second was the barrier to building machines simple and cheap enough for any individual to perform retrieval on his own collection.

Watters, C. (1992). Dictionary of information science and technology. San Diego: Academic Press.
This dictionary combines terms from several specialized subject areas and presents definitions that are easy to understand. Each definition carries a key that leads to a subject outline at the back of the dictionary. Each term is also annotated with one or more references: one to the basic or seminal discussion of the term in the literature, and one to a work that uses the term directly in the information science field.

Balnaves, J., Gerrie, B., & Oxley, S. (1980). A workbook in information retrieval (5th ed.). Canberra, Australia: Canberra College of Advanced Education.
This workbook for students of librarianship provides practical exercises in document retrieval. The exercises are meant to familiarize students with the vocabulary, systems, and popular databases of the time. They include practice with keyword-in-context (KWIC) and keyword-out-of-context (KWOC) indexes. The work also contains a glossary of terminology used in information retrieval.
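A keyword-out-of-context index differs from KWIC in layout: each keyword is lifted out of the title and printed as a heading, with the full titles listed beneath it. The Python sketch below is a hypothetical illustration. The stop list, the reference codes, and the layout are assumptions and are not the workbook's format.

```python
from collections import defaultdict

# Hypothetical stop list: words that never become headings.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}

# Sample titles keyed by an illustrative reference code.
TITLES = {
    "DOYLE75": "Information retrieval and processing",
    "TAUBE58": "Information storage and retrieval",
}


def kwoc_index(titles):
    """Map each significant word to the (code, full title) pairs that contain it."""
    index = defaultdict(list)
    for code, title in titles.items():
        for word in set(title.lower().split()) - STOP_WORDS:
            index[word].append((code, title))
    # Headings in alphabetical order; titles under each heading ordered by code.
    return {word: sorted(index[word]) for word in sorted(index)}


def format_kwoc(index):
    """Print each keyword as a heading, out of context, with the full titles beneath it."""
    lines = []
    for word, postings in index.items():
        lines.append(word.upper())
        lines.extend(f"    {title} [{code}]" for code, title in postings)
    return lines


if __name__ == "__main__":
    print("\n".join(format_kwoc(kwoc_index(TITLES))))
```

Because each title is repeated whole under every heading, a KWOC index uses more lines than a KWIC index. In exchange, it avoids truncating the context around the keyword.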

This page is created and maintained by Sue Soy ssoy@ischool.utexas.edu
Last Updated 11/11/98
© Copyright 1996 Susan K. Soy
Please feel free to copy and distribute freely for academic purposes with this notice and attribution.
All other rights reserved