Introduction
This lesson discusses
various meanings of the word "access" in the field of library and
information science (LIS) and various aspects of the provision of
access to information-bearing entities (InBEs).
Access and the LIS Profession
First of all,
let us acknowledge that "access" is what the LIS professions are all
about. The essence of our jobs is to assist users in gaining access
to information that they are seeking or may someday seek. We organize
information (in libraries, information centers, records centers, archives,
etc.) in order to facilitate future access by users. It is an old
truism that even the best library in the world would be reduced to
near uselessness if its contents were simply piled into a heap of
books and other materials.
As librarians,
archivists, records managers, information specialists, etc., we try
to anticipate the ways in which the users of our collections of InBEs
are likely, in our best judgment, to seek information. But if we are
doing our jobs properly, we will also recognize that in the future
there will be users with inquiries that we failed to anticipate, or
could not (because of unforeseeable changes in the world) have anticipated;
and we will do our best to provide tools with which those future users
will be able to find the information they will be seeking.
Thus, as LIS
professionals we have two goals: (1) to do a good job of organizing
InBEs, which presupposes the effective execution of prior tasks such
as ascertaining our users' needs, extrapolating to their foreseeable
future needs, selecting materials likely to be responsive to those
needs, and acquiring those materials; and (2) to develop a variety
of tools that are flexible enough to be useful both now and in the
future and that are usable by our users, both with and without our
direct assistance.
These seemingly
straightforward goals lead to many complications.
Known-Item vs. Unknown-Item Searches
We can begin
by considering the ways in which users seek access to InBEs in a "traditional"
library. One way of categorizing typical searches is to divide them
into "known item" vs. "unknown item" searches.
Typical known-item
searches involve a user's knowing the InBE's author and title (and
edition, if pertinent), or its International Standard Book Number
(ISBN), or its Library of Congress card number, or its call number
in the particular library. For known-item searches, access should
be simple and direct; i.e., the library's retrieval tools (e.g., its
online public-access catalog [OPAC] or card catalog) should enable the
user to determine quickly (1) the shelf location (the place within the
library where the known item is expected to reside) and (2) its charge
status (i.e., whether the item is currently on hand or is charged out,
at the bindery, on reserve, etc.).
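The direct lookup that known-item searching calls for can be sketched as a set of indexes, one per access point, each leading straight to an item's shelf location and charge status. This is a minimal Python sketch under stated assumptions: the CatalogRecord fields, the sample records, and the functions build_indexes and locate are hypothetical illustrations, not the design of any actual OPAC.

```python
from dataclasses import dataclass

@dataclass
class CatalogRecord:
    author: str
    title: str
    isbn: str
    lccn: str            # Library of Congress card number
    call_number: str
    shelf_location: str
    charge_status: str   # e.g. "on hand", "charged out", "at bindery", "on reserve"

# Hypothetical sample records, for illustration only.
RECORDS = [
    CatalogRecord("Doe, Jane", "An Example Book", "9780000000001", "00-000001",
                  "Z666.5 .D64 2001", "Floor 5, Range 12", "on hand"),
    CatalogRecord("Roe, Richard", "Another Example", "9780000000002", "00-000002",
                  "Z693 .R64 1999", "Floor 5, Range 14", "charged out"),
]

def build_indexes(records):
    """Build one direct-lookup index per known-item access point."""
    indexes = {"isbn": {}, "lccn": {}, "call_number": {}, "author_title": {}}
    for rec in records:
        indexes["isbn"][rec.isbn] = rec
        indexes["lccn"][rec.lccn] = rec
        indexes["call_number"][rec.call_number] = rec
        indexes["author_title"][(rec.author.lower(), rec.title.lower())] = rec
    return indexes

def locate(indexes, access_point, key):
    """Return (shelf_location, charge_status) for a known item, or None."""
    if access_point == "author_title":
        # Author/title lookups are case-insensitive in this sketch.
        key = tuple(part.lower() for part in key)
    rec = indexes[access_point].get(key)
    return None if rec is None else (rec.shelf_location, rec.charge_status)
```

Each index answers a known-item query in a single step, which is the "simple and direct" access the paragraph asks for; an item absent from the catalog simply yields None.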
Note, however,
that no card catalog provides charge-status information or access
via ISBN or LC card numbers, and that many OPACs (especially older
ones) fail to provide charge-status information or allow ISBN or LC-card-number
searches. Note also that even after the shelf location and charge
status have been determined, the user may still need to use other
tools, examples of which are the directories and maps in the Perry-Castañeda
Library that tell the user which floors house which sequences of call
numbers and which areas of bookshelves on the pertinent floor house
the call number the user is seeking. In short, even known-item searches
may still involve several other steps before the user can finally
gain access to the target InBE.
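The floor directories mentioned above amount to a table of call-number ranges; finding the right floor is a search for the last range that begins at or before the call number being sought. The sketch below assumes, for simplicity, that ranges are divided only by Library of Congress class letters, and its floor assignments are invented; they do not describe the Perry-Castañeda Library.

```python
import bisect
import re

# Hypothetical directory: each entry gives the first LC class housed on a floor.
DIRECTORY = [
    ("A", "Floor 2"),
    ("H", "Floor 3"),
    ("N", "Floor 4"),
    ("Q", "Floor 5"),
    ("Z", "Floor 6"),
]
STARTS = [start for start, _ in DIRECTORY]

def class_letters(call_number):
    """Extract the leading LC class letters, e.g. 'QA76.9 .D3' -> 'QA'."""
    match = re.match(r"[A-Z]+", call_number.strip().upper())
    if not match:
        raise ValueError(f"not an LC call number: {call_number!r}")
    return match.group(0)

def floor_for(call_number):
    """Find the floor whose range begins at or before the call number's class."""
    i = bisect.bisect_right(STARTS, class_letters(call_number)) - 1
    if i < 0:
        raise ValueError("call number precedes every range in the directory")
    return DIRECTORY[i][1]
```

Real LC call numbers would also need their numeric and Cutter portions compared properly, which this simplification ignores.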
Typical unknown-item
searches involve a user's seeking InBEs that deal with a certain subject
or fill a certain general need such as, "I'd like to find a good detective
story to read this weekend." Such searches tend to be lengthier and
more difficult than known-item searches, but they are a substantial
part of the usage of many libraries. OPACs and card catalogs can facilitate
such searches to greater or lesser extents depending on the nature
of the search, but often the OPAC or card catalog can do little more
than help the user get to a bookshelf where the user can start browsing
for an item that satisfies his or her search.
Access via Remote Browsing Requires Retrospective Conversion of InBEs into Computerized Form
In other words,
in unknown-item searches the library's retrieval tools tend to be
used simply to help the user get into close physical proximity to
(i.e., enable the user to browse in) a set of InBEs that are potentially
responsive to his or her interest of the moment. This raises the question
of what kinds of tools could be provided to help users browse remotely
among sets of potentially responsive InBEs. One can dream of someday
being able to search, from one's home or office anywhere on earth,
through the full-texts and images of InBEs in collections located
anywhere else on earth. That time may come surprisingly soon for certain
types of InBEs that will be produced at that future time and
for which retrieval tags can be prepared and supplied with the InBEs;
but it surely will be many decades before any substantial proportion
of the InBEs produced in the past (e.g., books and journals printed
before 1980) will be available for remote browsing.
If current and
prospective future retrieval tools are to be used, InBEs produced
in the past must be processed into machine-readable form. For the
purpose of enabling general searches, this involves more than simply
digitizing page images with a scanner. However, a limited browsing
capability could be provided by capturing page images of books, for
example, provided that retrieval tags (e.g., the cataloging data for
the books) could be put into machine-readable form and linked to the
page images in a computer store. Till quite recently, the cost of
storing large numbers of digitized page images was too great
to encourage people to think seriously about this way of providing
remote browsing, but storage costs have been sharply reduced in the
past decade (see Endnote 1), and it is probably time to reconsider
providing this kind of browsability.
For the purpose of
searching text for specific words or sets of words, a digitized image
of a page of text is useless till the images of individual characters
on the page have been recognized and translated into ASCII (or Unicode)
bit strings. Doing so requires some form of optical character-recognition
(OCR) processing. Although OCR has become much cheaper and faster during
the past decade, owing mainly to increases in computer power, it still
fails to achieve sufficiently high accuracy to handle many kinds of
material satisfactorily. (For example, OCR programs can do a pretty good job
of recognizing words because this task can use word-frequency and
context clues to help pin down the interpretation of a doubtful letter,
but if an OCR program is trying to recognize tables of numerical data,
no such clues are available.)
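The point about context clues can be made concrete. In the hypothetical sketch below, an OCR engine is assumed to report, for one doubtful character, a confidence score for each candidate letter; a word-frequency table then tips the balance toward the reading that forms a common word. For a digit in a numerical table there is no comparable table to consult, so only the engine's own confidence is available. The frequency figures and function names are illustrative assumptions, not taken from any OCR product.

```python
# Hypothetical word-frequency table (counts per million words), illustrative only.
WORD_FREQ = {"house": 500, "horse": 120}

def resolve_word(pattern, candidates, freq):
    """Pick the reading of an OCR'd word that has one doubtful character.

    pattern: the word with '?' at the doubtful position, e.g. 'ho?se'
    candidates: dict mapping each possible character to the engine's
                shape-based confidence in it, e.g. {'u': 0.45, 'r': 0.55}
    freq: word-frequency table used as the context clue
    """
    best, best_score = None, -1.0
    for ch, confidence in candidates.items():
        word = pattern.replace("?", ch, 1)
        # Weight shape confidence by how common the resulting word is;
        # adding 1 keeps unseen words from being ruled out entirely.
        score = confidence * (freq.get(word, 0) + 1)
        if score > best_score:
            best, best_score = word, score
    return best

def resolve_digit(pattern, candidates):
    """For numerical data there is no dictionary: only shape confidence helps."""
    ch = max(candidates, key=candidates.get)
    return pattern.replace("?", ch, 1)
```

Here resolve_word("ho?se", {"u": 0.45, "r": 0.55}, WORD_FREQ) chooses "house" even though the character shape slightly favors "r", whereas a doubtful digit in "19?7" can only be settled by the shape scores.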
The costs of
preparing past InBEs remain high enough that it is doubtful that conversion
of past InBEs into ASCII strings on a grand scale will take place
soon, if ever. Yes, the yearly expenditures on armaments (or even
tobacco products) could pay for large amounts of retrospective conversion,
but the diversion of such expenditures into "OCRing" the
printed-paper contents of large libraries is politically inconceivable.
For the foreseeable future, I suspect that retrospective conversion
will continue to be what it has been during the past decade or two,
viz., a matter of scattered efforts, often by volunteers such as those
of Project Gutenberg or by
individual hobbyists, with occasional funding of larger efforts devoted
to specific needs or special collections of interest to wealthy individuals
or foundations. And I remain convinced that many past InBEs will never
be judged worthy of being put into machine-readable form: e.g., privately
printed ("vanity press") books of tediously long sermons by long-forgotten,
uninspiring small-town preachers in the 1720s.
Subject-Oriented Unknown-Item Searches
Let us examine
more closely the matter of searching for InBEs relevant to a particular
subject. If a user is searching text InBEs, the searches will require
some kind of matching between:
(1) words and phrases by which the user expresses his or her subject interest,
and either
(2a) words, phrases, subject headings, classification (e.g., Library of Congress or Dewey) numbers, or other retrieval tags associated with the InBEs,
or
(2b) words and phrases in the full texts of the InBEs, if the full texts are in machine-readable form.
Unfortunately,
there are numerous possibilities for mismatches. To express their
subject interests, users will often employ different words from those
used by catalogers or indexers to deal with the same subjects. Unless
there is some way of connecting the users' chosen words with the catalogers'
chosen words, the desired match will fail. A good deal of effort (e.g.,
in the SMART Project of the late Gerard
Salton) has gone into research on the development of sets of synonyms,
near synonyms, and related terms, and ways to build such thesauruses
by programmatic means, but such thesauruses are still expensive and difficult
to implement. Although some search engines on the World-Wide Web employ
similar techniques, access via thesaurus-aided searching remains rather
rare. An intrinsic difficulty with efforts to provide thesaurus-aided
searching is that vocabulary constantly changes.
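A minimal sketch of the connection the preceding paragraph calls for: an entry vocabulary that maps the words users are likely to type onto the term a cataloger actually assigned (case 2a above). The mapping, the headings, and the function match_subject are hypothetical; a real thesaurus would also cover near synonyms and related terms, and would need constant upkeep as vocabulary changes.

```python
# Hypothetical entry vocabulary: users' words -> the cataloger's preferred term.
USE_FOR = {
    "cars": "automobiles",
    "autos": "automobiles",
    "motorcars": "automobiles",
    "automobiles": "automobiles",
}

# Hypothetical subject headings assigned to InBEs (retrieval tags).
HEADINGS = {
    "book-1": {"automobiles", "history"},
    "book-2": {"bicycles"},
}

def match_subject(user_word, headings, use_for):
    """Return InBEs whose headings match the user's word after vocabulary mapping."""
    word = user_word.lower()
    term = use_for.get(word, word)  # fall back to the user's own word
    return sorted(item for item, tags in headings.items() if term in tags)
```

With the mapping, match_subject("cars", HEADINGS, USE_FOR) finds "book-1"; with an empty mapping the same query finds nothing, which is exactly the mismatch between users' and catalogers' words described above.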
Strategies for Full-Text Searches
A discussion
of access to information would be incomplete without mention of some
of the kinds of search strategies that are available for carrying
out full-text searches. The simplest such strategy is for a user to
choose a particular word or phrase that he or she considers related
to the subject of interest, and simply to select all the InBEs in
a collection whose texts contain the target word or phrase. This is
Boolean logic at its most elementary level. A step up in sophistication
is to choose two or more target words or phrases, and to combine them
in Boolean fashion with "OR" or "AND" connectors. For example, a user
might choose terms A and B and have the search process select only
those InBEs in a collection whose texts contain both A and B.
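Single-word Boolean searching is commonly implemented with an inverted index, which records for each word the set of InBEs whose texts contain it; AND then becomes set intersection and OR becomes set union. The Python sketch below is illustrative only: it handles single words, not phrases, and the names tokenize, build_index, search_and, and search_or are assumptions.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(docs):
    """Inverted index: word -> set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index

def search_and(index, *terms):
    """Documents containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))
```

For example, with DOCS = {1: "Geese migrate south in winter.", 2: "A goose in the garden."}, search_and(build_index(DOCS), "winter", "geese") yields {1}, while search_or(build_index(DOCS), "goose", "geese") yields {1, 2}.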
Further steps
up in sophistication include "proximity" searches. In one type of
proximity search, a user can choose two or more target words or phrases
and have the search process select only those InBEs in a collection
whose texts contain A and B within a single sentence, or within a
single paragraph, or within a single section. In another type of proximity
search, the user can specify that she or he wants to select just those
InBEs in which A and B occur within N words of each other, where N
is a number specified by the user. The user might even restrict the
search to just those InBEs in which B precedes A by no more than N
words, rejecting InBEs in which B follows A by M words or fewer.
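Proximity searching needs word positions, not just the presence of words. The sketch below assumes that "within N words" means the positions of the two words differ by at most N, and that sentences end at a period, exclamation point, or question mark; each function corresponds to one of the kinds of proximity search just described. All function names are hypothetical.

```python
import re

def positions(text, word):
    """Positions (0-based word offsets) at which a word occurs in the text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [i for i, tok in enumerate(tokens) if tok == word.lower()]

def within_n(text, a, b, n):
    """True if words a and b occur within n words of each other, in either order."""
    pa, pb = positions(text, a), positions(text, b)
    return any(abs(i - j) <= n for i in pa for j in pb)

def ordered_within(text, a, b, n, m):
    """True if b precedes a by at most n words and b never follows a by m or fewer."""
    pa, pb = positions(text, a), positions(text, b)
    precedes = any(0 < i - j <= n for i in pa for j in pb)
    follows = any(0 < j - i <= m for i in pa for j in pb)
    return precedes and not follows

def same_sentence(text, a, b):
    """Crude sentence-level proximity: split on sentence punctuation and test each piece."""
    return any(positions(s, a) and positions(s, b) for s in re.split(r"[.!?]+", text))
```

In "The goose flew over the frozen lake. Geese are loud." the words "goose" and "lake" are five positions apart, so within_n succeeds for N = 5 but not for N = 4, and "goose" and "geese" fail the same-sentence test.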
Such Boolean
and proximity searches can also employ "wildcards" and truncation.
An example of a wildcard would be a search for InBEs relevant to geese;
a searcher could specify that he or she wanted to find InBEs containing
character strings like "g??se", where the "?" is a wildcard used to
represent any single character. Clearly, a search for "g??se" would
find matches with both "goose" and "geese." Unfortunately, such a
search would also yield matches with "gorse" and "guise," so a searcher
might decide to refine his or her search to something like this: Find
InBEs containing "g??se" AND NOT containing "gorse" AND NOT containing
"guise." An example of truncation would be a search for InBEs relevant
to children; a searcher could specify that she or he wanted to find
InBEs containing character strings like "child*", where the "*" is
used to represent zero or more characters of any kind. Such a search
would yield matches with "child", "child's", "children", "children's",
"childish", "child-like", and so on.
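Python's standard fnmatch module happens to use the same two conventions as the examples above, "?" for exactly one character and "*" for zero or more, so these searches can be sketched by applying it word by word. The tokenizing rule and the function names are assumptions made for this illustration.

```python
import fnmatch
import re

def words(text):
    """Split text into words, keeping internal apostrophes and hyphens
    so that "children's" and "child-like" stay whole."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def matches(text, pattern):
    """True if any word matches the pattern ('?' = one character, '*' = zero or more)."""
    return any(fnmatch.fnmatchcase(w, pattern) for w in words(text))

def refined_geese_search(docs):
    """Documents matching g??se AND NOT containing "gorse" AND NOT containing "guise"."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if matches(text, "g??se")
                  and not matches(text, "gorse")
                  and not matches(text, "guise"))
```

Note that the refined search also rejects an InBE that mentions both geese and gorse, which is a general hazard of AND NOT.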
Access to Items in Records-Management Systems and Archives
Paper Records
In
archives and records-management systems, access to materials is generally
provided through non-subject-related means. Indeed, typical records-management
systems provide access not to individual InBEs but to sets of InBEs
grouped in ways that facilitate the handling of categories of records.
For example, correspondence may be stored in file folders by months,
monthly folders may be grouped by years, and yearly folders grouped
by department within the company or institution. A typical end result
might be a box with a label like "Correspondence, Acquisitions Department,
1999."
In contrast to
the typical library's goal of facilitating quick access to an individual
InBE such as a book, the goal in a records-management system is not
to provide ready access to an individual letter. Instead, records-management
systems are designed to work with groups of records, with each group
being handled as the basic unit for the purposes of storage and retrieval.
For example,
the presumption in a records-management system is that once a letter
is old enough, it becomes unlikely to be sought and therefore should
be removed from the file cabinet or cabinets that house the current
working files. File space is always limited, and little-used materials
should not occupy space, in file cabinets or computer memories, that
needs to be reserved for materials that are in current use. Once removed
from the current working files, the "old" letters are simply stored
in a way that will make it possible to retrieve one or more of them
in the unlikely, and infrequent, event that they become needed; such
letters (and similarly excised materials) are typically housed in
boxes in warehouses from which they can be retrieved on a few days'
notice. Of course, it must be possible to identify the box in which
a desired letter is stored, but that can usually be done with no more
detailed indexing than is provided by such statements as "Box 123
contains the 1999 correspondence of the Acquisitions Department."
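The box-level approach can be sketched in a few lines: letters older than a cutoff leave the working files and are grouped into boxes by department and year, the box label is the only index, and finding a particular letter means searching through its box item by item. The two-year cutoff, the data layout, and the functions box_records, find_box, and find_letter are hypothetical.

```python
from collections import defaultdict

def box_records(letters, current_year, keep_years=2):
    """Split letters into current working files and numbered storage boxes.

    letters: list of (department, "YYYY-MM-DD" date, description) tuples.
    Letters fewer than keep_years old stay in the working files; older ones
    are grouped by (department, year), and each group becomes one box.
    """
    current, groups = [], defaultdict(list)
    for dept, date, desc in letters:
        year = int(date[:4])
        if current_year - year < keep_years:
            current.append((dept, date, desc))
        else:
            groups[(dept, year)].append((date, desc))
    directory = {}
    for number, (dept, year) in enumerate(sorted(groups), start=1):
        label = f"Correspondence, {dept} Department, {year}"
        directory[number] = (label, groups[(dept, year)])
    return current, directory

def find_box(directory, dept, year):
    """The box label is the only index: match it to find the box number."""
    wanted = f"Correspondence, {dept} Department, {year}"
    for number, (label, _) in directory.items():
        if label == wanted:
            return number
    return None

def find_letter(directory, number, word):
    """Shuffle through the box's contents one item at a time."""
    for date, desc in directory[number][1]:
        if word.lower() in desc.lower():
            return (date, desc)
    return None
```

Little effort goes into filing (a label per box), and the cost is paid instead at retrieval time by the sequential search in find_letter, which mirrors the trade-off described in the next paragraph.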
The consequence
is that if a particular letter is desired, someone may have to spend
a couple of hours shuffling through files and individual pieces of
paper from the pertinent box to find the letter. On the other hand,
in contrast with the practice in a typical library, there will have
been very little intellectual effort (and, hence, little time and
expense) exerted to get the letter into the box, quite unlike the
extensive and costly effort involved in adding a book to a library's
collection and getting the book into its appropriate place on the
library's shelves. An old rule of thumb is that it costs about as much
to process a book into a library's collection as it costs to buy the
book. And it should be noted also that any large library will contain
a substantial percentage of volumes that have never been used, despite
the extensive and costly efforts that led to their being in the library.
An interesting
aspect of records-management systems is that they often make use of
color-coded labels on file folders, or computer-tape containers, and
the like, to make it easy to keep the folders and containers in proper
order.
Electronic Records
An aspect of the computer revolution that tends to be badly under-appreciated
by the general public, and even by many librarians and information
scientists, is that many corporate and governmental records that were
once kept on paper now exist only in electronic form. The replacement
of paper records by electronic records brings many advantages, such
as ease of access by those with the pertinent software and hardware
tools. Unfortunately, this replacement also brings a host of new problems,
especially those stemming from the ease with which digital records
can be changed and/or destroyed, deliberately or inadvertently, as
well as from the inherent impermanence of digital
records, i.e., from the fact that magnetic recordings decay with time
and that the materials from which CDs and DVDs are made are probably
much less stable than (non-acidic) paper. A recent report from the
State Archives Department of the Minnesota Historical Society, Electronic
Records Management Guidelines, discusses these problems in a concise
fashion and provides links to additional sources of information about
them.
Specialized Kinds of Access
Besides the kinds
of access tools that we tend to think of in the context of libraries
and other information agencies, access to information can be provided
in other interesting specialized ways. A familiar example of access
is that provided by telephone directories. This access is not limited
to the ordinary white-pages and yellow-pages directories, with which
everyone is familiar, but also includes the "criss-cross" directories
that are available to police and other emergency agencies. Criss-cross
directories are arranged by street address, i.e., alphabetically by
street name and within a given street name, numerically by the individual
houses or other locations.
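Because a criss-cross directory is the same set of listings in a different order, it can be sketched as nothing more than a different sort key. The listings below are invented for illustration.

```python
# Hypothetical listings: (name, house number, street name).
LISTINGS = [
    ("Smith, A.", 120, "Oak St"),
    ("Jones, B.", 15, "Elm St"),
    ("Garcia, C.", 102, "Oak St"),
    ("Lee, D.", 7, "Elm St"),
]

def white_pages(listings):
    """Ordinary directory order: alphabetical by name."""
    return sorted(listings, key=lambda rec: rec[0])

def criss_cross(listings):
    """Alphabetical by street name, then numerical by house number."""
    return sorted(listings, key=lambda rec: (rec[2], rec[1]))
```

Keeping house numbers as integers matters here: sorted as strings, "15" would come before "7".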
Another kind
of access to information is that provided by dictionaries. Everyone
is familiar with the basic alphabetically arranged dictionary, but
there are other kinds of word-access tools also. One such word-access
tool is exemplified by Roget's
Thesaurus, which in both its printed and online forms provides
access to sets of near synonyms of a given word and, in many cases,
to sets of antonyms to a given word. Perusing the sets of near synonyms
or antonyms enables one to be reminded of the word that will convey
the precise meaning that one desires at the moment; or at least such
perusal enables one to find words that she or he can look up in a
dictionary in order to ascertain the various shades of meaning conveyed.
A less familiar
example of a similar kind of aid to word access is the pictorial dictionary
popularized by the Duden publishing firm in Germany and, hence, often
known as a "Duden." These pictorial dictionaries contain pictures
of everyday things with each item labeled. For example, suppose you
could not think of the name of a particular hand tool; using a pictorial
dictionary, you could look through a page or two of pictures of hand
tools and could expect to find the tool you are seeking, together
with its name.
Version Control and Access
Version control
is an aspect of access that is important in group work. When two or
more people work together to write a report, it becomes important
to keep track of the contributions and the changes (additions, modifications,
and deletions) made to the report by each person, in order to minimize
repetition and to avoid overlooking topics (e.g., "because someone
else was supposed to take care of that"). The kind of software known
as "groupware," which facilitates the preparation of documents and
the carrying out of projects by groups of people, is designed to make
it easy to keep track of different versions of a document, i.e., to
exercise "version control." Probably the best known example of groupware
is Lotus Notes, but in the past few years Microsoft Word has included
a modest capability for version control.
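The bookkeeping that groupware performs can be sketched with Python's standard difflib module: keep every version along with its author, and compare consecutive versions to report additions, modifications, and deletions. This illustrates the idea of version control only; it is not how Lotus Notes or Microsoft Word actually store or compare versions, and the class VersionedDocument is hypothetical.

```python
import difflib

class VersionedDocument:
    """Keep every version of a document together with who produced it."""

    def __init__(self, text, author):
        self.versions = [(author, text.splitlines())]

    def commit(self, text, author):
        self.versions.append((author, text.splitlines()))

    def changes(self, n):
        """List the additions, modifications, and deletions made in version n."""
        _, old = self.versions[n - 1]
        author, new = self.versions[n]
        summary = []
        matcher = difflib.SequenceMatcher(a=old, b=new)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "insert":
                summary.append((author, "added", new[j1:j2]))
            elif op == "delete":
                summary.append((author, "deleted", old[i1:i2]))
            elif op == "replace":
                summary.append((author, "modified", new[j1:j2]))
        return summary
```

Comparing line by line keeps the report readable for prose; a finer-grained tool could compare words or characters instead.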
Another kind
of version control relates to an increasingly severe problem in the
long-term storage of many materials. This is the problem of changes
in the version of the hardware or the computer software that produced
the materials. For example, it can be difficult these days to find
a player for an 8-track tape, though such tapes and their players
were quite common 30 years ago. Similarly, it is becoming difficult
to find turntables for playing 33-rpm records, and even harder to
find turntables and needles, etc., for playing 78-rpm and 45-rpm records.
The history of changes in physical formats suggests that a couple
of decades from now it may be difficult for you to play your 1980s
and 1990s CDs because some newer physical format for storage will
have superseded both CDs and DVDs.
A more serious (certainly,
a more expensive) example of the hardware-version problem is the
following: During the first decade of space exploration, lengthy records
were kept of the telemetry signals received from the satellites and
interplanetary vehicles launched by the U.S. and the Soviet Union.
The U.S. signals were recorded on computer tapes that were written
in 7-track tape format (i.e., 6 physical levels or tracks of data
plus 1 track of error-check coding), which sufficed for the primarily
6-bit bytes that computer systems used till the early 1970s. Tens
of thousands of such tape reels were used to store the signals from
space. By the early 1980s, the computer industry had converted itself
almost totally to the use of 8-bit bytes, which require 9-track tape
units, and 7-track tape drives started disappearing. My understanding
is that the last three working 7-track tape drives in the world are
at NASA in Houston, and that though efforts are being
made to copy the contents of old 7-track tapes to new 9-track tapes
and to other modern storage, the three working 7-track tape drives
cannot process the old tapes anywhere near fast enough to keep up
with the rate at which the old 7-track tapes are physically deteriorating
from natural aging. The result is that most of the original raw data
from 1960s space exploration has already been lost forever.
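The track of "error-check coding" can be illustrated with a single parity bit per frame. The sketch below assumes odd parity (the convention actually used varied with the drive and recording mode): the parity bit is chosen so that each frame contains an odd number of 1s, and any single flipped bit then shows up as an even count. A 6-bit character thus occupies 7 tracks and an 8-bit byte 9 tracks. The function names are hypothetical.

```python
def frame_with_parity(value, data_bits, odd=True):
    """Return one tape frame as a list of bits: the data bits (most significant
    first) followed by a parity bit, so 6 data bits use 7 tracks and 8 use 9."""
    if not 0 <= value < 2 ** data_bits:
        raise ValueError(f"{value} does not fit in {data_bits} bits")
    bits = [(value >> i) & 1 for i in reversed(range(data_bits))]
    # Odd parity makes the total count of 1s odd; even parity makes it even.
    parity = (sum(bits) + (1 if odd else 0)) % 2
    return bits + [parity]

def frame_is_valid(frame, odd=True):
    """Detects any single-bit error (but not two errors in the same frame)."""
    return sum(frame) % 2 == (1 if odd else 0)
```

Parity only detects damage; it cannot repair it, which is one reason physically deteriorating tapes are so hard to rescue.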
We have just
discussed an example of how changes in the versions of hardware can
affect access to information. Similar problems arise on the software
side. Documents produced by early versions of Microsoft Word cannot
be read by the current version of MS Word, even though this software
is the direct descendant of the MS Word of the mid-1980s. One cannot
help wondering whether Microsoft has dropped compatibility with the
early versions of Word documents in order to increase sales of newer
versions of its software, but whether that conjecture is correct or
not, it is a fact that many people have lost the ability to use the
machine-readable files of their mid-1980s Word documents. This fact
poses clear dangers for records managers and archivists.
Varieties of Control of the Level of Access
Many systems
that provide access to information permit only selected persons to
exercise access. For example, public libraries typically permit only
registered users to borrow materials, and often limit eligibility
for registration to only residents of the jurisdiction (e.g., city or
county) that provides the financial support of the library. Academic
libraries typically restrict borrowing to students and staff members
of the institution, and they may, further, provide for different categories
of borrowers to have different levels of access to materials and periods
of borrowing them. Both public and academic libraries often disallow
access (e.g., suspend borrowing privileges) in the case of users who
have failed to return borrowed items and/or pay fines.
Many institutions
permit access to their facilities, computer systems, information stores,
etc., to only a selected set of persons. Financial institutions permit
their customers to have access by check, by telephone, and/or via
the Web only to each customer's own account. The Department of Defense
and certain other U.S. government agencies grant security clearances
to some of their employees, usually only to those who successfully
pass an investigation that includes checks of police records and similar
stores of personal information. The U.S. system of security clearances
provides 3 basic levels of access: Confidential, Secret, and Top Secret,
plus possible further restrictions concerning certain subject areas
and/or sources of information. Many companies have similar policies
of restricting access to just selected sets of their employees. For
example, access may be restricted to only those employees working
in a certain division or on certain projects, or to only employees
holding certain types of jobs (e.g., typically, only human-resources
and accounting specialists have broad access to the salaries of other
employees). Universities may restrict access to certain areas of their
computer systems to only those students enrolled in a certain class;
e.g., postings to the LIS 386.13 Discussion Board are restricted to
students in the class via passwords. In practice, most such security-
and privacy-type restrictions are enforced through passwords.
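The combination of hierarchical levels and subject-area restrictions described above can be sketched as two checks: the person's level must be at least the document's, and the person must hold every subject-area restriction attached to the document. The numeric ranking, the compartment name "PROJECT-X", and the function may_access are hypothetical; actual clearance rules are considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Optional

# Ordering of the three basic levels; a person with no clearance ranks 0.
LEVELS = {"Confidential": 1, "Secret": 2, "Top Secret": 3}

@dataclass
class Person:
    clearance: Optional[str] = None
    compartments: set = field(default_factory=set)

@dataclass
class Document:
    classification: str
    compartments: set = field(default_factory=set)

def may_access(person, doc):
    """Level must dominate, and every restriction on the document must be held."""
    level = LEVELS.get(person.clearance, 0)
    return (level >= LEVELS[doc.classification]
            and doc.compartments <= person.compartments)
```

A person cleared for Secret with the hypothetical "PROJECT-X" restriction could read a Confidential document or a Secret "PROJECT-X" document, but not a Top Secret one or a Secret document restricted to some other subject area.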
Still another
kind of control of access is that exercised, for example, by parents
and libraries who use software that prevents
children from accessing Websites that the parents or libraries consider
offensive. One type of such software permits the child to seek only
sites that are on an established list of acceptable sites; another
type checks for the presence of certain words in the names of hyperlinks
and files available through a Website and prevents the child from
accessing any site where the software finds those words. (You are
probably already aware that considerable controversy has been, and
continues to be, associated with the use of such software.)
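The two kinds of filtering software described above can be sketched as follows; the host names and word lists are placeholders, and commercial products are considerably more elaborate. This is an illustration of the two approaches, not of any particular product.

```python
from urllib.parse import urlparse

def allowlist_filter(url, allowed_hosts):
    """Permit only sites on an established list of acceptable sites."""
    return urlparse(url).hostname in allowed_hosts

def keyword_filter(link_texts, blocked_words):
    """Permit a site only if none of its hyperlink or file names contains a listed word."""
    lowered = [t.lower() for t in link_texts]
    return not any(word in text for word in blocked_words for text in lowered)
```

The keyword approach shows one source of the controversy: keyword_filter(["Essex county history"], {"sex"}) returns False, because "sex" occurs inside "Essex", so crude matching of this kind blocks harmless pages.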
Searching for Images Relevant to a Subject
So far we have
been talking about searching textual InBEs. A whole other world of
difficulties occurs when a user wants to search for images relevant
to a particular interest or subject. In the text world, we have available
sources of word-based retrieval tags such as