This lesson discusses "metadata." As a word, "metadata" is a combination. One component is "data," the plural of "datum." The Merriam-Webster Collegiate Dictionary Online (MWCDO) provides three meanings for "data":
1 : factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation. . . .
2 : information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful
3 : information in numerical form that can be digitally transmitted or processed
The other component is "meta," which is used in the sense that the MWCDO describes as follows:
3 [metaphysics] : more comprehensive : transcending <metapsychology> used with the name of a discipline to designate a new but related discipline designed to deal critically with the original one <metamathematics>
Thus "metadata" means "data that deal with other data," or "data that deal with original data," or casually but briefly, "data about data."
Within the library- and information-science (LIS) community, the most frequent use of "metadata" is to refer to data produced in the course of cataloging materials in libraries and other information agencies. Cataloging data are, by their very nature, data about other things, such as books and other information-bearing entities (InBEs).
A less frequent but still important use of "metadata" in LIS is to refer to those parts of the structure of a relational database that describe the contents of the various tables (files) and columns (fields) that make up the database. For example, a database designer might describe a certain column in a certain table in a database as: "This column specifies the employee's Social Security Number; it contains 9 bytes; the bytes must be numeric; and any row in the table that lacks data in this column is not a valid row in the table." These statements are metadata concerning the data that are stored in the database by using that table and that column.
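The designer's statements about the Social Security Number column can be sketched in code. The following is a minimal illustration in Python, not a description of how any particular database system stores its metadata; the class `ColumnMetadata`, its field names, and the function `is_valid` are hypothetical names introduced here. It assumes that values are stored as text, so that 9 numeric bytes correspond to 9 ASCII digits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnMetadata:
    """Metadata describing one column (hypothetical field names)."""
    name: str
    description: str
    length: int          # exact number of characters a value must have
    numeric_only: bool   # True if every character must be a digit
    required: bool       # True if a row lacking this value is invalid


# The designer's statements about the column, recorded as metadata.
SSN_COLUMN = ColumnMetadata(
    name="ssn",
    description="The employee's Social Security Number",
    length=9,
    numeric_only=True,
    required=True,
)


def is_valid(value: Optional[str], column: ColumnMetadata) -> bool:
    """Check one stored value against the metadata for its column."""
    if value is None or value == "":
        # A missing value is acceptable only if the column is optional.
        return not column.required
    if len(value) != column.length:
        return False
    if column.numeric_only and not (value.isascii() and value.isdigit()):
        return False
    return True
```

Note that `SSN_COLUMN` holds no employee data at all. It only describes what valid data in that column must look like, which is what makes it metadata.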
Other uses of metadata in our field clearly exist, for, in a general sense, any statement that one makes about the nature of an item or items in a collection of InBEs can be viewed as a metadata statement. For example, the Website of UT-Austin's Nettie Lee Benson Collection begins with the following description of the collection:
The Nettie Lee Benson Latin American Collection, a unit of the General Libraries at the University of Texas at Austin, is a specialized research library focusing on materials from and about Latin America, and on materials relating to Spanish-speaking peoples in the United States. Latin America is here defined to include Mexico, Central America, the Caribbean island nations, South America, and areas of the United States during the period they were a part of the Spanish Empire or Mexico. Named in honor of its former director (1942-1975), the Nettie Lee Benson Collection contains over 800,000 books, periodicals, and pamphlets, 2,500 linear feet of manuscripts, 19,000 maps, 21,000 microforms, 11,500 broadsides, 93,500 photographs, and 38,000 items in a variety of other media (sound recordings, drawings, video tapes and cassettes, slides, transparencies, posters, memorabilia, and electronic media).
The foregoing description can be construed as a metadata statement.
In this lesson, we concentrate on the use of "metadata" to refer to cataloging data.
From the beginnings of their history, libraries (and other information agencies, such as archives) have provided various kinds of descriptions of, i.e., metadata about, the materials included in their collections. A description might be as brief as: "This room stores scrolls dealing with the plans of Pharaoh Rameses for his pyramid" or "Box containing our treaties with Sparta." Or a description might be as lengthy as a printed list of the works owned by a library, with each work described in terms of its author(s), title, and various other data about the work that the compilers of the list considered important.
In modern times, a widespread way of providing descriptions of materials in libraries has been the printed catalog card and its computerized successors, especially the MARC record. During the 20th century, a great deal of attention was given by librarians to the question of just what kinds of metadata should be employed in standard practice, i.e., to the question of how to set standards for the kinds of data that should be recorded, in the form of a catalog card or its equivalent, about the various materials that libraries and other information agencies collect. In Anglophone countries, an important embodiment of standardized metadata practices has been the various versions, beginning in 1908, of the Anglo-American Cataloging Rules (AACR), developed cooperatively by the American Library Association, the Canadian Library Association, and the Library Association (which is the association of libraries and librarians in the United Kingdom; the association's founders felt that any educated person would be able to supply the missing adjective, "British").
The principal elements used in providing metadata descriptions of typical library materials, such as books and other InBEs, include (see Endnote 1):
Recent years have seen a widespread development of online public-access catalogs (OPACs) and, in particular, an explosion of information resources available via the World-Wide Web. These developments sparked an effort to define a minimally sufficient set (a "core") of cataloging data, i.e., metadata, that would be useful as a standard for OPACs and, especially, for catalogs, guides, search engines, etc., aimed at providing access to "Document-Like Objects" (DLOs) available via the Web. This effort has become known as the Dublin Core Initiative because it began with a workshop held in March 1995 in Dublin, Ohio. The workshop was sponsored by OCLC, Inc., and the National Center for Supercomputing Applications (NCSA).
In December 1996, the Dublin Core Initiative defined a set of 15 metadata elements to be used as the minimally sufficient set, or core. This set has become known as the "Dublin Core." Here are the elements of the Dublin Core, as condensed from Dublin Core Metadata Element Set, Version 1.1: Reference Description:
| ELEMENT | DEFINITION | COMMENT |
| --- | --- | --- |
| Title | A name given to the resource. | Typically, a Title will be a name by which the resource is formally known. |
| Creator | An entity primarily responsible for making the content of the resource. | Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity. |
| Subject | The topic of the content of the resource. | Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. |
| Description | An account of the content of the resource. | Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content. |
| Publisher | An entity responsible for making the resource available. | Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity. |
| Contributor | An entity responsible for making contributions to the content of the resource. | Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity. |
| Date | A date associated with an event in the life cycle of the resource. | Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format. |
| Type | The nature or genre of the content of the resource. | Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the working draft list of Dublin Core types [DCT1]). To describe the physical or digital manifestation of the resource, use the FORMAT element. |
| Format | The physical or digital manifestation of the resource. | Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats). |
| Identifier | An unambiguous reference to the resource within a given context. | Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN). |
| Source | A reference to a resource from which the present resource is derived. | The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system. |
| Language | A language of the intellectual content of the resource. | Recommended best practice for the values of the Language element is defined by RFC 1766 [RFC1766] which includes a two-letter Language Code (taken from the ISO 639 standard [ISO639]), followed optionally, by a two-letter Country Code (taken from the ISO 3166 standard [ISO3166]). For example, 'en' for English, 'fr' for French, or 'en-uk' for English used in the United Kingdom. |
| Relation | A reference to a related resource. | Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system. |
| Coverage | The extent or scope of the content of the resource. | Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and that, where appropriate, named places or time periods be used in preference to numeric identifiers such as sets of coordinates or date ranges. |
| Rights | Information about rights held in and over the resource. | Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource. |
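To make the element set concrete, here is a minimal sketch in Python of a description built from the 15 elements, together with a simple check of the recommended date form. The names `DUBLIN_CORE_ELEMENTS`, `DATE_SHAPE`, and `check_record` are hypothetical, and the date check is a deliberate simplification: it tests only the shape of the value (YYYY, YYYY-MM, or YYYY-MM-DD), not whether the month and day are real, and it ignores the time-of-day forms that the ISO 8601 profile also permits. The sample record describes the book cited in Endnote 2.

```python
import re

# The 15 element names of the Dublin Core, written in lowercase.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Simplifying assumption: accept only YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def check_record(record):
    """Return a list of problems found in a dict of element -> value."""
    problems = []
    for name in record:
        if name not in DUBLIN_CORE_ELEMENTS:
            problems.append(f"unknown element: {name}")
    date = record.get("date")
    if date is not None and DATE_SHAPE.fullmatch(date) is None:
        problems.append(f"date not in YYYY-MM-DD form: {date}")
    return problems


# A description of the book cited in Endnote 2 of this lesson.
book = {
    "title": "Introduction to Metadata: Pathways to Digital Information",
    "contributor": "Baca, Murtha",
    "publisher": "Getty Information Institute",
    "date": "1998",
    "identifier": "ISBN 0-89236-533-1",
}

problems = check_record(book)  # an empty list means no problems were found
```

No element is mandatory in this sketch, so a record may use any subset of the 15 names.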
It is worth remarking that one way of identifying various metadata elements, such as those of the Dublin Core, in a file would be to use the kind of tagging of portions of text that is provided by XML, eXtensible Markup Language (an introduction to which can be found at A Technical Introduction to XML).
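As a rough illustration of such tagging, the following Python sketch writes a few Dublin Core elements as XML by using the standard library's `xml.etree.ElementTree` module. It is one possible arrangement, not a prescribed encoding: the wrapper element name `metadata` and the function name `to_dublin_core_xml` are assumptions made for this example. The namespace URI is the one associated with Version 1.1 of the element set, and the sample values describe the Bates article recommended later in this lesson.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, Version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the familiar "dc:" prefix


def to_dublin_core_xml(record):
    """Serialize a dict of element name -> value as tagged XML text."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


record = {
    "title": "The Invisible Substrate in Information Science",
    "creator": "Bates, Marcia J.",
    "date": "1999",
    "language": "en",
}

xml_text = to_dublin_core_xml(record)
print(xml_text)
```

Each value ends up enclosed in a tag that names its element, for example `<dc:title>` around the title, so a program reading the file can tell which piece of text plays which role.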
This lesson has provided an introduction to the idea of "metadata" and to how this term is used in LIS, with respect both to library-cataloging practice and to the provision of access to information on the World-Wide Web.
For a deeper look at metadata and its many aspects, I strongly recommend that you read at least one of the following discussions:
Setting the Stage by Dr. Anne J. Gilliland-Swetland, a book chapter that has been made available on the Web through the courtesy of the Getty Information Institute, a part of the J. Paul Getty Trust (see Endnote 2).
An extended development of the idea of metadata and its role as a major tool of information science, well worth reading, is:
Bates, Marcia J. (1999). The Invisible Substrate in Information Science. Journal of the American Society for Information Science, 50(12): 1043-1050. Retrieved October 8, 2002, from http://www.gseis.ucla.edu/faculty/bates/substrate.html
1. You will learn more about library-cataloging principles and practices when you take such iSchool courses as INF 384C, Organizing and Providing Access to Information, INF 384E, Descriptive Cataloging and Metadata, and INF 384F, Subject Cataloging and Indexing.
2. "Setting the Stage" is part of an excellent short book:
Baca, Murtha, ed. Introduction to Metadata: Pathways to Digital Information. Los Angeles, CA: Getty Information Institute; 1998. ISBN 0-89236-533-1.
Last revised 2004 Feb 11