U.S. Government Information Locator Service (GILS), the Dublin Core Metadata Element Set, the Warwick Framework, and the Resource Description Framework

Organizing and Providing Access to Information -- LIS 391D.2 -- Spring 1998

Introduction

Inventors, mathematicians, computer scientists, documentalists, information scientists, and librarians (in fact, everyone) seek the most effective and efficient path to information. This paper discusses several metadata schemes and the possibilities for applying them to locating and retrieving information. The schemes discussed include the U.S. Government Information Locator Service (GILS), the Dublin Core Metadata Element Set, the Warwick Framework, and the Resource Description Framework. In examining these schemes and services, we must recognize that people are the key to a successful implementation. Metadata schemes are promising, search techniques are improving, and computing power is available to more and more people, but the essential element for success is independent of the machine. Success rests with people and their appropriate application of machine-aided techniques to meet user requirements.

Historical Perspective

The first instance of indexing using words from titles of documents is attributed to a British librarian, Andreas Crestadoro, who advocated the permutation of the words in titles in 1856 so that the subject matter index would follow the author’s own definition of the contents of the book. His paper on this topic is titled The Art of Making Catalogues of Libraries; A Method to Obtain in a Short Time a Most Perfect, Complete, and Satisfactory Catalogue of the British Museum Library, By a Reader Therein (Stevens, 1970, p. 4).

In the 1950s, Hans Peter Luhn revolutionized information retrieval by applying the machine to Crestadoro’s method of indexing. Luhn called his invention the Keyword-in-Context (KWIC) index. He advocated permuted title techniques and demonstrated the practical application of KWIC methods by preparing an index to the papers presented at the International Conference on Scientific Information (ICSI) in November 1958 in Washington, D.C. Luhn used two IBM machines, the 9900 Index Analyzer and the Universal Card Scanner, to prepare the indexes. These methods inspired others to experiment with the KWIC technique, and by 1964 over forty KWIC and other permuted keyword indexes were in operation to aid researchers (Stevens, 1970, p. 8). The American Chemical Society adopted the KWIC index to produce its large-scale publication, "Chemical Titles," issued twenty-four times a year with 3,000 articles per issue. At the ICSI conference, Luhn and his fellow IBM researchers also demonstrated the automatic creation and printout of abstracts by electronic data processing equipment, with no human intervention other than handling the input and output records. By 1963, Luhn, then President of the American Documentation Institute (ADI), engineered the production of a volume of technical papers for the ADI conference using automatic typesetting techniques. He delivered the product within three weeks of receiving the manuscripts. The volume contained a computer-produced table of contents for the 600 short papers, a KWIC index, an author index, a citation index to the bibliographic references in the papers, a KWIC index to the titles of these cited references, a bibliography of the cited papers, and an author index to the citations.
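The permuted-title idea behind Crestadoro’s method and Luhn’s KWIC index can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of Luhn’s IBM procedure: the stop-word list, the fixed-width layout, and the function name `kwic_index` are assumptions. Each significant word in a title becomes a filing point, the words before it are right-aligned in a left column, and the entries are sorted by keyword so that every keyword lines up in one column when printed.

```python
# Hypothetical stop-word list; Luhn's actual exclusion lists differed.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_index(titles, width=30):
    """Return (keyword, line) pairs for a Keyword-in-Context index.

    Every non-stop word in a title becomes a filing point. The words
    before it are right-aligned in a column `width` characters wide, so
    each keyword starts at the same position on every line.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            keyword = word.lower().strip(".,:;")
            if keyword in STOP_WORDS:
                continue  # stop words are never used as filing points
            left = " ".join(words[:i])[-width:]  # keep only the last `width` chars
            right = " ".join(words[i:])
            entries.append((keyword, f"{left:>{width}}  {right}"))
    # Alphabetize by keyword; the stable sort keeps ties in input order.
    entries.sort(key=lambda entry: entry[0])
    return entries


if __name__ == "__main__":
    sample_titles = [
        "Automatic Indexing",
        "The Art of Making Catalogues of Libraries",
    ]
    for _, line in kwic_index(sample_titles):
        print(line)
```

Sorting on the keyword alone, and relying on the stability of Python’s sort, keeps entries for the same word in the order their titles were supplied.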

The idea of producing abstracts and indexes without human intervention seemed an ideal way to reduce labor costs and add speed, accuracy, and consistency to indexing and abstracting. These were "breakthrough" ideas at the time, with real and practical applications. People were eager for access to the great amounts of scientific data and research results being produced. Scientists shared their work primarily by trading papers and research reports, attending professional conferences, and reading the professional literature. Machine-readable text was not readily available, and it was expensive to produce. Even as Luhn began to explore ways of reducing manual labor by using a computer to perform indexing, many people considered the idea questionable and even controversial. In 1965, Mary Elizabeth Stevens raised important questions in her report "Automatic Indexing: A State-of-the-Art Report" (Stevens, 1970).

Stevens thought that learning more about the indexing process itself, through experimentation with machines, would provide results of general interest, not just of interest to those optimistic about machine indexing experiments (Stevens, 1970 p. 10).

Since the 1950s, many evaluators and researchers have improved retrieval techniques, leading to word-based indexing of text, indexing based on parts of speech, tagging and phrase identification, and indexing by domain-dependent features such as names, dates, or locations. Researchers have introduced and refined query processes that integrate natural language, Boolean search techniques, and proximity searching. Research into users’ needs and human factors has evolved alongside this work. System designers have promoted interfaces that provide relevance ranking, best-passage highlighting, and ways to distinguish an original document from its subsequent versions. Incremental techniques for information retrieval developed over the past forty years are still in use today. Over the same period, the availability of computers has improved dramatically.
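Boolean and proximity searching, two of the query techniques mentioned above, both rest on an inverted index that records where each term occurs. The sketch below assumes whitespace tokenization, lowercase matching, and hypothetical function names (`build_index`, `boolean_and`, `near`); it is meant only to show the idea, not any particular retrieval system.

```python
from collections import defaultdict


def build_index(docs):
    """Build a positional inverted index: term -> {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def boolean_and(index, *terms):
    """Return the ids of documents that contain every one of `terms`."""
    if not terms:
        return set()
    # .get() does not trigger the defaultdict factory, so unknown terms
    # simply contribute an empty posting set.
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


def near(index, term_a, term_b, k):
    """Return ids of documents where term_a and term_b occur within k words."""
    hits = set()
    for doc_id in boolean_and(index, term_a, term_b):
        positions_a = index[term_a][doc_id]
        positions_b = index[term_b][doc_id]
        if any(abs(a - b) <= k for a in positions_a for b in positions_b):
            hits.add(doc_id)
    return hits


if __name__ == "__main__":
    docs = {
        1: "automatic indexing of scientific papers",
        2: "indexing methods for automatic abstracting",
        3: "catalogues of libraries",
    }
    idx = build_index(docs)
    print(boolean_and(idx, "automatic", "indexing"))  # documents 1 and 2
    print(near(idx, "automatic", "indexing", 1))      # only document 1
```

Storing word positions, not just document ids, is what makes the proximity test possible: a plain Boolean AND only needs the ids, while `near` compares positions within each matching document.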

Today we are creating a new revolution by using what is termed "metadata" to index and locate documents. Metadata is defined as data about data, comparable to the information found in the library card catalog to describe a book or other forms of recorded information: the information on the card is metadata, data about data. Identifying metadata, tagging it using the markup languages available today, and applying it as an authoring tool in the routine production of documents, combined with advances in the standardization of networking protocols, search engine technology, and user interface design represent another revolutionary step in improved information retrieval.

U.S. Government Information Locator Service (GILS)

GILS is a decentralized collection of agency-based information locators that uses network technology and international standards to direct users to relevant, publicly accessible information resources within the Federal government. The purpose of GILS, a locator service, is to identify public information resources throughout the Federal government, describe the information available in those resources, provide assistance in obtaining the information, and serve as a tool to improve agency electronic records management practices. First conceived in the Office of Management and Budget (OMB), GILS is described in OMB Bulletin 95-01, released December 7, 1994. The Bulletin directs agencies to describe three categories of information resources: agency information products, agency information systems, and Privacy Act systems. The architecture of GILS includes three components: a set of agency-based, network-accessible information servers; a standard search and retrieval protocol; and a collection of locator records that use the GILS Core metadata elements to describe information resources. The National Institute of Standards and Technology has issued further guidance in Federal Information Processing Standard (FIPS) 192-1, which provides technical specifications and implementation guidelines for agencies to follow. Another publication, the Application Profile for GILS, is now in Version 2 and is maintained by the Open Systems Environment Implementors Workshop/Special Interest Group on GILS. This profile provides definitions, guidance on display formats for presentation to users, the GILS Core Elements list, and a mapping to USMARC (Open Systems Environment Implementors Workshop/Special Interest Group on GILS, 1997).

At the time policymakers were developing GILS, the Web was in its early stages of implementation. GILS is in its own early stages of adoption and implementation and depends largely on responsible, talented, and creative people throughout the Federal government to make it an effective information dissemination tool. The quality of the records depends on the people who decide what to include in each record. Each government agency is responsible for compiling and maintaining its own GILS records, and since no single registry of agency GILS implementations exists, it is difficult to determine which agencies now implement GILS. Recent counts show that nearly fifty Federal agencies make information accessible to the public using GILS. Representative GILS records locate information created by the Department of Commerce, Department of the Treasury, Government Printing Office, Social Security Administration, Office of Personnel Management, and Federal Trade Commission, among others, and as of March 1997 Federal agencies had created some 5,000 locator records. These records point to electronic sources of information such as Web sites, as well as to people, textual resources, artifacts, events, meetings, and individual government reports. Some agencies have elected to place their GILS records on agency servers, while others have contracted the work to either the Government Printing Office (GPO) or FedWorld. Using the GPO service called GPO Access, a user can search across one or more agency GILS databases. The user is presented with a list of agency GILS databases, selects which ones to search, and submits the query, which is then broadcast to the selected GILS databases and servers. Cross-agency searching is accomplished with pointers from the GPO server to other agencies’ servers, at least where those agencies have supplied pointer records to GPO. GPO also offers searches on a compilation of Privacy Act Notices provided by the National Archives and Records Administration (NARA). This database meets the Bulletin’s requirement to create GILS records for agency Privacy Act systems.
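The broadcast search that GPO Access performs across selected agency GILS databases can be sketched as a fan-out followed by a merge. In this illustration the agency "servers" and their records are hypothetical in-memory lists and matching is a simple title substring test; a real GILS search would go over Z39.50 to each agency’s server, which is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for agency GILS servers. Each "server" is an
# in-memory list of locator records; a real one would be queried over Z39.50.
AGENCY_SERVERS = {
    "Commerce": [{"title": "Export Statistics Database"},
                 {"title": "Census Publications Catalog"}],
    "Treasury": [{"title": "Savings Bond Statistics"}],
    "SSA": [{"title": "Benefit Statistics Reports"}],
}


def search_server(agency, query):
    """Search one agency's records; return (agency, title) for each title match."""
    q = query.lower()
    return [(agency, rec["title"])
            for rec in AGENCY_SERVERS[agency]
            if q in rec["title"].lower()]


def broadcast_search(query, selected_agencies):
    """Send one query to every selected server in parallel and merge the hits.

    Results are grouped in the order the agencies were selected, because
    ThreadPoolExecutor.map yields results in input order.
    """
    with ThreadPoolExecutor() as pool:
        per_agency = list(pool.map(lambda a: search_server(a, query),
                                   selected_agencies))
    return [hit for hits in per_agency for hit in hits]


if __name__ == "__main__":
    for agency, title in broadcast_search("statistics", ["Commerce", "Treasury"]):
        print(f"{agency}: {title}")
```

Only the agencies the user selects are queried, which mirrors the GPO Access flow of choosing databases before the query is broadcast.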

GILS uses a standardized search and retrieval protocol, ANSI/NISO Z39.50, as its mechanism for interoperable search and retrieval, and this choice has proven to be a strength of GILS. Some agencies have developed locator records at the item level (e.g., a record for an individual report), while others have developed records at a broader level (e.g., a record for an entire Web site). This unevenness in coverage is a weak point of GILS. The National Archives and Records Administration produced guidelines for creating GILS records to promote the inclusion of high-quality information in GILS (National Archives and Records Administration, 1995). Compliance with these guidelines is enforceable only for Cabinet-level Federal agencies, but the guidelines are useful to all agencies in developing a systematic appraisal process for determining what level of coverage GILS should provide.

Metadata, in the context of GILS, is a set of standardized elements that can be used to describe government agency information resources, serve as surrogates for those resources, and support networked information discovery and retrieval. When users search for information they are presented with a display record that contains metadata elements. Searchers then use the information presented in the GILS record display to access or acquire the information resource. A representative GILS record contains GILS Core metadata elements such as those listed below from Version 2 of the Application Profile:

Title | Control Identifier | Abstract
Purpose | Originator | Access Constraints
Use Constraints | Availability | Point of Contact
Sources of Data | Date Last Modified | Contributor
Language of Resource | Place of Publication | Date of Publication
Abstract | Controlled Subject Index | Subject Terms Uncontrolled
Spatial Domain | Time Period | Methodology
Use Constraints | Agency Program | Supplemental Information
Schedule Number | Cross References | Original Control Identifier
Record Source | Language of Record | Record Review Date

William E. Moen and Charles R. McClure (Moen & McClure, 1997) conducted a detailed evaluation study of the early implementation of GILS at the request of the GILS Board. At the Board’s first and only meeting, in December 1995, John Carlin, the Archivist of the United States, proposed the evaluation study, and the Board approved it. The study began in September 1996 and ended in March 1997. It evaluated how GILS serves various user groups, improves public access to government information, and works as a tool for information management, and it assessed how agencies were progressing with their implementations. The resulting report covers five aspects of GILS: users, policy, technology, content, and standards. The report finds that the goal of producing a government-wide information locator service has not yet been achieved and that many of the implementation difficulties result from a lack of focus on, and commitment to, public access to information within government agencies. The authors found that users are confused and disappointed by the GILS implementation because exploiting it requires a high degree of user sophistication, and that users are interested in, and expect to find, full-text information when using GILS. GILS records are sometimes difficult to read and interpret when displayed and often are not linked to the sources they identify, which adds to users’ disappointment. The findings also indicate that GILS does not support electronic records management activities, an important original objective. Because GILS is decentralized, the services available to users are qualitatively and quantitatively uneven, which further frustrates and disappoints users.
They recommend that GILS be refocused and that consensus-building efforts be used to shape the purposes and goals of a government-wide GILS; that authority, accountability, and responsibility for the government-wide locator initiative be clearly established; and that measurable objectives for GILS be set. They also recommend ongoing, continuous evaluation, particularly of user satisfaction and compatibility with new technical developments.

One specific area of GILS evaluation is the assessment of the information content in the metadata. The researchers used content analysis methods to systematically examine GILS records in order to assess the quality of the records and the metadata. They assessed a randomly selected set of records on four factors: accuracy, completeness, serviceability, and coverage profile. Moen and McClure measured accuracy by counting spelling and typographical errors and format and formatting errors. They measured completeness by counting the number of elements used per record and the extent to which the core metadata elements were used. They measured serviceability by assessing use of capitalization, indentation, definition of acronyms, element display order, file formats, and search options. They measured coverage profile by categorizing the objects represented, objects aggregated, and record types presented. The results of this portion of the study indicate an uneven understanding and appreciation among GILS implementors of the value of metadata in supporting a distributed information locator service. The records are not consistent in metadata utilization or content quality, and no uniform, coherent government-wide presentation and application of content in metadata exists.
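The two completeness measures described above, elements used per record and the extent to which each core element is used across records, reduce to simple counts. The following sketch is not Moen and McClure’s instrument; it assumes records are dictionaries keyed by element name, uses an arbitrary subset of the GILS Core elements, and treats a blank value as unused.

```python
# An arbitrary subset of GILS Core element names from the list above.
CORE_ELEMENTS = ["Title", "Control Identifier", "Abstract",
                 "Originator", "Availability", "Point of Contact"]


def _used(record, element):
    """An element counts as used only if it is present and not blank."""
    return bool(record.get(element, "").strip())


def elements_per_record(records):
    """Number of core elements actually filled in, for each record."""
    return [sum(_used(rec, e) for e in CORE_ELEMENTS) for rec in records]


def element_utilization(records):
    """Fraction of records (0.0 to 1.0) that use each core element."""
    if not records:
        return {e: 0.0 for e in CORE_ELEMENTS}
    n = len(records)
    return {e: sum(_used(rec, e) for rec in records) / n for e in CORE_ELEMENTS}


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        {"Title": "Export Statistics", "Control Identifier": "DOC-001",
         "Abstract": "Monthly trade data", "Point of Contact": "Trade Desk"},
        {"Title": "Census Publications", "Abstract": "   "},
    ]
    print(elements_per_record(sample))
    print(element_utilization(sample))
```

Counting a whitespace-only value as unused is a design choice here; an element that is present but empty adds nothing a searcher can use.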

Moen and McClure noted that research in networked information discovery and retrieval is only beginning to explore the role of metadata quality, but quality criteria such as accuracy, consistency, completeness, and currency are known to impact user satisfaction in research using library catalogs’ bibliographic records. They also point out that principles of bibliographic control may apply to metadata; however, metadata is different because it represents networked resources that are volatile, distributed, heterogeneous, and exist without the benefit of a single authoritative group to establish and dictate standards and procedures. Standards for metadata in the networked environment are lacking, and the likelihood of gaining consensus on a standard among the many communities using metadata is slim. This may mean that more attention to user expectations and acceptance is needed in forming criteria for the measurement of quality in metadata in the networked environment.

Although consistency, rules for implementation, and adherence to guidelines are needed at this early point in the GILS implementation, the potential for distributed access to government information remains viable, and GILS should not be considered a failure. GILS represents an ambitious effort to address access to networked information resources and the ways in which their characteristics are exposed to searching. GILS is a combination of human, organizational, and technological facilities intended to help people locate information. It is a locator service that identifies other information resources such as people, catalogs, databases, and individual documents, and it is positioned to fit into the global information infrastructure as an aid to the free flow of information by taking advantage of existing networks, software, and interoperability standards. Because of this positioning, some people envision that GILS will evolve to provide direct access to the actual information resources described by some records.

Several policy and technical design principles laid out by Eliot J. Christian, a principal designer of the GILS concept, highlight the strengths of GILS and its future in the global community. In a December 1996 article for "D-Lib Magazine," Christian makes the following suggestions:

GILS is an important example for researchers and implementors to examine. Others can learn from GILS what works and what does not work. In the same way that the KWIC index was quickly adopted and then adapted, the GILS concept is being adopted and adapted in many other government agencies and in other countries. Examples of projects that GILS has influenced include: Locator records are being crafted at all levels of government and the issues of quality, coverage, standardization, and usability confronted at this stage of GILS development will provide an example that will help these agencies identify mistakes to avoid as they work to make more information more accessible to the public.

Dublin Core Metadata Elements

The Dublin Core is intended to be a simple, easy-to-use set of metadata elements for describing and locating information resources, especially on the Internet. The problem the Dublin Core designers address is "harvesting" relevant data from the vast array of networked electronic resources that exist around the world in textual and non-textual form. Search engines return a great deal of irrelevant information because they have no way to distinguish between significant and incidental words in document texts. People can use metadata to reduce this problem by identifying the major concepts of an information resource, improving precision in the search process. The designers wanted a simple core list of elements identifying characteristics of a resource, such as author, subject, title, publisher, type, and date, that information providers and producers could use to describe their networked resources, making their work more visible to search engines and retrieval systems and improving its chance of discovery. They also wanted a way to identify information that might otherwise be missed because sites provide images, databases, PDF documents, and other objects that search engines do not readily identify. The Dublin Core has the potential to be used by automated indexing tools to enhance resource discovery.

Librarians, computer scientists, computer networking specialists, museum information specialists, content experts, digital library researchers, and text-markup specialists first met in March of 1995 to address the problem of resource discovery for networked resources using a consensus building process. From the beginning, the participants in the workshops were devoted to the idea that the metadata coding must be simple so that all creators would use it.

After five workshops, a set of fifteen metadata elements designed to promote interoperability among heterogeneous metadata systems now exists.

The metadata elements fall into three groups that indicate the scope of the information they contain: (1) Content, (2) Intellectual Property, and (3) Instantiation.

Content: Title, Subject, Description, Source, Language, Relation, Coverage
Intellectual Property: Creator, Publisher, Contributor, Rights
Instantiation: Date, Type, Format, Identifier

The element set supports the description of both networked non-textual and textual resources. In addition, the fourth workshop in Canberra, Australia, produced a group of formalized qualifiers called the "Canberra Qualifiers" to address the issue of making elements "richer" and more useful to a particular community of users. The qualifiers address the language of the descriptor field, scheme description, and type or sub-element name. At this time, groups are working on refining the qualifiers. According to Dublin Core workshop participants Stuart Weibel and Juha Hakala (Weibel & Hakala, 1998), standardization of the unqualified set of Dublin Core elements is underway using the process of submitting working draft documents for comment to the Internet Engineering Task Force (IETF) on the following topics:

Dublin Core on the Web: RDF Compliance and DC Extensions

The Dublin Core is not a replacement for existing detailed metadata structures such as USMARC or the Federal Geographic Data Committee (FGDC) metadata standards for digital geospatial metadata. Instead, it is a method of describing the essence of many types of digital and non-digital resources in order to provide a commonly understood universal set of metadata that can improve retrieval effectiveness. The Dublin Core is not intended to be a full and rich resource description tool such as AACR2/MARC; however, it does recognize this and other metadata schemes as important classification schemes and makes room for their use in the Dublin Core metadata. The designers also wanted the semantics to be commonly understood across disciplines. The simple, commonly understood Dublin Core descriptors provide increased prospects for cross-disciplinary searching and identification of resources. As use evolves, users will need accurate data mapping mechanisms, or "crosswalks," between domains with specific metadata standards to help translate index terms, retrieve and display heterogeneous element types, translate data to help users understand search results, and provide context to retrieved information. Some "crosswalks" are being maintained at central and authoritative locations. For example, the Dublin Core documentation is available from and maintained by OCLC and a USMARC "crosswalk" to the Dublin Core is maintained by the Library of Congress.
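A crosswalk is, at its simplest, a lookup table from one scheme’s fields to another’s. The sketch below uses a small, illustrative subset of USMARC-to-Dublin Core mappings keyed on tag and subfield; it is not the Library of Congress crosswalk, the function name `crosswalk` is hypothetical, and the sample record is made up. Fields with no mapping are returned separately so that what the translation drops remains visible.

```python
# Illustrative subset of USMARC-to-Dublin Core mappings, keyed on
# (tag, subfield code). This is NOT the Library of Congress crosswalk.
MARC_TO_DC = {
    ("245", "a"): "Title",
    ("100", "a"): "Creator",
    ("700", "a"): "Contributor",
    ("260", "b"): "Publisher",
    ("260", "c"): "Date",
    ("650", "a"): "Subject",
    ("520", "a"): "Description",
    ("856", "u"): "Identifier",
}


def crosswalk(marc_fields):
    """Translate (tag, subfield, value) triples into Dublin Core.

    Returns a dict of DC element -> list of values, plus a list of the
    fields that had no mapping, so losses in translation stay visible.
    """
    dc, unmapped = {}, []
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            unmapped.append((tag, code, value))
        else:
            dc.setdefault(element, []).append(value)
    return dc, unmapped


if __name__ == "__main__":
    # A made-up record, flattened to (tag, subfield, value) triples.
    record = [
        ("100", "a", "Stevens, Mary Elizabeth"),
        ("245", "a", "Automatic indexing"),
        ("650", "a", "Indexing"),
        ("650", "a", "Information retrieval"),
        ("500", "a", "Sample general note"),
    ]
    dc, unmapped = crosswalk(record)
    print(dc)
    print("unmapped:", unmapped)
```

Because a many-to-one mapping loses the distinctions the richer scheme makes, reporting unmapped fields is one simple way to keep the "essence" description from silently discarding detail.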

Dublin Core participants are committed to an international scope because the resources available for discovery span the world. Dublin Core documentation is available in at least nine languages, and a working group is addressing multilingual metadata issues. Dublin Core workshops have been held in Finland, the United Kingdom, the United States, and Australia. Participants are promoting the use of the Dublin Core in their work in other countries around the world, such as Sweden, Denmark, Thailand, Japan, Canada, France, and Germany. The Australian Government Locator Service working group has examined the Dublin Core Metadata Element Set for adoption in its developing locator service and has recently announced the Dublin Core as the recommended resource description content standard for electronic government documents.

Flexibility is another aspect the Dublin Core designers deemed important. Many unique and rich resource description schemes exist, and the Dublin Core embraces them by defining an architecture that can accommodate multiple schemes with additional structure and more elaborate semantics. This architecture is called the Warwick Framework. The Warwick Framework also addresses the need for extensibility, so that schemes not yet invented can fit into the architecture. The informal international working group also wanted rapid progress on issues that could be resolved with a minimum of argument, and they wanted to devise a scheme that the standards community could readily adopt. One technique for rapid development is experimental application in the "real world." At the second Dublin Core workshop, in Warwick, England, participants developed a workable syntax for Web-based applications. HTML-embedded metadata is not the sole target for Dublin Core use, but workshop participants thought it the quickest way to promote early deployment and encourage pilot projects across a variety of disciplines. A simple convention, HTML-META, developed in May 1996, facilitates the use of metadata in real applications. Projects using the Dublin Core now exist in at least ten countries and across many disciplines. The Dublin Core is facilitating interlibrary loan and document delivery services among the Nordic countries in the Nordic Metadata Project. One of the principal ideas in this project is to enhance end-user services by making a diversity of digital documents more easily searchable and deliverable. Another application just underway is the Monticello Electronic Library Project, which plans to link distributed regional resources regardless of source or type of information. Participants in this project will use the Dublin Core Element Set to provide semantic interoperability among several databases of electronic media and record types, including SGML, MARC, and GILS collections.
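Embedding Dublin Core in HTML amounts to writing one `meta` element per element value. The sketch below assumes a `DC.` prefix on the `name` attribute and an optional `scheme` attribute standing in for a Canberra-style scheme qualifier; the exact spelling and capitalization used by the May 1996 HTML-META convention may differ, so treat them, and the helper name `dc_meta_tags`, as assumptions.

```python
from html import escape


def dc_meta_tags(record):
    """Render Dublin Core metadata as HTML <meta> elements, one per value.

    `record` maps an element name (e.g. "title") to a list of values.
    A value is either a plain string or a (content, scheme) pair, where
    the scheme names the encoding scheme the value follows.
    """
    tags = []
    for element, values in record.items():
        for value in values:
            content, scheme = value if isinstance(value, tuple) else (value, None)
            attrs = f'name="DC.{escape(element)}"'
            if scheme:
                attrs += f' scheme="{escape(scheme)}"'
            # escape() also quotes '"' so values cannot break the attribute.
            attrs += f' content="{escape(content)}"'
            tags.append(f"<meta {attrs}>")
    return "\n".join(tags)


if __name__ == "__main__":
    print(dc_meta_tags({
        "title": ["Organizing & Providing Access to Information"],
        "subject": ["Metadata", "Information retrieval"],
        "date": [("1998", "ISO8601")],
    }))
```

Repeating an element (two `subject` values here) produces two `meta` tags rather than one combined string, which keeps each value separately indexable.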

The current state of Dublin Core development is reported in the conference report on DC-5, held in Helsinki, Finland, in October 1997 (Weibel & Hakala, 1998). The chief result of the Dublin Core work to date is the completion of the semantics for the unqualified Dublin Core elements. This product, called the "Finnish Finish," forms the basis for beginning the formal standardization of the Dublin Core. Best practices uncovered in the implementation projects will also influence the standardization process. Participants at the Helsinki workshop embraced the Resource Description Framework (RDF) syntax as a promising development that will support a rich architecture for metadata of all types. RDF is a World Wide Web Consortium initiative that addresses the diversity of semantics and structure needed by various user communities. RDF will stimulate the development of an underlying data model for the Dublin Core and address problems such as the coherent expression of sub-elements and registration processes for schemes and sub-elements. This direction focuses experimentation on the Web; however, many people are interested in non-HTML-based uses of the Dublin Core elements, such as SGML texts in electronic text centers, citation databases, and OPAC records. These types of information may be available on the Internet without being part of a static Web page, and they may already have a fully developed metadata system in place. Providing Internet access to non-HTML-based information resources requires continued effort to explore interoperability between metadata schemes in non-HTML environments.
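In the RDF data model, each Dublin Core statement becomes a triple: the described resource, an element identified by a URI, and a value. The sketch below assumes the `http://purl.org/dc/elements/1.1/` namespace (the one later standardized, not necessarily what was in use in 1998) and an example resource URI, and prints the triples in the line-oriented N-Triples style for readability rather than in the RDF/XML syntax discussed at Helsinki. The function names are hypothetical.

```python
# Namespace URI for Dublin Core elements. This is the namespace that was
# later standardized; treat it as an assumption for this 1998-era sketch.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_triples(resource_uri, record):
    """Turn {element: [values]} into (subject, predicate, object) triples."""
    return [(resource_uri, DC_NS + element, value)
            for element, values in record.items()
            for value in values]


def _literal(text):
    """Quote a literal, escaping backslashes and double quotes.

    Other control characters (such as newlines) are not handled here.
    """
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(triples):
    """Serialize triples one per line: <subject> <predicate> "literal" ."""
    return "\n".join(f"<{s}> <{p}> {_literal(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    triples = dc_to_triples("http://example.org/report", {
        "title": ["Automatic Indexing"],
        "creator": ["Stevens, Mary Elizabeth"],
    })
    print(to_ntriples(triples))
```

Writing every statement as an explicit triple is what gives the element set an underlying data model: each value is tied to a named resource and a globally identified property instead of to a position in a record.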

Harold Thiele of the University of Pittsburgh conducted a literature review of the Dublin Core and believes that a crucial period is approaching (Thiele, 1998). Most of the literature since 1995 is descriptive. He expects that more empirical evaluations of the projects now underway will begin to be published, and he suggests that this research will fall into three areas: behavioral, technical, and sociological. User studies of how effective the Dublin Core actually is for searchers, compared with other metadata schemes, fall into the behavioral category. On the technical side, he suggests that researchers examine whether the Dublin Core reduces the bandwidth problems associated with indexing the Internet and how different search engines retrieve resources using the Dublin Core. On the sociological side, he suggests studying the adoption of the Dublin Core: Will creators use the Dublin Core in non-academic environments, and will its use become an authenticating device for Internet resources?

The Dublin Core Metadata Element Set has reached a milestone with the consensus-based adoption of the unqualified core element set by seventy attendees from sixteen countries, representing many different resource description communities, at the Helsinki Metadata Workshop, and with the draft Request for Comments papers, but the process is just beginning. As more producers and creators adopt the Dublin Core, more useful information will be uncovered about using metadata in networked resource discovery and retrieval, and work will progress toward automatic indexing of this metadata. GILS has shown the need for user input and evaluation early in the development process. Hopefully, researchers will involve users in evaluations of the Dublin Core early on, so that useful adjustments can be made quickly and be viewed as effective by users.

Warwick Framework and the Resource Description Framework (RDF)

The Warwick Framework is a container architecture for networked resource description developed in 1996 at the Dublin Core Metadata Workshop II in Warwick, United Kingdom. The purpose of this workshop was to build on the Dublin Core elements and provide a more concrete and operationally usable formulation of the Dublin Core to promote greater interoperability among content providers, content catalogers and indexers, and automated resource discovery and description systems. The Warwick Framework addresses the need for a universal common scheme for description of the information and the fact that very rich and powerful metadata schemes have already been developed and are in wide use. The developers of the Warwick Framework aimed to find a way to use all metadata schemes and vocabularies to aid in information retrieval. The concept and definition work provided by the Warwick Framework designers and the Dublin Core participants is being incorporated into the development of the Resource Description Framework, an initiative of the primary standards forum for the Web, the World Wide Web Consortium (W3C). The Dublin Core provides the semantic focus for RDF, and in turn RDF clarifies the importance of a coherent underlying data model for the Dublin Core metadata.

The Warwick Framework container architecture is a mechanism for logically aggregating distinct packages of metadata through specific data structures. Doing this allows the designers of metadata sets to focus on their specific requirements; allows the syntax of metadata sets to vary; distributes management of metadata sets to communities of expertise; promotes interoperability and extensibility by allowing tools and agents to selectively access and manipulate some packages of metadata while ignoring others; permits different sets of metadata related to the same object to be controlled separately; and accommodates future metadata sets without requiring changes to existing sets or to the programs that use them.

The Warwick Framework has two parts: (1) the container and (2) the packages, which are the metadata sets (Lagoze, Lynch, & Daniel, 1996). The container may be either transient or persistent. In transient form, it is a transport object passed among repositories, clients, and agents. In persistent form, it is a first-class object in the information infrastructure, identified by a globally accessible identifier such as a Uniform Resource Identifier (URI). A container can also be wrapped within another object, in which case the wrapper, rather than the metadata container, carries the URI. Packages within a container are unordered; no package takes precedence over another. Each package is an opaque bit stream, so a client can skip packages that are of an unknown type or are encrypted. A package may be a metadata set containing the actual metadata, an indirect package that refers to another object in the information infrastructure, or another container. Because the client or agent determines the type of each package only at the time of access, the Warwick Framework points to the need for a registry of package types.
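The container-and-package model can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the Warwick Framework’s own specification: the class names, the `type_name` keys, the toy `element=value` payload encoding, and the dictionary standing in for a type registry are all assumptions. It shows the three package kinds, a nested container, and a client skipping a package whose type it cannot interpret.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass
class MetadataSet:
    """A package holding actual metadata as an opaque bit stream."""
    type_name: str   # hypothetical registry key, e.g. "dublin-core"
    payload: bytes   # only a handler for type_name can interpret this


@dataclass
class Indirect:
    """A package that refers to metadata held elsewhere."""
    uri: str


@dataclass
class Container:
    """A container of packages; uri is set only when it is persistent."""
    packages: List["Package"] = field(default_factory=list)
    uri: Optional[str] = None


Package = Union[MetadataSet, Indirect, Container]

# Hypothetical type registry: maps a package type to a decoding function.
Registry = Dict[str, Callable[[bytes], dict]]


def parse_dc(payload: bytes) -> dict:
    """Toy decoder assuming one "element=value" pair per line."""
    fields = {}
    for line in payload.decode("utf-8").splitlines():
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def read_container(container: Container, registry: Registry) -> List[dict]:
    """Collect what a client can understand. Packages are visited in
    storage order, but no package takes precedence over another."""
    results: List[dict] = []
    for pkg in container.packages:
        if isinstance(pkg, Container):
            results.extend(read_container(pkg, registry))  # nested container
        elif isinstance(pkg, Indirect):
            results.append({"see": pkg.uri})  # reference, not resolved here
        elif pkg.type_name in registry:
            results.append(registry[pkg.type_name](pkg.payload))
        # Otherwise the type is unknown (or the bits are encrypted): skip it.
    return results


registry: Registry = {"dublin-core": parse_dc}
outer = Container(
    packages=[
        MetadataSet("rights-management", b"\x8f\x02\x11"),  # no handler: skipped
        Indirect("http://example.org/marc-record"),
        Container(packages=[MetadataSet("dublin-core", b"title=The Warwick Framework")]),
    ],
    uri="http://example.org/containers/1",
)
print(read_container(outer, registry))
# [{'see': 'http://example.org/marc-record'}, {'title': 'The Warwick Framework'}]
```

Adding support for a new metadata set in this sketch means registering one more handler; neither the container nor the existing handlers change, which mirrors the extensibility argument in the paragraph above.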

The Warwick Framework, which set out to promote semantic interoperability across disciplines and languages and to define extensibility mechanisms that support richer descriptions and links to other description models, has heavily influenced the ongoing work of the World Wide Web Consortium (W3C). The consortium is the leading developer of common protocols for interoperability on the World Wide Web, and its mission is to lead the evolution of the Web. The W3C is designed to be agile and technically sound, drawing on the expertise and authority of the hundreds of developers, researchers, and users in its membership. Because it is hosted by research organizations, the consortium is able to leverage the most recent advances in information technology. The W3C began in 1994 at the Massachusetts Institute of Technology Laboratory for Computer Science, established a presence in Europe in 1995 in partnership with France’s National Institute for Research in Computer Science and Control (INRIA), and added an Asian host, Keio University in Japan, in 1996. Over 220 commercial and academic members worldwide contribute to the consortium, addressing Web-related issues that range from social change to application services. The consortium provides a number of public services, including a repository of information about the World Wide Web, sample code implementations that embody and promote standards, and prototypes and sample applications that demonstrate new technology.

Metadata is a major activity area for the W3C. The consortium is narrowing the common definition of metadata, "data about data," to suit its own Web interests: the W3C defines metadata as "machine understandable information about information on the Web" (Kotok, 1998). Alongside the metadata working papers, visitors to the W3C Web site find working papers on content labeling with the Platform for Internet Content Selection (PICS) and PICS-NG (PICS Next Generation), DSig (Digital Signatures), Extensible Markup Language (XML) data specifications, the Dublin Core, and RDF. The consortium views metadata as the architectural underpinning of trust in society, and therefore groups its work on digital signatures, privacy protection, intellectual property rights, and the Resource Description Framework together.

RDF is a framework for metadata intended to provide interoperability between applications that exchange machine-understandable information on the Web. RDF emphasizes automated processing of Web resources and resource discovery techniques that can enhance search engine capabilities, provide better descriptions of content, and describe intellectual property rights, content ratings, and digital signatures. RDF is intended to provide the basis for generic tools for authoring, manipulating, and searching machine-understandable data on the Web, and thereby to transform the Web into a machine-processable repository of information. It is a collaborative effort that draws on the XML-based design contributed substantially by Microsoft, the Dublin Core and Warwick Framework work, and the Netscape model. RDF is expected to provide a common platform on which metadata in many vocabularies can be expressed; the Channel Definition Format (CDF) is one such vocabulary, or application, that is suited to RDF.

This work is in progress, and little of it is publicly available at this time. Rachel Heery, a noted researcher at the United Kingdom Office for Library and Information Networking (UKOLN), describes the "members only" confidentiality rules of the W3C and then makes some educated predictions about RDF (Heery, 1998). First, RDF is designed narrowly, with only the Web in mind; second, it is a syntax based on a data model, and that model influences how properties are described because it makes the structure of descriptions explicit. Heery concludes that RDF will be useful for describing Web resources but may not be useful for dealing with legacy schemes such as the MARC format. RDF does not contain any predefined vocabularies for authoring metadata, but the Dublin Core is viewed as a promising candidate because of the attention it has already received and because Dublin Core representatives are actively involved in RDF development. She also explains that the RDF model is syntax independent but can be expressed in XML, a subset of SGML. HTML, by contrast, is an application of SGML with a fixed set of tags: XML is an extensible syntax, while HTML is not.

RDF and XML are intended to be complementary. XML allows locally defined tags to be created and makes it possible to build products that exchange structured data between applications. XML provides a syntax for creating metadata, and RDF provides the model that lets applications recognize and exchange metadata about Web objects. RDF data consists of nodes with attached attribute/value pairs. A node can be any Web resource, anything else that can be identified by a URI, or another instance of metadata. Attributes are named properties of the nodes, and their values are either atomic (strings, numbers) or other resources or metadata instances.
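The node-and-property model can be illustrated with a small Python sketch. It is a toy built on stated assumptions, not W3C code and not any RDF serialization: a `Resource` is a node named by a URI, each property name is itself a URI, and each value is either an atomic string or number or another `Resource`. The Dublin Core namespace URI is the one the Dublin Core initiative published after this paper was written, used only as an example; the example.org URIs and the `name` property are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple, Union

# Example namespace only (published by the Dublin Core initiative after 1998).
DC = "http://purl.org/dc/elements/1.1/"


@dataclass
class Resource:
    """A node: anything that can be named by a URI."""
    uri: str
    properties: Dict[str, List["Value"]] = field(default_factory=dict)

    def add(self, prop: str, value: "Value") -> None:
        """Attach an attribute/value pair; a property may repeat."""
        self.properties.setdefault(prop, []).append(value)


# A value is atomic (string or number) or another node.
Value = Union[str, int, float, Resource]
Statement = Tuple[str, str, Union[str, int, float]]


def statements(node: Resource, seen: Optional[Set[str]] = None) -> List[Statement]:
    """Flatten a node and the nodes it points to into
    (subject, property, value) statements, visiting each node once."""
    if seen is None:
        seen = set()
    if node.uri in seen:
        return []
    seen.add(node.uri)
    result: List[Statement] = []
    for prop, values in node.properties.items():
        for value in values:
            if isinstance(value, Resource):
                result.append((node.uri, prop, value.uri))  # link to another node, by URI
                result.extend(statements(value, seen))
            else:
                result.append((node.uri, prop, value))  # atomic value
    return result


page = Resource("http://example.org/page")
person = Resource("http://example.org/people/jdoe")
person.add("http://example.org/terms/name", "Jane Doe")  # hypothetical property
page.add(DC + "title", "An example Web page")
page.add(DC + "creator", person)

for s in statements(page):
    print(s)
```

Because a creator is itself a node rather than a plain string, other metadata (here, a name) can be attached to it; this is the kind of explicit structure Heery expects the RDF data model to impose on descriptions.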

Ralph Swick, who works on the Resource Description Framework, contends that once the Web is populated with rich metadata, searching it will become easier because search engines will have more focused information available. He predicts that automated software agents will roam the Web, finding information and conducting transactions on an individual’s behalf, and that the vast, unstructured mass of data we experience today will become more manageable and more useful because we will be able to target useful information more precisely (Swick, 1998).

Conclusion

Hans Peter Luhn contributed greatly to the field of information retrieval with breakthrough applications such as the KWIC index, which used machines to take words directly from documents and present them in readable formats, arranged so that occurrences of the same keyword were grouped together. He also developed statistical methods for auto-abstracting, selecting frequently used words and phrases to portray the content of a document. His work, and the work of many others over the past forty years, has contributed to our quest to retrieve relevant information quickly, accurately, and consistently to meet users’ needs and requests.
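Luhn’s permuted-title idea can be sketched in Python: every significant word in a title becomes an entry, with its left context right-aligned so the keywords line up in a single column and sort together. This is a modern illustration of the KWIC technique, not a reconstruction of Luhn’s IBM procedures; the stop-word list and the 30-character column width are arbitrary assumptions.

```python
from typing import Iterable, List, Set, Tuple

# Illustrative stop list (an assumption); real indexes used fuller lists.
STOP_WORDS: Set[str] = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}


def kwic(titles: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[Tuple[str, str]]:
    """Return (left context, keyword + right context) pairs, one per
    significant word, sorted so entries for the same keyword group together."""
    entries: List[Tuple[str, str]] = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stop_words:
                continue  # stop words never become index entries
            entries.append((" ".join(words[:i]), " ".join(words[i:])))
    entries.sort(key=lambda entry: entry[1].lower())  # sort on keyword, then following words
    return entries


def format_kwic(entries: List[Tuple[str, str]], width: int = 30) -> List[str]:
    """Right-align the left context so every keyword starts in the same column."""
    return [f"{left[-width:]:>{width}}  {right}" for left, right in entries]


titles = ["The Art of Making Catalogues", "Automatic Indexing"]
for line in format_kwic(kwic(titles)):
    print(line)
```

Each title yields one entry per significant word, so a reader scanning the aligned keyword column finds "Catalogues" and "Making" as well as "Art", which is the permutation Crestadoro proposed by hand and Luhn mechanized.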

In today’s networked environment, GILS is an early implementation of a locator service that attempts to make government information accessible to people. GILS uses a structured list of metadata elements to present information that helps users evaluate the relevance of a resource and leads them to it. As an approach to making massive amounts of information available, GILS illustrates both good and bad implementation ideas. It demonstrates that people need clear guidance, direction, and standards to help them select items appropriate for a locator record, and that the quality of a locator service depends heavily on which metadata elements people choose to use and on the care taken in preparing and arranging the information. GILS also demonstrates that users of a locator service in a networked environment expect access to item-level information.

The Dublin Core is the product of an international consensus-building process and is now ready to begin the standardization process with the Internet Engineering Task Force and to contribute actively to the W3C, the standards forum for Web development. The process has also spawned a number of experimental projects that are helping to develop best practices in networked resource discovery and to document how this metadata scheme works across a variety of disciplines. The fifteen named elements of the Dublin Core Metadata Element Set, and the discussions that accompanied their development, have generated interest in the broader Web community, most notably at the World Wide Web Consortium; however, the elements have not yet been tested for acceptance and ease of use in the broader environment, where they would be widely used by authors and publishers.

Work on the Warwick Framework at the second Dublin Core Metadata Workshop has influenced the W3C’s ongoing development of the Resource Description Framework, which will address metadata vocabularies and the diverse needs of the vendor community. This W3C work has the potential to generate broad interest across the W3C membership and to result in creative applications that can be widely deployed.

The problem of selecting only a few tags to represent the content of a document or locator service is an indexing problem. GILS defines a metadata element set of over 150 elements, while the Dublin Core has held to its original goal of simplicity with 15 elements. It is difficult to represent content with descriptions limited to a few tags, and it is impossible for the searcher to know which words, images, or terms the indexer chose. Minimizing this mismatch between the indexer or tagger and the searcher is an information retrieval problem that has been with us for a long time.
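A toy Python example makes the indexer/searcher mismatch concrete. The element names `Title` and `Subject` are Dublin Core elements, but the records, terms, and synonym table are invented for illustration, and synonym expansion is shown only as one common strategy for narrowing the gap.

```python
from typing import Dict, List, Set

# Invented records using two Dublin Core element names.
records: List[Dict[str, List[str]]] = [
    {"Title": ["Fuel economy of passenger cars"], "Subject": ["cars", "fuel"]},
    {"Title": ["Highway safety statistics"], "Subject": ["automobiles", "accidents"]},
]

# Hypothetical synonym table that a search service might maintain.
SYNONYMS: Dict[str, Set[str]] = {
    "cars": {"automobiles"},
    "automobiles": {"cars"},
}


def search(records: List[Dict[str, List[str]]], element: str, term: str,
           expand: bool = False) -> List[str]:
    """Return titles of records whose `element` values match `term` exactly
    (ignoring case), optionally also matching listed synonyms."""
    wanted = {term.lower()}
    if expand:
        wanted |= SYNONYMS.get(term.lower(), set())
    return [
        record["Title"][0]
        for record in records
        if wanted & {value.lower() for value in record.get(element, [])}
    ]


print(search(records, "Subject", "automobiles"))               # exact match only
print(search(records, "Subject", "automobiles", expand=True))  # with synonyms
```

The exact-match search misses the record its indexer tagged "cars"; expansion recovers it, but only for synonyms someone anticipated, so the problem of human judgment remains.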

Stevens concluded in her report on automatic indexing that indexing by machine is possible but does not necessarily capture the specific content of a document. Is what it does good enough? It is good, but it can be better with further development, particularly in areas that involve user evaluation in the development process. The appropriate selection of words, and perhaps images, to portray content is at the heart of the matter, and human indexing faces the same problem. Machine retrieval adds speed to the process, which is of great practical use today, just as it was when Luhn first used machines for automatic indexing and abstracting. The core problem remains one of human decision making: finding and conveying meaning in communication, and judging relevance to particular human needs.

Usability and other user-centered concerns must take the lead in the development process. Early experience with metadata schemes makes clear that people and their intellectual processes are key to building useful systems, both at the start of a project and throughout its implementation. The more we learn through experimentation, the deeper our understanding grows of the human role in applying these schemes, and of the potential for refined automatic search and retrieval of locator and descriptive information that can expand access to stores of information around the world. As Tefko Saracevic has stated,

"The success or failure of any interactive system and technology is contingent on the extent to which user issues, the human factors, are addressed right from the beginning to the very end, right from theory, conceptualization, and design process on to development, evaluation, and to provision of services" (Saracevic, 1997).

References

Christian, E. J. (1996, December). GILS: What is it? Where is it going? D-Lib Magazine. [On-line]. Available: http://www.dlib.org/dlib/december96/12christian.html.

Heery, R. (1998, March). What is...RDF? Ariadne, (14). [On-line]. Available: http://www.ariadne.ac.uk/issue14/what-is/.

Kotok, A. (1998). The Technology & Society Domain. [On-line]. Available: http://www.w3.org/TandS/Overview.html.

Lagoze, C., Lynch, C. A., & Daniel, R. Jr. (1996). The Warwick framework: A container architecture for aggregating sets of metadata. [On-line]. Available: http://cs-tr.cs.cornell.edu/Dienst/Repository/2.0/Body/ncstrl.cornell%2fTR96-1593/html.

Moen, W. E. & McClure, C. R. (1997). An evaluation of the Federal government’s implementation of the Government Information Locator Service (GILS): Final report.

Version 2 of application profile for the Government Information Locator Service (GILS). (1997, November 24). [On-line]. Available: http://www.usgs.gov/gils/prof_v2.html.

National Archives and Records Administration. (1995). The Government Information Locator Service: Guidelines for the preparation of GILS core entries. Washington, DC: National Archives and Records Administration. [On-line]. Available: http://www.dtic.mil/gils/documents/naradoc/.

National Institute of Standards and Technology. (1994). Federal information processing standards publication 192, application profile for the Government Information Locator Service (GILS). Federal Register, 59 (December 7): 63075-63077. [On-line]. Available: http://www.dtic.dla.mil/gils/documents/naradoc/fip192.html.

Saracevic, T. (1997). Users lost: Reflections on the past, future, and limits of information science. SIGIR Forum, 31(2), 16-27.

Stevens, M. E. (1970). Automatic indexing: A state-of-the-art report. National Bureau of Standards Monograph 91 reissued with additions and corrections (SD Catalog No. C13.44:91). Washington, DC: U.S. Government Printing Office.

Swick, R. (1998). Resource description framework (RDF). [On-line]. Available: http://www.w3.org/RDF/

Thiele, H. (1998, January). The Dublin Core and Warwick Framework: A review of the literature. D-Lib Magazine. [On-line]. Available: http://www.dlib.org/dlib/january98/01thiele.html.

U. S. Office of Management and Budget. (1994, December 7). Office of Management and Budget bulletin 95-01: Establishment of the Government Information Locator Service. Washington, DC: Office of Management and Budget. [On-line]. Available: http://www.fedworld.gov/gils/omb95-01.htm.

Weibel, S. & Hakala, J. (1998, February). DC-5: The Helsinki metadata workshop. D-Lib Magazine. [On-line]. Available: http://www.dlib.org/dlib/february98/02weibel.html.


This page is created and maintained by Sue Soy ssoy@ischool.utexas.edu
Last Updated 11/11/98
© Copyright 1996 Susan K. Soy
Please feel free to copy and distribute freely for academic purposes with this notice and attribution.
All other rights reserved.