THE UNIVERSITY OF TEXAS AT AUSTIN
SCHOOL OF INFORMATION


LIS 386.13 (known as INF 380K, beginning with the Fall Semester 2003)
INFORMATION TECHNOLOGIES AND THE INFORMATION PROFESSIONS
R. E. Wyllys

MARC Records and Variable-Length Record Structures


Introduction

This lesson discusses the system known as MARC, the principal means in many countries for computer-assisted handling of the bibliographic records of InBEs (information-bearing entities) such as books, serials, and other materials used in libraries and other information agencies. Our main emphasis herein is the structure of the MARC record, i.e., the computer files by which MARC information is actually handled.

The MARC System

MARC stands for MAchine-Readable Cataloging. The MARC system originated at the Library of Congress (LC) in 1965, as a part—probably the single most important part—of the beginnings of the automation of libraries in the U.S. and elsewhere. These beginnings coincided with the arrival of business-oriented computers that could perform various tasks at costs low enough to attract a much wider audience of consumers than in years prior to the mid-1960s.

Before we consider MARC in detail, we pause for a moment to put MARC into the time-frame of one human being's professional career—a career that spanned vast changes in the world of libraries. In "The Library Bulletin" of the UT-Austin General Libraries for February 23, 2001 (Vol. XX, No. 2, pp. 7-8) appeared an item, written by Peggy Mueller, that stated in part:

Fleetwood Giles, Cataloger, Cataloging Department, retired January 31, 2001, after forty-six years of University of Texas employment. Fleetwood received a B.A. (pre-med) from UT Austin in 1950, studied toward a B.S. in secondary education and then completed the M.L.S. in 1958 from UT-Austin. He joined the General Libraries in 1954 and was promoted to a professional librarian position in 1957. . . . As a cataloger Fleetwood's career began with 3"x5" paper cards, electric erasers, and manual typewriters and progressed through work forms to online cataloger workstations, international bibliographic databases and much, much more.

Having noted the extent of change in cataloging over the past four decades, let us return to MARC itself. Building on a pilot project during 1965-1967, the Library of Congress settled in 1968 on a form of computerized recording of cataloging information, the MARC II record. This was the foundation for the current record, called MARC 21, which is essentially the MARC II record with some added features.

The MARC record is a computer-readable and -manipulable record of cataloging information for information-bearing entities (InBEs), such as books, serials, etc. The MARC system moves cataloging information among the institutions that participate in the system, which consists of:

MARC systems exist in several countries, with minor differences to accommodate local needs. What is used in the U.S. is often called USMARC to identify it with this country. Examples of other MARC systems are CAN/MARC and UKMARC, the Canadian and U.K. systems, respectively.

The Nature of the MARC Record

Fixed-Length Computer Records

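A typical computer file consists of a set of records of equal length: i.e., records such that every record contains the same number of fields, and every field of a given type contains the same number of bytes in every record.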

Note that these restrictions allow different types of fields to have different numbers of bytes. For example, suppose that each record contains a Social Security Number (SSN) field and a telephone-number field (plus other fields that we ignore here). Each SSN field will consist of 9 bytes; each telephone-number field, of 10 bytes. The kind of structure we have just described is called a "fixed-length" record.
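To make the idea concrete, here is a minimal Python sketch of such a fixed-length record. The 19-byte layout (a 9-byte SSN followed by a 10-byte telephone number), the function names, and the sample values are assumptions made for illustration only. Because every record has the same length, a program can jump straight to any record by multiplying the record's number by the record length.

```python
import struct

# Hypothetical layout: a 9-byte SSN field followed by a 10-byte telephone field.
# "=" requests standard sizes with no alignment padding, so the size is 9 + 10.
RECORD = struct.Struct("=9s10s")


def pack_record(ssn, phone):
    """Build one fixed-length record from two ASCII strings."""
    return RECORD.pack(ssn.encode("ascii"), phone.encode("ascii"))


def read_record(data, index):
    """Fetch record number `index` (counting from 0) by computing its offset."""
    ssn, phone = RECORD.unpack_from(data, index * RECORD.size)
    return ssn.decode("ascii"), phone.decode("ascii")


# Two made-up records stored back to back, as they would be in a file.
file_bytes = pack_record("000000001", "5125550100") + pack_record("000000002", "5125550199")
second = read_record(file_bytes, 1)
```

No delimiters are needed here: the position of every field follows from arithmetic alone.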

Fixed-length records work well for many applications. For example, consider a file used in a company's accounting department to contain the information needed to prepare employees' paychecks. The records in such a file would store information for the individual employees of the company. Each record would need fields that contain such information as SSN, hourly wage rate, number of income-tax withholding deductions claimed, number of hours worked in the current week, number of hours of overtime worked in the current week, total of wages paid to date in the calendar year, and total withheld to date in the calendar year. Each such field will have a fixed length appropriate to the nature of the information in the field; each record will contain the same number of fields; and, hence, each record will be of the same fixed length as every other record in the file.

However, it should be clear that there are types of information that do not fit neatly in fixed-length fields. For example, the title of a book can vary in length anywhere from one byte to hundreds (or even thousands) of bytes; the surname and the first name(s) of an author can vary ("Ann Lee" is much shorter than "Gustaf-Adolphus von Sachsen und Coburg"), and a book (in LC cataloging practice) can have 1, 2, or 3 authors (i.e., there can be a need for multiple fields for authors' names). As a moment's reflection will show, these examples indicate that there can be serious problems in using fixed-length records to handle certain types of data.

How could one design a title field for a fixed-length record for book data? Suppose we know that some book titles can be as long as, say, 1492 characters. If we decide to provide 1492 bytes in a fixed-length field for titles, then the vast majority of titles, being much shorter than 1492 bytes, will occupy only a small portion of the title field, the rest of which will have to be filled with space characters. For most records, this would be a great waste of computer-storage space and communications time. On the other hand, if we decide to provide fewer than 1492 bytes for the title field, say 100 bytes, then we encounter another problem: viz., although most titles will fit into a 100-byte field, there will still be some wasted space with many titles, and, worse, some titles will have to be truncated to their first 100 characters (including space characters). (Furthermore, even the space-wasting 1492-byte field might turn out to be too short for an extraordinary title.)
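The trade-off can be checked with a little arithmetic. The following Python sketch (the function name and the 20-byte width used in the note after it are illustrative assumptions) stores a title in a fixed-length field and reports how many bytes are mere padding and how many characters are lost to truncation.

```python
def store_fixed(title, width):
    """Return (stored field, bytes of padding, characters lost to truncation)."""
    stored = title[:width].ljust(width)      # cut to width, then pad with spaces
    padding = max(width - len(title), 0)     # space characters added
    lost = max(len(title) - width, 0)        # characters cut off
    return stored, padding, lost


title = "Teach Yourself Access 97 in 14 Days"   # 35 characters
wide = store_fixed(title, 1492)
medium = store_fixed(title, 100)
narrow = store_fixed(title, 20)
```

For this 35-character title, a 1492-byte field carries 1457 bytes of padding and a 100-byte field carries 65, while a 20-byte field would lose the last 15 characters of the title.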

The same problem, and a related one, arise with the author field. First, it is clear that the varying lengths of authors' names present the same problem as that of varying lengths of titles. But there is a second problem, which stems from the fact that there can be 1, 2, or 3 authors of a book. If we include 3 author fields in every fixed-length record for a book, then much of the time, there will be nothing in the 2nd author field and the 3rd author field but space characters.

Variable-Length Computer Records

When the staff of the MARC pilot project began, in 1965, to consider how to handle catalog data in computers, they immediately encountered the problems we have just outlined. Furthermore, at that time, almost all computer files that had ever been designed or used were of the fixed-length-record type. The MARC designers came up with a then-novel solution: the variable-length record.

There are two basic ways of designing a variable-length record for computer use. The first way is to mark, or delimit, the beginnings and endings (or, at a minimum, either the beginning or the ending) of fields and records by special characters that are reserved for that purpose. (Note: Almost all computer files, whether of fixed-length or variable-length type, employ a special character to mark the end of the file. And many fixed-length-record computer files use special end-of-record characters for convenience and as a safety measure against error.) In order for a computer program to use a file of variable-length records with variable-length fields (and, possibly, with varying numbers of occurrences of a given field), the program must, as it reads the file, examine each successive character to determine whether the character is one of the special end-of-field or end-of-record delimiters. Whenever a character is found to be a delimiter, the program knows it has finished reading a field or a record, and the program must take steps to handle the field or record appropriately.

The second way of designing a variable-length record is to include, at the beginning of each record, a special field, of fixed length, in which the lengths of all the variable-length fields in the record are specified, but to use no special end-of-field or end-of-record characters. This special field, usually called the "header", must itself be of fixed length so that the program can quickly establish the nature of the structure of the whole record, including its variable-length parts, by examining the contents of the header. Often the header, since it is of fixed length, will also include certain fields that are known always to be of a fixed length (e.g., 4 digits for a year).

The MARC record uses both these ways of dealing with records of catalog information. Before we look at the MARC record format, however, we shall look at an example of each way of handling variable-length records.

Example of Variable-Length Record Structure Using Delimiters

Suppose we have some information about three companies, including their addresses, and our contacts in the companies. Here are the data as we might write them on pages in an address book.

          IBM Corporation
          11400 Burnet Road
          Building A1
          Austin, Texas 78758
          Contacts: Sam Robertson

          Big-Bang Startup Company
          10 W. Martin Luther King Jr. Boulevard
          Austin, Texas
          Contacts: Stephen Hawking

          ABC Company
          123 Main Street
          Pocahontas, Iowa 50574
          Contacts: Joe Smith, Jane Roe, Mary Fulano, John A. Doe

Next, suppose we decide to store these data in a computer file using a variable-length structure. First, we display the overall structure of each record, then the delimiters we shall use, and, finally, the foregoing data after being placed in the file.

Record Structure

COMPANY_NAME a variable-length field
  ADDRESS a variable-length field that may be repeated as many times as necessary
  CITY a variable-length field
  STATE a variable-length field (state names are used, not their abbreviations)
  ZIP a variable-length field (since it can be either 5 or 9 digits in length)
  CONTACT_NAME a variable-length field that may be repeated as many times as needed
Delimiters
  « beginning of file
  » end of file
  ƒ beginning of field
  ^ end of field
  ‡ beginning of subfield, i.e., beginning of one occurrence of a repeatable field
  † end of subfield, i.e., end of one occurrence of a repeatable field
  ~ beginning of record
  § end of record
Sample File of Data Stored as Variable-Length Records Using Both Beginning and Ending Delimiters

«~ƒIBM Corporation^ƒ‡11400 Burnet Road†‡Building A1†^ƒAustin^ƒTexas^ƒ78758^ƒSam Robertson^§~ƒBig-Bang Startup Company^ƒ10 W. Martin Luther King Jr. Boulevard^ƒAustin^ƒTexas^ƒ^ƒStephen Hawking^§~ƒABC Company^ƒ123 Main Street^ƒPocahontas^ƒIowa^ƒ50574^ƒ‡Joe Smith†‡Jane Roe†‡Mary Fulano†‡John A. Doe†^§»

Note: In the second record, that for Big-Bang Startup Company, there is no ZIP code. Its absence is shown by the use of adjacent beginning-of-field and end-of-field delimiters, "ƒ^".
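The character-by-character reading that such a file requires can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the sample holds just two of the three records, and the repeated contact names are written with a matched ‡ ... † pair around each occurrence. Each field is returned as a list of values, so that a repeatable field and a single-valued field look alike to the caller.

```python
def parse_delimited(text):
    """Scan the text one character at a time, acting on each delimiter."""
    records, record, field, buf = [], [], [], []
    for ch in text:
        if ch == "«":                 # beginning of file: nothing to do
            continue
        elif ch == "»":               # end of file: stop reading
            break
        elif ch == "~":               # beginning of record
            record = []
        elif ch == "§":               # end of record
            records.append(record)
        elif ch == "ƒ":               # beginning of field
            field, buf = [], []
        elif ch == "^":               # end of field
            if buf or not field:      # plain (possibly empty) field value
                field.append("".join(buf))
            record.append(field)
        elif ch == "‡":               # beginning of subfield
            buf = []
        elif ch == "†":               # end of subfield
            field.append("".join(buf))
            buf = []
        else:                         # ordinary data character
            buf.append(ch)
    return records


SAMPLE = (
    "«~ƒBig-Bang Startup Company^ƒ10 W. Martin Luther King Jr. Boulevard^"
    "ƒAustin^ƒTexas^ƒ^ƒStephen Hawking^§"
    "~ƒABC Company^ƒ123 Main Street^ƒPocahontas^ƒIowa^ƒ50574^"
    "ƒ‡Joe Smith†‡Jane Roe†‡Mary Fulano†‡John A. Doe†^§»"
)
records = parse_delimited(SAMPLE)
```

In the result, the missing ZIP code of the first record appears as a field holding one empty string, and the contact field of the second record holds four names.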

Next, we observe that there are actually some unnecessary delimiters in the above example. For instance, the physical beginning of a file will be identified by whatever computer operating system is being used, so that our use of an explicit beginning-of-file delimiter is superfluous, and we may omit it. But, of course, once a program starts looking at the contents of a file, it is important for the program to be able to identify the end of the file, so we will not omit the end-of-file delimiter.

In similar fashion, we can observe that it is really not necessary to mark both the beginning and the ending of each record. The beginning of the very first record in the file must coincide with the beginning of the file itself; and the beginnings of second and later records in the file must occur immediately after an end-of-record mark. Thus, we may omit the beginning-of-record delimiters provided that we retain the end-of-record delimiters.

Again in similar fashion, we can note that it is unnecessary to mark both the beginning and ending of each field. The beginning of the first field in a record must coincide with the beginning of the record itself, and the beginnings of second and later fields in the record must occur immediately after an end-of-field mark. Thus, we may omit the beginning-of-field delimiters provided that we retain the end-of-field marks.

Finally, in somewhat similar fashion, we can note that it is unnecessary to mark both the beginning and ending of each subfield. We could reason, in the fashion we have been using, that the beginning of the first subfield in a field must coincide with the beginning of the field itself, and that the beginnings of second and later subfields in the field must occur immediately after an end-of-subfield mark. However, we could also reason that the ending of the first subfield in a field must occur immediately before the beginning of the second subfield; that the ending of the second subfield in a field must occur immediately before the beginning of the third subfield; and so on for further subfields. This indicates that it would be sufficient to use just beginning-of-subfield delimiters and to omit end-of-subfield delimiters. (In fact, this is what the MARC record format does.)

Here is the example we used above, except that this time, in keeping with the foregoing reasoning, we have omitted the beginning-of-file delimiters, beginning-of-record delimiters, beginning-of-field delimiters, and end-of-subfield delimiters, with the result shown below.

Minimal Set of Delimiters
  » end of file
  ^ end of field
  ‡ beginning of subfield, i.e., beginning of one occurrence of a repeatable field
  § end of record
Sample File of Data Stored as Variable-Length Records Using a Minimal Set of Delimiters
IBM Corporation^‡11400 Burnet Road‡Building A1^Austin^Texas^78758^Sam Robertson^§Big-Bang Startup Company^‡10 W. Martin Luther King Jr. Boulevard^Austin^Texas^^Stephen Hawking^§ABC Company^‡123 Main Street^Pocahontas^Iowa^50574^‡Joe Smith‡Jane Roe‡Mary Fulano‡John A. Doe^§»

The above example uses delimiters in a fashion quite similar to that of the MARC record format.
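With only ending delimiters for files, records, and fields, reading the file reduces to splitting the text at each delimiter in turn. The Python sketch below is one way to do so; the function name is hypothetical, and the sample holds just two of the three records. A field that begins with ‡ is split into its occurrences, and any other field is treated as a single value.

```python
def parse_minimal(text):
    """Split on end-of-file, end-of-record, end-of-field, then subfield marks."""
    body = text.split("»", 1)[0]            # ignore anything after end of file
    records = []
    for raw in body.split("§"):
        if not raw:                          # nothing follows the last "§"
            continue
        fields = raw.split("^")[:-1]         # each field ends with "^", so the
                                             # final piece is always empty
        records.append(
            [f.split("‡")[1:] if f.startswith("‡") else [f] for f in fields]
        )
    return records


SAMPLE = (
    "IBM Corporation^‡11400 Burnet Road‡Building A1^Austin^Texas^78758^"
    "Sam Robertson^§"
    "ABC Company^‡123 Main Street^Pocahontas^Iowa^50574^"
    "‡Joe Smith‡Jane Roe‡Mary Fulano‡John A. Doe^§»"
)
records = parse_minimal(SAMPLE)
```

The sample is much shorter than its fully delimited counterpart, yet no information has been lost: the two address lines and the four contact names are all recovered.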

Example of Variable-Length Record Structure Using Header Blocks

Suppose that we have (partial) cataloging data for two books.

          Rob, Peter; Coronel, Carlos. Database Systems: Design, Implementation, and Management. Course Technology; 1997. ISBN:0-7600-4904-1.

          Cassel, Paul. Teach Yourself Access 97 in 14 Days. Sams; 1996. ISBN:0-672-30969-6.

Next, suppose we decide to store these data in a computer file using a variable-length structure that employs the header-block approach. First, we display the overall structure of each record and then the foregoing data after being placed in the file.

Database Structure
Header Block   By design, known to be 29 characters long
  RECORD_ID The ISBN is used in this example.
  COPYRIGHT_DATE  
  TITLE_LENGTH  
  LENGTH_OF_FIRST_AUTHOR_FIELD By LC design, no more than 3 authors
  LENGTH_OF_SECOND_AUTHOR_FIELD  
  LENGTH_OF_THIRD_AUTHOR_FIELD  
  LENGTH_OF_PUBLISHER_FIELD  
Data Block    
  TITLE  
  FIRST_AUTHOR  
  SECOND_AUTHOR  
  THIRD_AUTHOR  
  PUBLISHER  
Sample File of Data Stored Using a Header Block
07600490411997056009014000017Database Systems: Design, Implementation, and ManagementPeter RobCarlos CoronelCourse Technology§06723096961996035011000000004Teach Yourself Access 97 in 14 DaysPaul CasselSams§»
Translation of Sample for Humans

For an example, we use the header block of the first record, in order to show that the header-data string is parsed as though it read:

0760049041 1997 056 009 014 000 017

where the first ten characters are the ISBN (0760049041); the next four characters, the copyright date (1997); the next three, the number of characters in the title (56); the next three, the number of characters in the first author's name (9); the next three, the number of characters in the second author's name (14); the next three, the number of characters in the third author's name (0); and the last three, the number of characters in the publisher's name (17). The second header-data string is parsed in an analogous way.

Note: This example is a simplified analog of the MARC record structure. It shows how, in principle, header blocks of a fixed length can furnish all the information needed for records of varying lengths. The actual MARC record structure combines the header-block structure with field delimiters. The resulting redundancy helps to reduce data errors.
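The parsing of the sample file can be sketched in Python as follows. The names used here (HEADER_LAYOUT, parse_record, parse_file) are hypothetical; the field widths are those given in the translation above. The program first cuts the 29-character header into its seven parts, and then uses the five lengths it finds there to cut the data block into title, authors, and publisher.

```python
# (name, width) for each part of the 29-character header block.
HEADER_LAYOUT = [
    ("isbn", 10), ("year", 4), ("title", 3), ("author1", 3),
    ("author2", 3), ("author3", 3), ("publisher", 3),
]
HEADER_LENGTH = sum(width for _, width in HEADER_LAYOUT)   # 29


def parse_record(record):
    """Use the fixed-length header to cut up the variable-length data block."""
    header, data = record[:HEADER_LENGTH], record[HEADER_LENGTH:]
    result, lengths, pos = {}, [], 0
    for name, width in HEADER_LAYOUT:
        chunk = header[pos:pos + width]
        pos += width
        if name in ("isbn", "year"):
            result[name] = chunk                 # fixed-length data kept as is
        else:
            lengths.append((name, int(chunk)))   # length of a data-block field
    pos = 0
    for name, length in lengths:
        result[name] = data[pos:pos + length]
        pos += length
    return result


def parse_file(text):
    """Split the file at the end-of-record marks and parse each record."""
    body = text.split("»", 1)[0]
    return [parse_record(r) for r in body.split("§") if r]


SAMPLE = (
    "07600490411997056009014000017"
    "Database Systems: Design, Implementation, and Management"
    "Peter RobCarlos CoronelCourse Technology§"
    "06723096961996035011000000004"
    "Teach Yourself Access 97 in 14 DaysPaul CasselSams§»"
)
books = parse_file(SAMPLE)
```

Note that an absent author costs nothing in the data block: a length of 000 in the header simply yields an empty value.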

The MARC Format

The actual MARC 21 format (the current version of USMARC) is based on the header-block approach plus some use of delimiters. A MARC record begins with a block that is always 24 characters long and is called the "leader". The characters are numbered starting at 0, so that the leader occupies position numbers 00 through 23 (we use an initial "0" in the numbers of the first ten positions to minimize ambiguity).

Here is what the various positions in the leader of a MARC record mean:

  00-04 Length of the entire record in bytes
  05 Record status (e.g., n = new; c = changed; d = deleted)
  06 Type of record (e.g., a = bibliographic; c = music, printed or microform)
  07 Bibliographic level (e.g., m = monograph; s = serial)
  08-09 These positions are always blank in USMARC records
  10 Indicator count (2 in USMARC records, since all such records use 2 indicators per field)
  11 Subfield code count (2 = number of characters used to identify subfields, the first such character always being ‡)
  12-16 Base address of data (i.e., the position of the first character of the first data field; e.g., a base address of 00277 would mean that the first data character has position number 277 and is therefore the 278th character of the record, since positions are numbered starting at 0)
  17 Encoding level (i.e., the level of completeness of the record; "blank" denotes "complete")
  18 Descriptive cataloging form (e.g., a = according to AACR2 [i.e., Anglo-American Cataloguing Rules, Second Edition])
  19 Linked record code (blank if no related record; r if a related record exists)
  20-23 Entry map (always 4500 in USMARC; the final "0" has no current meaning, but this position is reserved for possible future use)

You can see that the leader contains two portions (positions 00-04 and 12-16) that deal with lengths: the first, with the length of the entire record; the second, with the number of characters to be skipped before the beginning of the actual cataloging data. The rest of the leader is given over to codes that can be used in searching for various types of bibliographic data and/or cataloging data. The fixed positions of these codes at the beginning of each MARC record facilitate rapid searching of a file of MARC records.
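Because every position in the leader has a fixed meaning, a program can take the leader apart by simple slicing. The Python sketch below follows the table of positions given above; the dictionary keys, the function name, and the sample leader (whose lengths 00714 and 00205 are made up) are assumptions for illustration.

```python
# (name, first position, position after the last) for each part of the leader.
LEADER_LAYOUT = [
    ("record_length", 0, 5),
    ("record_status", 5, 6),
    ("type_of_record", 6, 7),
    ("bibliographic_level", 7, 8),
    ("blank", 8, 10),
    ("indicator_count", 10, 11),
    ("subfield_code_count", 11, 12),
    ("base_address", 12, 17),
    ("encoding_level", 17, 18),
    ("cataloging_form", 18, 19),
    ("linked_record", 19, 20),
    ("entry_map", 20, 24),
]


def parse_leader(leader):
    """Slice a 24-character leader into its named parts."""
    if len(leader) != 24:
        raise ValueError("a leader must be exactly 24 characters long")
    parts = {name: leader[start:end] for name, start, end in LEADER_LAYOUT}
    for key in ("record_length", "base_address"):
        parts[key] = int(parts[key])             # the two length portions
    return parts


# A made-up leader, written piece by piece so that each position is visible:
# positions 00-04, 05, 06, 07, 08-09, 10, 11, 12-16, 17, 18, 19, 20-23.
SAMPLE_LEADER = "00714" "c" "a" "m" "  " "2" "2" "00205" " " "a" " " "4500"
leader = parse_leader(SAMPLE_LEADER)
```

For this sample, the record is 714 characters long, and its cataloging data begin at position 205.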

The leader is followed immediately by a section called the "directory". The directory consists of a number of blocks of data, one for each field in the portion of the MARC record that contains the actual cataloging data. These blocks are all of the same length, but different records can have different numbers of blocks, depending on the numbers of fields in the actual cataloging data. (Typical records often have around a dozen such blocks.) Each block shows the tag of its field, the length of its field, and the starting position of its field relative to the first field in the record. Thus the directory permits rapid location of the start of each field. The end of the directory is indicated by a "^" (the end-of-field delimiter).

Finally, following the directory, come the fields containing the actual cataloging data. These fields start in the character position specified in positions 12-16 of the leader. For the details of how these fields look in the record and how they are used, see the MARC guide by Betty Furrie cited in the second paragraph below.
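How the leader, the directory, and the data fields work together can be sketched in Python with a deliberately simplified record. Several assumptions should be noted. Each directory block is taken to be 12 characters long (3 for the tag, 4 for the field length, 5 for the starting position), which is the standard MARC 21 layout but is not spelled out in this lesson. The delimiters "^" and "§" follow the notation of this lesson rather than the control characters used in real files. Lengths are counted in characters rather than bytes, and indicators and subfield codes are left out. The function names and the sample fields are hypothetical.

```python
FIELD_END = "^"     # end-of-field mark, in this lesson's notation
RECORD_END = "§"    # end-of-record mark, in this lesson's notation


def build_record(fields):
    """Assemble leader + directory + data from a list of (tag, text) pairs."""
    directory, data = "", ""
    for tag, text in fields:
        body = text + FIELD_END
        # One 12-character block: tag, field length, starting position.
        directory += f"{tag}{len(body):04d}{len(data):05d}"
        data += body
    directory += FIELD_END
    base = 24 + len(directory)                   # leader is 24 characters
    total = base + len(data) + len(RECORD_END)
    leader = f"{total:05d}nam  22{base:05d} a 4500"
    return leader + directory + data + RECORD_END


def read_fields(record):
    """Locate every field through the directory, without scanning the data."""
    base = int(record[12:17])                    # leader positions 12-16
    directory = record[24:base - 1]              # drop the closing "^"
    result = []
    for i in range(0, len(directory), 12):
        block = directory[i:i + 12]
        tag, length, start = block[:3], int(block[3:7]), int(block[7:12])
        first = base + start
        result.append((tag, record[first:first + length - 1]))
    return result


FIELDS = [
    ("100", "Cassel, Paul"),
    ("245", "Teach Yourself Access 97 in 14 Days"),
    ("260", "Sams, 1996"),
]
record = build_record(FIELDS)
recovered = read_fields(record)
```

The reader never searches for delimiters: it computes where each field starts from the base address and the directory, which is why the directory permits rapid location of any field.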

Closing Remarks

The purpose of this lesson has been to familiarize you with the MARC system in quite general terms, and especially with the sophisticated nature of the structure of computer files used to store and communicate cataloging data for InBEs. You are not expected to learn the fine details of such matters as the detailed uses of the various positions in the leader of a MARC record, but you are expected to learn what the overall structure is like: viz., that it is a combination of the header-block approach and the delimiter approach to handling variable-length records. You are also expected to understand the differences between computer files made up of fixed-length records and those made up of variable-length records, as well as why the storage and communication of cataloging data demand the use of variable-length records.

For further details and examples of how MARC 21 records are used and constructed, I strongly recommend your looking at Understanding MARC Bibliographic: Machine-Readable Cataloging, written by Betty Furrie and made available online by the MARC Office of the Library of Congress. The section of Ms. Furrie's work entitled MARC 21 Reference Materials presents a detailed example of a MARC 21 record.


Last revised 2004 Feb 17