Graduate School of Library and Information Science, UT Austin
Information Technologies and the Information Professions



Markup Languages
by Quinn Stewart

In the beginning

Although HTML is the best-known application of markup languages, the concept is nothing new. The spread of the movable-type printing press created new industries in printing and publishing, and with them came the practice of "markup." Markup originally consisted of marks written on manuscripts and drafts to tell the printer which font styles and sizes to use, where to set italic or boldface type, and how the finished document should otherwise appear. This is the basic idea behind markup languages: a set of instructions that indicate how a final document will look. As computers and computer-based text processing have grown in importance and scope, markup languages have kept pace. Word-processing programs use hidden markup languages; the keystrokes or toolbar commands for boldface text or different font sizes work much like old-fashioned printers' marks, even though the user never sees them. Markup of this kind, which tells the output device what to do at each point in the text, became known as "procedural markup," a term still in use. Web-based documents, however, have provided the greatest exposure for markup languages and brought them into the sphere of public knowledge.

GML and SGML

Beginning in the late 1960s, a group at IBM set out to create a standardized way of marking up documents for computers: a common standard that could accommodate different types of documents created on different computer platforms (a significant problem until very recently). The result was the Generalized Markup Language (GML). GML was initially focused on legal documents, but over time it evolved into the Standard Generalized Markup Language (SGML).

SGML is "a set of rules for defining and expressing the logical structure of documents thereby enabling software products to control the searching, retrieval, and structured display of those documents" (http://www.loc.gov/ead/eadback.html), and is the foundation of much of what later became the World-Wide Web.  SGML is intended to focus on document structure rather than simply appearance, which gives it the potential for future expansion as computer-based communications continue to evolve. It was officially approved by the International Organization for Standardization (ISO) in 1986, and remains an accepted standard. However, SGML is intended as a basis for creating languages rather than as a means of Web publishing. An apt analogy is: "Think about building a model airplane. If that airplane were a document, a markup language would be used to put it together. SGML would be used as the basis for the assembly instructions, not the assembly process itself." (Navarro, White, and Burman, 1999).

HTML

Hypertext Markup Language (HTML) is a greatly simplified application of SGML, intended specifically for the Web. When the World-Wide Web was being created, many different computer platforms and document encoding standards existed, and the Web's creators recognized the need for a common descriptive language to bridge them. HTML was developed from SGML to support a "hypertext" environment, in which links lead directly from one document to another without the need to navigate any hierarchical organization of documents. HTML was deliberately designed to be simple so that it could be widely used. Although HTML was originally intended for use in UNIX environments, it lent itself well to the newly created Mosaic graphical "browser," and soon became the standard for the Web. As the Web has grown in size and sophistication, however, the limitations of HTML have become apparent. To address these limitations and to return to the structural focus of the original SGML standard, the Extensible Markup Language (XML) was developed.

XML

XML differs from HTML in at least one fundamental way. In HTML, the set of tags is fixed by the HTML specification, so any new tag must be introduced to and approved by the World-Wide Web Consortium, or W3C. In the early days of HTML, the main focus was on publishing documents on the Web. As the Web evolved, developers wanted more control over the layout and format of documents, and there have been at least three major revisions of HTML since 1992. In their quest to meet users' needs and gain market share, both Netscape and Microsoft introduced tags specific to their own browsers, hoping that these "improvements" would then be adopted by the W3C. This has led to an increasingly bloated set of HTML tags, as well as ongoing browser incompatibilities.

XML returns to a more SGML-like approach. Rather than fixing the meaning of each tag and attribute, as HTML does, XML uses tags to describe the structure of a document, not its display characteristics, and leaves the interpretation to the application that reads it. For example, the <b> tag in HTML specifies that the enclosed text be rendered in bold. In XML, depending on the context its designer defines, <b> may mean the enclosed text is broken, brown, bent, or borrowed, or anything else the reading application is built to recognize. The definition and use of tags are left up to the developer. This makes XML extensible to far more uses than HTML, since its tags can serve many applications beyond Web publishing. (For more information, see the W3C's "XML in 10 Points.")
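To make the point concrete, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree` parser. The `<shelf>`/`<item>`/`<b>` vocabulary and the function `borrowed_titles` are hypothetical, invented for this example: the parser recovers only the document's structure, and it is the small reading function, not XML itself, that decides `<b>` means "borrowed."

```python
import xml.etree.ElementTree as ET

# A hypothetical vocabulary invented for this example: here <b> marks
# whether a book on the shelf is borrowed. XML itself assigns <b> no meaning.
SHELF_XML = """<shelf>
  <item><title>Mastering XML</title><b>yes</b></item>
  <item><title>The Roots of SGML</title><b>no</b></item>
</shelf>"""


def borrowed_titles(xml_text: str) -> list[str]:
    """Return the titles whose <b> element reads 'yes'.

    The parser only recovers the document's structure; this function,
    acting as the reading application, decides that <b> means "borrowed".
    """
    root = ET.fromstring(xml_text)
    titles = []
    for item in root.iter("item"):
        # findtext returns the default when an item has no <b> child.
        if item.findtext("b", default="").strip() == "yes":
            titles.append(item.findtext("title"))
    return titles


print(borrowed_titles(SHELF_XML))  # ['Mastering XML']
```

A different application could read the same `<b>` element as "brown" or "bent"; nothing in the markup would need to change.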

Transitioning from HTML to XML

XHTML

Although it is inevitable that XML will eventually replace HTML, the transition will take time. In the interim, the Extensible Hypertext Markup Language (XHTML) will provide the "bridge" between them. HTML will not disappear, but the W3C will no longer update it. XHTML is based on XML and permits site designers to easily add new tags and extensions. Additionally, XHTML is intended to facilitate Web access by nontraditional agents, such as personal digital assistants. Because the transition from HTML to XML will unfold over time and must accommodate platforms and interfaces not yet in existence, the W3C expects to keep updating XHTML for some time. As a hybrid, XHTML will be compatible both with HTML-based Web pages and with those created in XML, allowing designers to begin creating sites that are XML-based but still compatible with the current generation of Web browsers.
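In practice, being based on XML means an XHTML page must be well-formed: every element closed and properly nested, with empty elements such as line breaks written as `<br/>`. A minimal Python sketch, using the standard-library XML parser as a stand-in for any XML tool, shows the difference; both fragments and the helper `is_well_formed` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# XHTML-style markup: every element is closed, and the empty
# line-break element is written with a trailing slash.
XHTML_FRAGMENT = "<p>First line<br/>second line</p>"

# Typical HTML-style markup: <br> and <p> are left unclosed, which
# HTML browsers tolerate but an XML parser rejects.
HTML_FRAGMENT = "<p>First line<br>second line"


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(XHTML_FRAGMENT))  # True
print(is_well_formed(HTML_FRAGMENT))   # False
```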

SMIL

Synchronized Multimedia Integration Language (SMIL) is an XML markup language created to allow independent multimedia objects to be synchronized into a multimedia presentation. These objects can be audio, video, animation, images, text, and so on. The development of SMIL by the W3C is an interesting story and a microcosm of how many W3C specifications have developed. When the SMIL 1.0 specification was under development in 1997, Microsoft was one of the members of the working group developing it, along with RealNetworks, Macromedia, and Apple. At that time, and to this day, these companies represent the major competitors for distributing audio and video content on the Web. By the time the first specification was released in 1998, Macromedia and Microsoft had discontinued their participation in favor of their own proprietary formats. However, once RealNetworks and Apple implemented the SMIL specification in their products, the worldwide development community rapidly began to adopt it. Realizing the error of their ways, Macromedia and Microsoft returned to collaborate with the W3C working group.

Why all the fuss? SMIL was designed to be a simple, text-based markup language similar to HTML. Its two major tags are <par> and <seq>, which stand for parallel and sequential: a <par> tag plays the multimedia objects it contains at the same time, and a <seq> tag plays them one after another. This simple language can create elaborate multimedia presentations with minimal programming, and because it is based on XML, different vendors can extend the language to fit their needs.
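The difference between `<par>` and `<seq>` shows up clearly in timing. The Python sketch below uses the standard-library XML parser to compute a presentation's total running time under simplified assumptions: every media object carries an explicit `dur` in whole seconds, `<body>` is treated like a `<seq>`, and begin offsets, repeats, and SMIL's other timing attributes are ignored. The element names follow SMIL; the file names, durations, and the helpers `parse_seconds` and `duration` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A toy SMIL-style presentation. The element names (<seq>, <par>, <img>,
# <audio>, <text>) and the dur attribute come from SMIL; the file names
# and durations are invented for illustration.
PRESENTATION = """<smil>
  <body>
    <seq>
      <img src="title.png" dur="3s"/>
      <par>
        <audio src="narration.wav" dur="10s"/>
        <img src="slide1.png" dur="4s"/>
      </par>
      <text src="credits.txt" dur="2s"/>
    </seq>
  </body>
</smil>"""


def parse_seconds(value: str) -> float:
    """Parse a clock value such as '4s' (this sketch handles seconds only)."""
    return float(value.rstrip("s"))


def duration(elem: ET.Element) -> float:
    """Return how long an element plays, in seconds.

    <seq>: children play one after another, so their durations add up.
    <par>: children start together and the group ends when the last one
    finishes, so take the longest child.
    Any other element is treated as a media object with an explicit dur.
    """
    children = list(elem)
    if elem.tag in ("seq", "body"):  # <body> is treated like <seq> here
        return sum(duration(child) for child in children)
    if elem.tag == "par":
        return max((duration(child) for child in children), default=0.0)
    return parse_seconds(elem.get("dur", "0s"))


body = ET.fromstring(PRESENTATION).find("body")
print(duration(body))  # 15.0
```

The title (3 s), the parallel narration and slide (10 s, the longer of the two), and the credits (2 s) add up to 15 seconds. Changing the outer `<seq>` to a `<par>` would make all three overlap, and the total would drop to the longest single item, 10 seconds.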

Multimedia is big business, especially interactive multimedia. As technologies improve and bandwidth increases, SMIL could evolve into a multimedia-on-demand system, able to combine multimedia objects from around the world into interactive presentations. Once RealNetworks and Apple demonstrated the power of the technology, Microsoft and Macromedia realized that even if they kept their proprietary technologies, the global community might choose not to use them and embrace the work of the W3C instead.

As of late 2000, both RealNetworks' RealPlayer and Apple's QuickTime Player are SMIL-compliant, and Microsoft has begun implementing the SMIL 2.0 specification in Internet Explorer 5.5.

EAD

Encoded Archival Description (EAD) is one of the markup languages developed specifically for the library and information science discipline, and for archives in particular. It is based on SGML, which provides both flexibility in document description and an open-ended standard that can accommodate technological advances as well as the specific needs of archivists. It has not yet been formally adopted as the definitive standard for archival description, but is expected to be in the near future. (For more information, see the Library of Congress Website: http://lcweb.loc.gov/ead/.)

Sources:

Goldfarb, Charles F. (1996). The Roots of SGML -- A Personal Recollection. http://www.sgmlsource.com/history/roots.htm

The Library of Congress. Encoded Archival Description Official Web Site. http://lcweb.loc.gov/ead/

Navarro, Ann, White, Chuck & Burman, Linda. (1999). Mastering XML. Sybex, Inc.

Richmond, Alan. (2000). Introduction to XHTML, with eXamples. http://wdvl.com/Authoring/Languages/XML/XHTML/

Society of American Archivists. EAD Help Pages. http://jefferson.village.virginia.edu/ead/

The World-Wide Web Consortium. http://www.w3c.org

 

Course emailbox: l38613dw@ischool.utexas.edu
iSchool Website: www.ischool.utexas.edu

Last updated 2002 Sep 12 by R. E. Wyllys