XML & DTDs

XML Anatomy

If you have ever written HTML, creating an XML document will seem very familiar. Like HTML, XML is based on SGML (Standard Generalized Markup Language) and was designed for use with the Web. If you haven't coded in HTML before, you should find creating HTML documents easy once you have created an XML document.

XML documents are made of at least two parts: the prolog and the content. The prolog, or head, of the document usually contains administrative metadata about the rest of the document: which version of XML is used, which character encoding is used, and the DTD, either declared internally or linked as an external file. The content is usually divided into two parts: the structural markup and the content contained within that markup, which is usually plain text.

Let's take a look at a simple prolog for an XML document:

<?xml version="1.0" encoding="iso-8859-1"?>

<?xml declares to a processor that this is where the XML document begins.

version="1.0" declares which version of the XML Recommendation the document should be evaluated against.

encoding="iso-8859-1" identifies the standardized character set that is being used to write the markup and content of the XML.

Note: XML currently has two versions: 1.0 and 1.1. For more information, visit the W3C, which developed the XML standard. This tutorial deals primarily with XML version 1.0.

Note: For more information about standard character sets, see http://www.iana.org/assignments/character-sets
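To see how a processor might use the prolog, here is a minimal sketch using Python's standard-library xml.dom.minidom parser. The <note> element and its text are invented for this illustration; only the declaration line comes from the tutorial.

```python
from xml.dom import minidom

# Hypothetical document: the declaration is the one shown above,
# the <note> element is made up for this sketch.
text = '<?xml version="1.0" encoding="iso-8859-1"?>\n<note>caf\u00e9</note>'

# Encode the bytes the way the declaration says they are encoded.
data = text.encode("iso-8859-1")

doc = minidom.parseString(data)

declared_version = doc.version      # read from version="1.0"
declared_encoding = doc.encoding    # read from encoding="iso-8859-1"
note_text = doc.documentElement.firstChild.data

print(declared_version, declared_encoding)
# ascii() keeps the accented character printable on any console.
print(ascii(note_text))
```

The parser relies on the declared encoding to turn the bytes back into characters, which is why the accented letter survives the round trip.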

The structural markup consists of elements, attributes, and entities; however, this tutorial will primarily focus on elements and attributes.

Elements have a few particular rules:

1. Element names can be almost any mixture of characters, with a few exceptions; for example, a name cannot begin with a digit or contain spaces. Unlike HTML, element names are case sensitive. For instance, <elementname> is different from <ELEMENTNAME>, which is different from <ElementName>.

Note: The characters excluded from element names in XML include &, <, ", and >, which XML uses to indicate markup. The character : should be avoided because it is reserved for XML namespaces. If you want to use these restricted characters as part of the content within elements, rather than as markup, use the following entities:

XML Entity Names for Restricted Characters

Use       For
&amp;     &
&lt;      <
&gt;      >
&quot;    "
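As a rough sketch of these entities in practice, the Python standard library can write them out and read them back. The <menu> element and the sample text are hypothetical, chosen only so that every restricted character appears once.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Sample text containing each of the restricted characters.
raw = 'Fish & Chips < "Dinner" > Lunch'

# escape() replaces &, <, and > by default; the extra mapping adds ".
escaped = escape(raw, {'"': "&quot;"})

# Wrap the escaped text in a made-up <menu> element and parse it back.
element = ET.fromstring("<menu>" + escaped + "</menu>")

# The same text without entities is rejected by the parser.
try:
    ET.fromstring("<menu>" + raw + "</menu>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

print(escaped)
print(element.text)
print(unescaped_ok)
```

The parser turns the entities back into the original characters, so the content a reader sees is unchanged.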

2. Elements containing content must have opening and closing tags.

<elementName> (opening) </elementName> (closing)

Note that the closing tag is exactly the same as the opening tag, but with a forward slash (/) in front of the element name.
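The two rules so far can be checked with a parser. The sketch below uses Python's standard-library xml.etree.ElementTree; the helper is_well_formed and the <list>, <Item>, and <item> elements are names invented for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Opening and closing tags match exactly.
matching = is_well_formed("<elementName>text</elementName>")

# Same letters, different case: the closing tag no longer matches.
wrong_case = is_well_formed("<elementName>text</ELEMENTNAME>")

# The closing tag is missing altogether.
unclosed = is_well_formed("<elementName>text")

# Names that differ only in case are treated as different elements.
root = ET.fromstring("<list><Item/><item/></list>")
child_names = [child.tag for child in root]

print(matching, wrong_case, unclosed)
print(child_names)
```

Only the first sample parses; the parser rejects the other two before any DTD is involved.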

The content within elements can be either elements or character data. If an element has additional elements within it, then it is considered a parent element; those contained within it are called child elements.

For example,

<elementName>This is a sample of <anotherElement>simple XML</anotherElement> coding</elementName>

So in this example, <elementName> is the parent element. <anotherElement> is the child of elementName, because it is nested within elementName.
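To make the parent and child relationship concrete, here is a small sketch that parses a document with the same two elements using Python's standard-library xml.etree.ElementTree. It shows one way a parser might expose the nesting, not the only one.

```python
import xml.etree.ElementTree as ET

markup = (
    "<elementName>This is a sample of "
    "<anotherElement>simple XML</anotherElement>"
    " coding</elementName>"
)

parent = ET.fromstring(markup)

# Iterating over an element yields its direct children.
children = list(parent)
child = children[0]

print(parent.tag)         # name of the parent element
print(child.tag)          # name of the child element
print(repr(parent.text))  # character data before the child
print(repr(child.text))   # character data inside the child
print(repr(child.tail))   # character data after the child's closing tag
```

ElementTree stores the text that follows a child's closing tag on the child itself (as tail), which is a design choice of that library rather than part of XML.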

Elements can have attributes attached to them in the following format:

<elementName attributeName="attributeValue" >

While attributes can be added to elements in XML, there are a few reasons to use them sparingly:

  • XML parsers have a harder time checking attributes against DTDs.
  • If the information in the attribute is valuable, why not contain that information in an element?
  • Since some attributes can only have predefined categories, you can't go back and easily add new categories.

We recommend using attributes for information that isn't absolutely necessary for interpreting the document or that has a predefined number of options that will not change in the future.

When using attributes in XML, the value of the attribute must always be enclosed in quotes. The quotes can be either single or double quotes. For example, the attribute version="1.0" in the opening XML declaration could be written version='1.0' and would be interpreted the same way by the XML parser. However, if the attribute value itself contains quotes, you must enclose the value in the other style of quotation mark. For example, an attribute name with the value John "Q." Public would need to be marked up in XML as name='John "Q." Public', with single quotes enclosing the value because double quotes appear inside it.
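The quoting rules can be tried out with a parser. This is a minimal sketch using Python's standard-library xml.etree.ElementTree; the <person> element is hypothetical and exists only to carry the name attribute.

```python
import xml.etree.ElementTree as ET

# The same value in double quotes and in single quotes.
double_quoted = ET.fromstring('<person name="John Public"/>')
single_quoted = ET.fromstring("<person name='John Public'/>")

# A value containing double quotes is enclosed in single quotes.
nested = ET.fromstring("""<person name='John "Q." Public'/>""")

# An attribute value with no quotes at all is rejected.
try:
    ET.fromstring("<person name=John/>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(double_quoted.get("name") == single_quoted.get("name"))
print(nested.get("name"))
print(unquoted_ok)
```

Both quoting styles produce the same attribute value, and the enclosing quotes themselves never become part of it.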

XML Anatomy Review

There are some rules regarding the order of opening and closing elements, but those will be covered later in the tutorial. For now, let's try creating a simple XML document.


© 2004 Jacob Cleary | iSchool | UT Austin | webmaster