Welcome to this tutorial on XML and DTDs. This tutorial assumes that you have been introduced to the possibilities of XML and want to learn more about the nuts and bolts of creating an XML document or a DTD. If you're unsure of what exactly XML is, we encourage you to look over the Introduction to XML Tutorial.
This tutorial aims to show you a few things. First off, we want to show how to create both a well-formed XML document and a DTD that validates that XML. We'll also explain what well-formed and valid mean when talking about XML, and describe the anatomy and structure of XML. Finally, we'll take you through why you would want to create a DTD, the steps of creating a DTD, and some examples of DTDs currently used in the real world.
A few definitions will be helpful when talking about XML. First of all, XML stands for eXtensible Markup Language, a standard created by the World Wide Web Consortium (W3C) for marking up data. A DTD is a Document Type Definition, a set of rules defining relationships within a document; it can be either internal to the document or linked externally through a hypertext reference. Finally, an XML parser is software that reads XML code and documents and then interprets, or parses, this code according to the XML standard. A parser is needed to perform actions on XML. For example, if you wanted to compare an XML document to a DTD, you would need a parser.
Let's talk about the anatomy of XML next. If you have ever done some HTML coding, creating an XML document will seem very familiar. Like HTML, XML is based on SGML, the Standard Generalized Markup Language, and is designed for use with the web. If you've never done any HTML coding before, then after creating an XML document you should find that creating an HTML document is easy. Now let's look at an XML document. An XML document consists of two main parts: the prolog and the content.
The prolog contains the XML declaration, which gives the version of XML being used and the character encoding for the XML document. It is also where the DTD is declared if the XML has a DTD, either internally or, as in this case, externally. The content contains the structural metadata of the XML, which usually consists of elements and attributes, and then the content of the XML document itself.
Every XML document should begin with the XML declaration, which must give the version number of XML being used; the encoding is optional. Strictly speaking, XML 1.0 lets you leave the declaration out, but XML 1.1 requires it, so it is good practice to always include it. XML documents also don't need a DTD, but if they don't have a DTD or some other sort of document type declaration, they cannot be considered valid documents; they can only be well-formed. Currently, there are two versions of XML, version 1.1 and 1.0. This tutorial focuses primarily on XML 1.0. We'll talk about document type declarations in more depth later on in this tutorial.
For now, let's focus on the content. As mentioned before, content consists of elements, attributes, and then the content itself. There are a few rules we should discuss when talking about elements. Element names can mix letters, numbers, and some symbols (such as hyphens, underscores, and periods), but they must start with a letter or an underscore. Element names are also case sensitive, so elementname is different from ElementName, which is different from ELEMENTNAME.
Additionally, there are 5 characters you should avoid using when making element names: < > & " and : (the colon). The first four are structural markup used in XML to indicate different elements and attributes, while the colon is used to indicate namespaces in XML. When you need one of the first four characters in the text of your document, use the following entities instead: for the ampersand use &amp;, for the greater-than sign use &gt;, for the less-than sign use &lt;, and for a quotation mark use &quot;.
Now, elements must have both opening and closing tags. An opening tag is the element name inside angle brackets, <elementname>, and a closing tag adds a slash after the opening bracket, </elementname>.
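As a rough illustration of the two parts described above, here is a minimal sketch of an XML document. The element name note and its text are made up for this example; only the declaration syntax, the tag syntax, and the four entities come from the rules above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The line above is the XML declaration, part of the prolog. The version is given; the encoding is optional. -->
<!-- Everything from the opening tag to the closing tag below is the content. -->
<!-- "note" is a hypothetical element name. Names are case sensitive, so <note> must close with </note>, not </Note>. -->
<note>Tom &amp; Jerry said &quot;3 &lt; 5 and 5 &gt; 3&quot;</note>
```

A parser reading this document would hand back the text with the entities replaced by the characters they stand for.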
Elements then have either character data or more elements within them. If it's character data, you don't have to worry about it. If it's elements, it would look something like this: the element's opening tag, content, another element's opening tag, more content, the closing tag of that inner element, and then the closing tag of the outer element. So in this example, elementname is the parent element and anotherelement is the child, because it is nested within elementname.
Attributes can be added to most elements. The format is the attribute name, an equals sign, and then the attribute value. Let's take a look at our example here. In this case, the element has an attribute called reply. So here is the element, the attribute name, and then the attribute value. Notice that the attribute value is contained in quotation marks, which is why you avoid using quotation marks when making element names. Otherwise, a parser could interpret part of that XML element name as an attribute value.
Speaking of attributes, there are a few things we should talk about. While attributes can be added to most elements in XML, there are a few reasons attributes are used sparingly. First of all, XML parsers have a hard time checking attributes against DTDs. Second, if the information in the attribute is valuable, why not contain that information in an element? And third, since certain attributes can only have predefined categories, you can't easily go back and add new categories to an attribute.
Therefore we recommend using attributes in only a couple of circumstances. First, if you want to launch a helper application through the parser, you might want to inform the XML parser that a particular file is an image or a PDF. Second, if you have an attribute with predefined values, like "yes" and "no", that will not be changing in the future. These are the two circumstances you might want to use attributes in. Now that you know some of the basic rules of creating an XML document, let's try them out.
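The nesting and attribute rules described above can be sketched as follows. The names elementname, anotherelement, and reply are the ones used in the narration; placing the reply attribute on elementname is an assumption made to keep the sketch to a single document.

```xml
<?xml version="1.0"?>
<!-- elementname is the parent; anotherelement is its child because it is nested inside it. -->
<!-- reply is the attribute name and "yes" is the attribute value, always in quotation marks. -->
<elementname reply="yes">
  content
  <anotherelement>more content</anotherelement>
</elementname>
```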
Like most, if not all, standards developed by the W3C, you can create an XML document using a plain text editor like Notepad on the PC, TextEdit on the Mac, or pico in UNIX. You can also use specialized software like Dreamweaver or Cooktop. But all that is necessary to create an XML document is a text editor.
For the purposes of this tutorial, we'll assume that we're creating XML documents for both emails and letters. The first thing we need to do for either of these documents is to declare XML. After declaring XML, the next thing we need to do is determine what will be the root element of these documents. Since both emails and letters are considered messages, we'll have message be the root element of this document. You'll notice that I created both the opening and closing tag of message when I added these to the document.
Since message is the parent of both email and letter, in this case we need to add the child element email to continue making this document. You'll notice that I indented the next element, email, under message to make it easier to tell that there is a relationship between the two. In this case the relationship is that message is the parent element and email is the child.
Now let's consider what information from this email we would like to store in this XML document. Let's say we want to have who sent and received the email, the subject of the email, and the text. Let's have the child elements of email be header, subject, and text, with header having both recipient and sender. By creating all these elements before filling them in with content, we know for sure that we did not forget to close any of the elements. Now let's also say we want to add an attribute to email saying whether or not it is a reply to a previous email. In this case we'll say yes, it's a reply.
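At this point the skeleton described above might look like the following sketch. The element and attribute names come from the narration; the indentation and the order of sender and recipient inside header are assumptions.

```xml
<?xml version="1.0"?>
<!-- message is the root element; email is its child and carries the reply attribute. -->
<message>
  <email reply="yes">
    <header>
      <sender></sender>
      <recipient></recipient>
    </header>
    <subject></subject>
    <text></text>
  </email>
</message>
```

Every element is opened and closed before any content is typed in, which is the habit the narration recommends for avoiding unclosed tags.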
Now that we have the basic structure filled out for this XML document, let's fill in the rest of the content. So now we have created an XML document that contains a simple email message.
Now let's try creating one for a letter. Again, the first thing we are going to do is declare XML and then enter the root element, message. Since in this case the child element is letter instead of email, we'll have letter indented underneath message. We'll also want to know whether this letter is a reply or not, so we'll add the attribute reply to the element letter. The information we want to store from the letter is very similar to the email. We want to know who sent it, who received it, and the main body of the text. But in this case we'll also make some XML elements for the date it was sent on and for the greeting, or salutation, of the document. Now let's go ahead and put those elements in. Notice that we added date to the letterhead and salutation to text, since that is where this information would generally be found in letters. Now let's fill in this document with its content.
Now we've created two simple XML documents, one an email and one a letter, both of which have some attributes and elements. Let's go on and talk about how we determine whether what we've created is well-formed.
When talking about XML, well-formed and valid are two different things. All XML can be well-formed, but only some XML documents can be valid. XML is well-formed when there are no syntax, spelling, punctuation, grammar, or similar errors in the markup of the XML document, so that when it is read by a parser it doesn't cause the parser to fail. Valid documents are only those documents that are compared to a standard such as a DTD, an XML schema, or some other document definition. In this next section, we'll talk about some common errors made in XML that would cause a document to be not well-formed.
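For reference, the letter document described above might look roughly like this once it is filled in. The element and attribute names come from the narration; the names, date, and wording of the letter are placeholder values invented for this sketch, as is the choice of "no" for reply.

```xml
<?xml version="1.0"?>
<message>
  <!-- reply says whether this letter answers an earlier one. -->
  <letter reply="no">
    <letterhead>
      <!-- All values below are placeholders, not taken from the tutorial. -->
      <sender>Jane Doe</sender>
      <recipient>John Smith</recipient>
      <date>January 1, 2000</date>
    </letterhead>
    <text>
      <salutation>Dear John,</salutation>
      Thank you for your letter.
    </text>
  </letter>
</message>
```

This document is well-formed on its own: every element is closed, names match in case, and the attribute value is quoted.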
Let's cover the four most common errors in XML creation that would cause your markup to not be well-formed.
The first common error is forgetting to close an element in the document. Let's take a look at an XML document that has this error. In this particular XML document, we forgot to close the sender tag. When we open this document in an XML parser, we receive an error message. In this case we see the error message we would receive from Internet Explorer. Different parsers will give different responses when they come across an error. In this particular case, the XML parser in Internet Explorer is saying that it was expecting an end tag for sender, not header.
Another error commonly made in XML is forgetting or misspelling an element name. Because element names are case sensitive in XML, this is a common error. Let's take a look at an XML document that has this error. In this example, we've remembered to include the closing tag for sender, but we spelled it with a capital S rather than all lowercase. Let's take a look at the error message we'd get from Internet Explorer in this particular example. In this case, the error message is very similar to the one before. It is saying that it was expecting an end tag matching sender, and instead it received an end tag with Sender.
Let's move on to the next common error. The next common error we are examining is forgetting to close an element's attribute. Let's take a look at an example of this particular mistake. In this example, we forgot to close the attribute reply: we forgot to put the closing quotation mark on the value yes. Because of this missing quotation mark, the XML parser assumes that the rest of the message here is part of the attribute value of reply. Let's take a look at the error message we might receive from Internet Explorer for this particular mistake.
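The on-screen examples are not part of this transcript, so the fragments below are only a sketch of what each mistake might look like, using the sender, recipient, header, and email names from the narration and made-up text. The fourth fragment anticipates the closing-order error that is discussed next. None of these fragments is well-formed, and that is the point.

```xml
<!-- Each fragment below is deliberately broken; a parser would stop with an error. -->

<!-- 1. Unclosed element: sender is never closed, so the parser meets the header end tag while still expecting the sender end tag. -->
<header>
  <sender>Jane Doe
</header>

<!-- 2. Case mismatch: sender and Sender are different names. -->
<sender>Jane Doe</Sender>

<!-- 3. Unclosed attribute value: the closing quotation mark after yes is missing. -->
<email reply="yes>

<!-- 4. Wrong closing order: recipient was opened last, so it must be closed first. -->
<sender>Jane Doe<recipient>John Smith</sender></recipient>
```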
Because the quotation mark wasn't closed, when the XML parser read this document it found an invalid character for an attribute value, in this case the opening bracket.
Let's take a look at the final error we will be covering here for XML. The final common error we will be examining is closing elements in the wrong order. Let's take a look at an example of this particular mistake. In this particular example, we've mixed up the closing order for the elements sender and recipient. Instead of closing recipient first, we've closed sender. Let's take a look at the error message we'd receive from Internet Explorer for this particular mistake. According to the error message, the XML parser was expecting the end tag for recipient rather than sender. Since XML requires the most recently opened element to be closed first, a rule that might help you remember the closing order is the ABBA rule. No, we are not talking about Dancing Queen; rather, if you open element A first and then open element B within it, then you first need to close element B and then element A. Hopefully, that will help you remember to close the elements in the correct order. If you correct for these common errors, you should have well-formed documents.
Now let's talk about creating a DTD so that you can then validate your well-formed documents. Why would you want to create a DTD? The benefit of a DTD is that it allows you to create numerous documents and make sure that all the information contained in them is comparable between the XML documents. For example, with a DTD you can make sure that all the information about dates is in tags called date rather than time, dates, Date, or DATE. By creating XML documents that meet a DTD's requirements, you can also share information between institutions. For a real-life example of using DTDs, let's take a look at the Encoded Archival Description, or EAD.
The EAD was developed in conjunction with the Society of American Archivists and the Library of Congress. One of the purposes of the EAD was to create a standard for making electronic finding aids for items in archives. While the first version of EAD was created prior to XML's establishment as a standard, it is based on SGML, like XML, so EAD DTDs were quickly created once the W3C published XML as a recommendation.
Several institutions at UT are using the EAD. Institutions like the Center for American History or the Nettie-Lee Benson Latin American Collection have created their finding aids using an EAD DTD. By providing these finding aids online, these institutions allow automatic harvesters to generate an online catalog of what is available at these archives. For example, the Texas Archival Resources Online, or TARO, visits a number of institutions and gathers their finding aids. After harvesting these finding aids, TARO generates a catalog that a user can search for different items possibly held at these archives. All the institutions using TARO use the EAD DTD.
Now that you've seen some of the uses of having a DTD, let's take a look at how to create our own. When creating your own DTD, you'll need to define all the elements and attributes that you'll be using in your XML documents. In this case we'll be creating a DTD for our message XML document.
Some syntax to remember when creating DTDs is the following. A comma equals the boolean operator AND, with the items appearing in the order listed. A vertical bar equals the boolean operator OR. Parentheses group items, and an item with no symbol after it occurs only once. A plus following an item says that the item must occur at least once. A question mark following an item name means that it occurs either once or not at all. And an asterisk following an item name means that it can occur zero or more times.
Elements are declared in the following manner. You have the element declaration, the element name, and then the element's parts.
Then you close the element declaration. For declaring attributes, you have the attribute declaration, the name of the element that the attribute is attached to, the attribute name, the type of attribute it is, any default value for the attribute, and then you close the attribute declaration.
Let's take a look at the DTD I created for the message.xml file. Here's the message DTD we've created. Since the root element is message, we have to define it first; as we said, it contains one of two parts, either email or letter. Letter is composed of letterhead and text. Email is composed of header, subject, and text. Subject may or may not be there, and you can have more than one text area. In the attributes listed, letter and email both have the attribute reply, which has the possible values yes or no, and the default value is no. Then we have the element header, which contains the elements sender, recipient (which can occur zero or more times), and date (which may or may not be there). We also have the element subject, which just contains parsed character data (PCDATA), which is what you use when you are going to have plain text within an element. Then letterhead has sender, recipient, and date. Sender has PCDATA, recipient has PCDATA, date has PCDATA, and text has either PCDATA or salutation, zero or more times. Then salutation is PCDATA.
After creating this DTD, we save it as message.dtd. Then we need to link our XML documents to this DTD so that they can be validated against it. To link to an external DTD, you include the DOCTYPE declaration, the name of the root element in the DTD (in this case message), the keyword SYSTEM, and then, in quotation marks, the name of the file. This could also be a hypertext reference, such as http:// followed by the URL of the DTD, if it is hosted somewhere online. A URL works with SYSTEM as well; the PUBLIC keyword is used when the DTD has a formal public identifier, which is then followed by the location of the DTD.
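Based on the spoken description above, message.dtd might look roughly like the following. This is a reconstruction for illustration, not the author's actual file; the order of the declarations is an assumption.

```xml
<!-- message.dtd (reconstructed sketch) -->
<!-- The root element holds either an email or a letter. -->
<!ELEMENT message (email | letter)>
<!ELEMENT letter (letterhead, text)>
<!-- subject is optional (?), and text must occur at least once (+). -->
<!ELEMENT email (header, subject?, text+)>
<!-- reply may be yes or no; no is the default. -->
<!ATTLIST letter reply (yes | no) "no">
<!ATTLIST email reply (yes | no) "no">
<!-- recipient may occur zero or more times; date is optional. -->
<!ELEMENT header (sender, recipient*, date?)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT letterhead (sender, recipient, date)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT recipient (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!-- text mixes plain text with any number of salutation elements. -->
<!ELEMENT text (#PCDATA | salutation)*>
<!ELEMENT salutation (#PCDATA)>
```

An XML document in the same folder would then link to this file with a line such as <!DOCTYPE message SYSTEM "message.dtd"> placed right after the XML declaration.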
Let's take a look at what it would look like if you declared a DTD internally. The structure of declaring it internally is very similar. You would have the DOCTYPE declaration, the name of the root element, an opening square bracket [, all the element and attribute declarations, a closing square bracket ], and then a closing angle bracket >. So if we wrapped that email.xml file in this internal document declaration, this is what it would look like. It would have the DOCTYPE declaration, message, then the element declarations and attribute lists, then the closing square bracket, and then the closing angle bracket.
Now that you've learned how to link a DTD to a document both internally and externally, let's take a look at validating an XML document. Validating an XML document requires an XML parser. Let's use Internet Explorer to validate against our external DTD. As you can see, since the XML document is linked to a local file that Internet Explorer can find, the XML is validated. However, if we were to use the same document in an online validator, such as the one created by Brown University's Scholarly Technology Group, errors would occur. In this case the errors in validation result from the fact that the DTD file is referenced locally. Because the DTD is referenced locally, when the online validator goes to look for the DTD, it can't find it.
We can fix this in either of two ways. We can give the externally referenced DTD, in this case message.dtd, a full URL like http://www.ischool.utexas.edu/technology/tutorials/webdev/xml_dtds/message.dtd, or we can replace the external DTD with an internal one and revalidate with the online validator. In this case let's use the internal DTD option. As you can see by the lack of errors, the validator doesn't need to look elsewhere for the DTD to validate against, because we've included it internally. So in this case it is able to completely check the document and determine that it is a valid and well-formed XML document.
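A sketch of the email document with the DTD declared internally might look like this. The declarations follow the spoken description of the message DTD; the addresses, subject, and text are placeholder values invented for this example.

```xml
<?xml version="1.0"?>
<!-- The DTD sits between [ and ] inside the DOCTYPE declaration, so no external file is needed. -->
<!DOCTYPE message [
  <!ELEMENT message (email | letter)>
  <!ELEMENT letter (letterhead, text)>
  <!ELEMENT email (header, subject?, text+)>
  <!ATTLIST letter reply (yes | no) "no">
  <!ATTLIST email reply (yes | no) "no">
  <!ELEMENT header (sender, recipient*, date?)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT letterhead (sender, recipient, date)>
  <!ELEMENT sender (#PCDATA)>
  <!ELEMENT recipient (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT text (#PCDATA | salutation)*>
  <!ELEMENT salutation (#PCDATA)>
]>
<message>
  <email reply="yes">
    <header>
      <!-- Placeholder values, not taken from the tutorial. -->
      <sender>jane@example.com</sender>
      <recipient>john@example.com</recipient>
    </header>
    <subject>Re: Meeting</subject>
    <text>Tuesday works for me.</text>
  </email>
</message>
```

Because the rules travel with the document, a validating parser, online or local, has everything it needs in this one file.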
Now you know how to create an XML document that is well-formed, and how to create a DTD that allows you to validate that XML document. Now that you have these skills, go out and have fun creating XML documents and DTDs.