Tuesday, June 09, 2009

Introduction to XML


SGML: Standard Generalized Markup Language, a standard for defining markup languages for documents.

HTML: Hypertext Markup Language.

XML: Extensible Markup Language, a markup language in which you can create your own tags.

XML was created to overcome the limitations of HTML. Although HTML is a very successful markup language, it only describes how data is displayed. It says nothing about what the data means, so programs cannot easily analyze it.

So the main advantage of XML is that it makes it possible to analyze the data and to search inside an XML document. XML is also used for data interchange: organizations can exchange data as XML and then convert it to database records easily.

XML document rules

1- Root element:

An XML document must be contained in a single element. That single element is called the root element, and it contains all the text and any other elements in the document.

2- Elements cannot overlap:

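If you start an element inside another element, you must close the inner element first and then close the outer one. For example, <b><i>text</i></b> is allowed, but <b><i>text</b></i> is not.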

3- End tags are required

Each element must have an end tag. An empty element can instead be written in the self-closing form, such as <br/>.

4- Elements are case sensitive

5- Attributes must have quoted values

• Attributes must have values.

• Those values must be enclosed within quotation marks.
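To see the five rules in action, here is a minimal sketch that feeds small documents to the parser that ships with Java and reports whether each one is well-formed. The class name WellFormedCheck, the method isWellFormed, and the sample documents are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    // Returns true if the text parses as a well-formed XML document.
    // (Hypothetical helper, not part of any library.)
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler so errors are not printed to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // any parse failure counts as "not well-formed"
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<note lang=\"en\"><to>Ann</to></note>", // follows all five rules
            "<a/><b/>",                              // rule 1: two root elements
            "<b><i>text</b></i>",                    // rule 2: overlapping elements
            "<p>text",                               // rule 3: missing end tag
            "<Note></note>",                         // rule 4: case mismatch
            "<note lang=en/>"                        // rule 5: unquoted attribute value
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first sample should pass. Each of the others breaks exactly one of the rules listed above.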

You can enforce a predefined structure with a document type definition (DTD).

A DTD defines the elements that can appear in an XML file and the order of those elements. Another approach to predefined structures is XML Schema.
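As a rough illustration of what a DTD buys you, the sketch below puts a small DTD in front of a document and asks the Java parser to validate against it. The DTD itself (a note that holds one to element followed by one body element), the class name DtdCheck, and the method isValid are all assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdCheck {

    // Hypothetical DTD: <note> must contain one <to>, then one <body>.
    static final String DTD =
          "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to, body)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    // Returns true if the document is well-formed AND obeys the DTD above.
    static boolean isValid(String xmlBody) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validation problems arrive through error(); rethrow them so
            // that parse() fails instead of silently continuing.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + xmlBody)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Elements in the declared order.
        System.out.println(isValid("<note><to>Ann</to><body>Hi</body></note>"));
        // Same elements, wrong order.
        System.out.println(isValid("<note><body>Hi</body><to>Ann</to></note>"));
    }
}
```

Both documents in main are well-formed. Only the DTD distinguishes them, because it fixes the order of the child elements.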

XML Programming Interfaces

This section focuses on the programming interfaces for working with XML documents.

Many programming APIs are available. Here are the most popular ones: the Document Object Model (DOM), the Simple API for XML (SAX), JDOM, and the Java API for XML Parsing (JAXP).

  • Document Object Model (DOM):

Defines a set of interfaces to the parsed version of an XML document. The parser reads in the entire document and builds an in-memory tree, so your code can then use the DOM interfaces to manipulate the tree. You can move through the tree to see what the original document contained, you can delete sections of the tree, you can rearrange the tree, add new branches, and so on.

DOM has some drawbacks. Building the whole XML document in memory costs time and memory, especially with large documents. And if you need only a specific part of the document, it doesn't make sense to load all of it.
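Here is a small sketch of the DOM style of working: parse the whole document into a tree, walk it, delete a node, and add a new branch. The library/book document and the names DomSketch, titles, and replaceFirstBook are invented for the example. Only the org.w3c.dom and javax.xml.parsers calls come from the standard Java library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomSketch {

    // Parses the whole document into an in-memory tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Moves through the tree and collects the text of every <book>.
    static List<String> titles(Document doc) {
        NodeList books = doc.getElementsByTagName("book");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            result.add(books.item(i).getTextContent());
        }
        return result;
    }

    // Deletes the first <book> and appends a new one to the root element.
    static void replaceFirstBook(Document doc, String newTitle) {
        Node first = doc.getElementsByTagName("book").item(0);
        first.getParentNode().removeChild(first);   // delete a section

        Element added = doc.createElement("book");  // add a new branch
        added.setTextContent(newTitle);
        doc.getDocumentElement().appendChild(added);
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<library><book>Dune</book><book>Emma</book></library>");
        System.out.println(titles(doc));
        replaceFirstBook(doc, "Ulysses");
        System.out.println(titles(doc));
    }
}
```

Notice that parse has to finish reading the entire document before titles can look at even one element. That is the cost described above.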

  • Simple API for XML (SAX):

SAX addresses many of DOM's issues. SAX is based on events. First you decide which events matter to you and what data you want from them. The parser then goes through the document and fires an event at the start and end of each element and at the start and end of the document. If you don't save the data from an event, it is discarded. As you can see, SAX doesn't hold the entire document in memory, so it saves time and memory. But one of the SAX issues is that SAX is stateless: the parser does not remember earlier events, so your code has to keep track of any context it needs.
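The event model is easier to see in code. In this sketch the handler cares only about book elements and ignores every other event. It also shows the stateless point: the handler has to remember on its own whether it is currently inside a book. The names SaxSketch, TitleHandler, and bookTitles, and the sample document, are assumptions for illustration, and the handler assumes book elements are not nested.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {

    // Keeps the text of each <book>; everything else is discarded.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        // State we track ourselves: non-null only while inside a <book>.
        private StringBuilder current;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if (qName.equals("book")) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("book")) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    // Streams through the document once and returns the saved titles.
    static List<String> bookTitles(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bookTitles(
            "<library><book>Dune</book><dvd>Alien</dvd><book>Emma</book></library>"));
    }
}
```

The dvd element still produces events, but since the handler never stores them, nothing about it stays in memory.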

  • JDOM:

A set of Java classes developed to make it easier to use DOM and SAX parsers. JDOM wraps the DOM and SAX interfaces and provides high-level classes that reduce the amount of code you have to write. JDOM takes care of most of the parsing work for you.

  • Java API for XML Parsing (JAXP):

Although DOM, SAX, and JDOM provide standard interfaces for most common tasks, there are still several things they don't address. For example, the process of creating a DOMParser object in a Java program differs from one DOM parser to the next. To fix this problem, Sun released JAXP, the Java API for XML Parsing. This API provides common interfaces for processing XML documents using DOM, SAX, and XSLT. JAXP provides classes such as DocumentBuilderFactory and DocumentBuilder that give you a standard way to work with different parsers. There are also methods that let you control whether the underlying parser is namespace-aware and whether it uses a DTD or schema to validate the XML document.
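As a minimal sketch of the factory approach, the code below never names a concrete parser class. It asks DocumentBuilderFactory for whatever parser is installed and switches namespace awareness on or off through the factory. The class name JaxpSketch, the method rootNamespace, and the urn:example:books namespace are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class JaxpSketch {

    // Parses the document and returns the namespace URI of the root
    // element, using a parser configured through the JAXP factory.
    static String rootNamespace(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // parser-independent switch
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document with a default namespace on the root.
        String xml = "<library xmlns=\"urn:example:books\"><book>Dune</book></library>";
        System.out.println("aware:     " + rootNamespace(xml, true));
        System.out.println("not aware: " + rootNamespace(xml, false));
    }
}
```

With awareness switched on, the root element reports its namespace. With it switched off, the parser treats xmlns as an ordinary attribute and the element has no namespace URI. Validation is controlled the same way, through setValidating on the factory.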

1 comment:

Anonymous said...

Another and better way to process XML is called vtd-xml

http://vtd-xml.sf.net