How XML and RDF are updating HTML
By standardising methods of communication (in this case the HTTP and TCP/IP protocols), the Internet has connected different cultures, using different technologies, in geographically dispersed locations.
The next stage in this evolution is to standardise not only the way in which resources are shared, but also their underlying structure. The information they contain can then be read, and understood, electronically, rather than relying on the human eye to digest and comprehend it.
HTML
Currently, the majority of documents on the Internet are constructed using HTML – the HyperText Markup Language. However, HTML provides a very crude means of document storage:
- Common components (e.g. navigation, logos, copyright information) cannot be shared across documents; instead they must be included within every file. This can lead to inconsistent branding and difficult updates (e.g. if the navigation were to change, every page would need editing).
- HTML, as a language, specifies the format of the information, not the meaning. As a result:
  - Accurate, relevant searches – based on semantic meaning – are almost impossible to build.
  - Computer-controlled content exchange is not possible. If a partner site wished to automatically query an HTML site for its latest list of services or products, no meaningful response could be given.
  - Re-formatting a collection of documents (for example to appear on digital TV, on a WAP phone, or in a PDF brochure) is extremely difficult, as the formatting has been hard-coded rather than applied in a separate process.
XML
Extensible Markup Language (XML) is the obvious next stage for the Internet. Because an XML document describes the information it contains, using metadata, the aforementioned problems can be overcome, and electronic syndication can be conducted at greater speed and with greater effectiveness. XML is now becoming widely established in the content-management environment.
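To illustrate the point about describing information rather than its formatting, the following is a minimal sketch in Python, using only the standard library. The catalogue document, its element names and the list_products helper are all hypothetical, invented for this example; they show how a partner site could query a product list once the content is labelled by meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: the element and attribute names are assumptions
# made for this illustration, not part of any published schema.
CATALOGUE_XML = """\
<catalogue>
  <product id="p1">
    <name>Widget</name>
    <price currency="GBP">9.99</price>
  </product>
  <product id="p2">
    <name>Gadget</name>
    <price currency="GBP">24.50</price>
  </product>
</catalogue>
"""


def list_products(xml_text):
    """Return (id, name, price) tuples for every <product> element."""
    root = ET.fromstring(xml_text)
    products = []
    for product in root.findall("product"):
        name = product.findtext("name")
        price = float(product.findtext("price"))
        products.append((product.get("id"), name, price))
    return products


if __name__ == "__main__":
    for product_id, name, price in list_products(CATALOGUE_XML):
        print(product_id, name, price)
```

Because each value is wrapped in a tag that names what it is, the program never has to guess which piece of text is a price and which is a product name, as it would with an HTML page.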
XML documents are stored as plain text, which makes them platform-independent by default. Because of their structured, self-descriptive format, they are also easily converted to other formats. Examples include HTML, the standard web-document language; PDF; and WML, the WAP (Wireless Application Protocol, for mobile phones) equivalent of HTML. This enhances the potential reach of an XML-based system.
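As a rough sketch of such a conversion, the Python example below reads one small XML document and renders it twice: once as an HTML list and once as a simple WML card. The services document and the function names are assumptions made for this illustration, and a production system might well use a dedicated transformation language such as XSLT instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; the element names are illustrative only.
SERVICES_XML = """\
<services>
  <service><name>Web design</name></service>
  <service><name>Hosting</name></service>
</services>
"""


def service_names(xml_text):
    """Extract the text of every <service>/<name> element."""
    root = ET.fromstring(xml_text)
    return [service.findtext("name") for service in root.findall("service")]


def to_html(names):
    """Render the names as an HTML unordered list."""
    ul = ET.Element("ul")
    for name in names:
        ET.SubElement(ul, "li").text = name
    return ET.tostring(ul, encoding="unicode")


def to_wml(names):
    """Render the names as paragraphs inside a single WML card."""
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"id": "services", "title": "Services"})
    for name in names:
        ET.SubElement(card, "p").text = name
    return ET.tostring(wml, encoding="unicode")


if __name__ == "__main__":
    names = service_names(SERVICES_XML)
    print(to_html(names))
    print(to_wml(names))
```

The content is written once and the presentation is applied in a separate step, so adding a further output format means adding another rendering function rather than editing every document.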
RDF
The rapid growth of the World Wide Web over the past ten years has made an enormous amount of information available. This gives users of the Internet a seemingly endless library from which to source and dissect resources.
In practice, however, the sheer volume of information, combined with few means of cataloguing or classification, creates barriers between the user and any relevant content that may be available.
To overcome these barriers, a standard means for accurately describing and mapping out resources has been introduced. Through the standard syntax of XML, an RDF (Resource Description Framework) description can accurately catalogue web-based resources. Through the exchange and dissemination of these descriptions, advanced Internet applications can source content with much greater accuracy and efficiency.
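To make this concrete, the sketch below shows what a small RDF description written in XML syntax might look like, and how a Python program could read it with the standard library. The RDF and Dublin Core namespace URIs are the published ones; the example.org resource address, the chosen properties and the describe helper are assumptions for illustration. This is not a full RDF parser: it handles only this simple shape of description.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative description of a single resource. The example.org address
# is a placeholder; dc:title and dc:subject are Dublin Core properties.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/articles/xml-and-rdf">
    <dc:title>How XML and RDF are updating HTML</dc:title>
    <dc:subject>XML</dc:subject>
    <dc:subject>RDF</dc:subject>
  </rdf:Description>
</rdf:RDF>
"""


def describe(xml_text):
    """Map each described resource to a dict of property name -> values."""
    root = ET.fromstring(xml_text)
    catalogue = {}
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        about = description.get(f"{{{RDF_NS}}}about")
        properties = catalogue.setdefault(about, {})
        for child in description:
            # Tags look like "{namespace}localname"; keep the local part.
            local_name = child.tag.rsplit("}", 1)[-1]
            properties.setdefault(local_name, []).append(child.text)
    return catalogue


if __name__ == "__main__":
    for resource, properties in describe(RDF_XML).items():
        print(resource, properties)
```

An application that collects descriptions like this from many sites can select resources by subject or title without downloading and interpreting the resources themselves.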
