Open document standard


Anastasius Gavras
Eurescom
gavras@eurescom.de

One of the earliest applications for computers was simple document editing, storage and printing. Creating and processing documents soon became easy, but managing the multitude of different forms a document could take grew into a nightmare for every desktop application user. The information technology industry continues to offer a large number of different solutions and formats to capture information. Is there light at the end of the tunnel, and hope for a more standardised way to capture, process and store information?

A document is something that can be used to supply evidence or information. In other words, a document is a piece of writing that contains information. That is probably why, in the wider sense, we also call computers information systems. The flexibility and programmability of computers opened the door to numerous different ways of processing information and thus of capturing it in documents.

A bit of history

As early as the late 1960s, information systems specialists started to introduce control codes into electronic manuscripts that caused a document to be formatted in a particular way. In 1969, IBM started work on an integrated law office information system, which led to the invention of the Generalized Markup Language (GML) as a means of allowing the text editing, formatting and information retrieval subsystems to share documents. After a rather long standardisation process, the Standard Generalized Markup Language (SGML) was published by ISO in 1986 (ISO 8879:1986).

SGML is a method for creating interchangeable, structured documents. It allows a single document to be assembled from many sources (such as word processor files, database queries, graphics and video clips) and a document structure to be defined using a special grammar called a Document Type Definition (DTD). Furthermore, it allows markup to be added that shows the structural units in a document, and a document to be validated against the structure defined in the DTD.
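The idea of a grammar that a document can be checked against can be sketched in a few lines. The example below is a minimal sketch, assuming a small hypothetical `memo` vocabulary written as an XML document with an internal DTD subset. Python's standard `xml.etree.ElementTree` parser reads such a document but does not validate it against the DTD, so the content models are restated by hand as regular expressions over the sequence of child element names. This mirrors the fact that DTD content models are essentially regular expressions, but it is not a complete DTD validator; the names `CONTENT_MODELS` and `validate` are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# A hypothetical document type: a memo with one title, one or more
# authors and any number of paragraphs. The internal DTD subset states
# the grammar; ElementTree parses it but does not enforce it.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo   (title, author+, para*)>
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT para   (#PCDATA)>
]>
<memo>
  <title>Open document standard</title>
  <author>Anastasius Gavras</author>
  <para>Markup shows the structural units of a document.</para>
</memo>"""

# The same content models, hand-translated into regular expressions over
# the child element names (each name followed by one space).
# An empty pattern means "no child elements", i.e. text only (#PCDATA).
CONTENT_MODELS = {
    "memo": r"title (author )+(para )*",
    "title": r"",
    "author": r"",
    "para": r"",
}


def validate(elem):
    """Return a list of structural errors in elem and its descendants."""
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    else:
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            errors.append(
                f"children of <{elem.tag}> {children.split()} "
                "do not match its content model"
            )
    for child in elem:
        errors.extend(validate(child))
    return errors


valid = ET.fromstring(DOC)
print(validate(valid))  # → []

# Title and author in the wrong order: the structure check fails.
invalid = ET.fromstring("<memo><author>A</author><title>T</title></memo>")
print(validate(invalid))
```

A DTD-validating parser performs this kind of check for every declared element; hand-written patterns are used here only because the standard library offers no DTD validation.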

Although SGML was the method of choice for large documents and document assemblies in the scientific world, it never really found wide acceptance in the mass market. This market was soon dominated by software companies offering proprietary but easy-to-use solutions such as the Microsoft Office applications and Adobe PDF (Portable Document Format).

The role of the World Wide Web

The HyperText Markup Language (HTML), a simple and static derivative of SGML, can be seen as one of the main enablers of the World Wide Web in the early 1990s. It was soon recognised that the static nature of HTML was far from sufficient to describe documents and their structure when they had to be presented on paper or on screens of different sizes and resolutions. Although conceived in the 1970s, SGML was good enough to give birth to a simplified subset, the Extensible Markup Language (XML), published as a W3C Recommendation in 1998.

Both SGML and XML are ‘meta’ markup languages, i.e. languages with which one can define concrete markup languages. In the case of electronic documents, such a concrete language defines the structure of documents.

Requirements

One can very easily formulate requirements for an open electronic document standard for simple text documents:

  • Suitable format for transmission over the Internet

  • Capabilities for automatic archiving of documents

  • Long-term readability of the format

  • Capabilities for easy information retrieval

  • Capabilities for document integrity and traceability of access

  • Capabilities to observe different security and privacy policies

  • Independence from the text-editing platform

  • Safe format (i.e. not offering a platform of its own for the spread of viruses)
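The integrity requirement shows why a well-defined format helps. XML allows the same content to be written in several ways (attribute order, quoting style, whitespace inside tags, empty-element tags), so a minimal sketch, assuming SHA-256 as the digest and Python 3.8 or later, first brings the document into canonical form with `xml.etree.ElementTree.canonicalize` (C14N 2.0) and only then hashes it. This only detects changes; a real deployment would add signatures (W3C XML Signature, for example, also relies on canonicalization) and access logging for traceability. The helper name `document_digest` is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def document_digest(xml_text: str) -> str:
    """Hypothetical helper: SHA-256 hex digest of the canonical (C14N 2.0) form."""
    # canonicalize() sorts attributes, normalises quoting and whitespace
    # inside tags, and writes empty elements as start/end tag pairs, so
    # equivalent serialisations yield the same string.
    canonical = ET.canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = '<doc version="1" lang="en"><p>Hello</p><br/></doc>'
# Same content, different serialisation: attribute order, quotes, empty tag.
reserialised = "<doc lang='en'  version='1'><p>Hello</p><br></br></doc>"
# Changed content: one character added to the text.
tampered = '<doc version="1" lang="en"><p>Hello!</p><br/></doc>'

print(document_digest(original) == document_digest(reserialised))  # → True
print(document_digest(original) == document_digest(tampered))  # → False
```

Hashing the raw bytes instead would treat `original` and `reserialised` as different documents, which is why canonicalization comes first.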

Once the specific requirements of domains such as government, electronic business and justice are taken into account, the list grows beyond the scope of this tutorial.

With XML, an open standard for widely applicable and widely accepted electronic document exchange seems, for the first time, to be a reachable target. Demand for such a standard is evidently high, as can be deduced from the increasing number of electronic business transactions between:

  • private individuals

  • citizens and government

  • administrations

  • commercial enterprises

  • commercial enterprises and their clients

  • and essentially any type of legal entity

Applications of XML

Since XML was published by the W3C in 1998, a very large number of initiatives have taken the opportunity to standardise the way in which different domains manage information. In the past, domain-specific knowledge and its representation usually required proprietary means of managing the information. This tutorial cannot possibly list all these initiatives; only a few applications and initiatives are briefly introduced below.

In computer science and technology, DocBook appears to be a widespread and well-accepted DTD that is particularly well suited for books and technical papers in this domain. The DocBook DTD is currently maintained by the DocBook Technical Committee of the OASIS consortium.
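Because DocBook marks up the logical structure of a book rather than its appearance, information such as a table of contents can be retrieved mechanically. Below is a minimal sketch, assuming a small DocBook 5-style fragment (DocBook 5 places its elements in the namespace `http://docbook.org/ns/docbook`; older DTD-based versions use no namespace) and using only the limited path expressions of Python's `xml.etree.ElementTree`. The helper name `table_of_contents` and the book content are hypothetical.

```python
import xml.etree.ElementTree as ET

# A small DocBook 5-style fragment; the content is illustrative only.
BOOK = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Open document standard</title>
  <chapter>
    <title>A bit of history</title>
    <section><title>GML</title><para>IBM, 1969.</para></section>
    <section><title>SGML</title><para>ISO 8879:1986.</para></section>
  </chapter>
  <chapter>
    <title>Applications of XML</title>
    <section><title>DocBook</title><para>Maintained by OASIS.</para></section>
  </chapter>
</book>"""

# Map a prefix of our choosing to the DocBook namespace for path queries.
NS = {"db": "http://docbook.org/ns/docbook"}


def table_of_contents(xml_text):
    """Hypothetical helper: chapter and section titles, sections indented."""
    root = ET.fromstring(xml_text)
    toc = []
    for chapter in root.findall("db:chapter", NS):
        toc.append(chapter.findtext("db:title", namespaces=NS))
        for section in chapter.findall("db:section", NS):
            toc.append("  " + section.findtext("db:title", namespaces=NS))
    return toc


print("\n".join(table_of_contents(BOOK)))
```

The same structural markup could drive an index, a search service or different print and screen layouts without touching the source document.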

The Open Office XML Format technical committee was established to create an open, XML-based file format specification for office applications, meeting the requirements of office documents containing text, spreadsheets, charts and graphics.
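The work of this committee led to the OpenDocument Format (ODF), in which a document is a ZIP package whose body is stored as XML in a member called `content.xml`. The following is a minimal sketch, assuming ODF's text namespace `urn:oasis:names:tc:opendocument:xmlns:text:1.0` and Python's standard `zipfile` and `xml.etree.ElementTree` modules, that extracts the paragraphs of a text document. The helper name `odf_paragraphs` is hypothetical, and the in-memory package built for the demonstration is deliberately incomplete (a real package also carries a manifest, styles and metadata).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of ODF text elements such as <text:p> (paragraphs).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(odf_file):
    """Hypothetical helper: return the text of every <text:p> in content.xml."""
    with zipfile.ZipFile(odf_file) as package:
        root = ET.fromstring(package.read("content.xml"))
    # itertext() also collects text inside nested spans.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


# Build a minimal, incomplete demonstration package in memory.
CONTENT = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="{TEXT_NS}" office:version="1.2">
  <office:body>
    <office:text>
      <text:p>First paragraph.</text:p>
      <text:p>Second <text:span>paragraph</text:span>.</text:p>
    </office:text>
  </office:body>
</office:document-content>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
    # ODF packages store the media type first, uncompressed.
    package.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                     compress_type=zipfile.ZIP_STORED)
    package.writestr("content.xml", CONTENT)
buffer.seek(0)

print(odf_paragraphs(buffer))  # → ['First paragraph.', 'Second paragraph.']
```

Because the body is plain XML inside a standard archive format, any platform with a ZIP library and an XML parser can read it, which is the point of an open office format.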

The UK’s e-Government Interoperability Framework (e-GIF) sets out the government’s technical policies and standards for achieving interoperability and information systems coherence across the public sector. One of the key policy decisions of the e-GIF programme is the adoption of XML as the primary standard for data integration and presentation on all public sector systems.

VoiceXML (Voice eXtensible Markup Language) is a standard for making Internet content and information accessible via voice and telephone. VoiceXML is an example of non-text-based information that can be structured with XML. Voice applications that benefit from VoiceXML include automated speech recognition systems, text-to-speech applications and others.
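A VoiceXML document describes a spoken dialogue rather than a page of text. As a minimal sketch, assuming the VoiceXML 2.0 namespace `http://www.w3.org/2001/vxml` and its basic `form`, `block` and `prompt` elements, the following Python generates a document that simply speaks one message through the platform's text-to-speech engine. The helper name `prompt_document` and the message are hypothetical; a real application would add grammars and input fields for speech recognition.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
# Serialise VoiceXML elements with a default namespace (no prefix).
ET.register_namespace("", VXML_NS)


def prompt_document(message):
    """Hypothetical helper: a VoiceXML 2.0 document that speaks one message."""
    def q(tag):
        # Qualified name in ElementTree's {namespace}tag notation.
        return f"{{{VXML_NS}}}{tag}"

    vxml = ET.Element(q("vxml"), version="2.0")
    form = ET.SubElement(vxml, q("form"))
    block = ET.SubElement(form, q("block"))
    prompt = ET.SubElement(block, q("prompt"))
    prompt.text = message
    return ET.tostring(vxml, encoding="unicode")


print(prompt_document("Welcome. This message was structured with XML."))
```

The voice platform, not the document, decides how the prompt is rendered, just as a stylesheet decides how a text document is laid out.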

The OASIS consortium, a global not-for-profit organisation, hosts a very large list of applications and initiatives based on XML on its Web site (http://xml.coverpages.org/xmlApplications.html).

Conclusion

Information technology is about to give the information society the capabilities it needs for the free and unimpeded exchange of electronic documents. The technology vehicle for these capabilities is the Extensible Markup Language (XML), whose roots lie in concepts of the late 1960s. XML is one more example of the value of standardisation for the information society.

You can find more information on the OASIS consortium at www.oasis-open.org
