This is a legacy document, retained on the site in order to avoid link rot. The content is likely no longer (a) accurate, (b) representative of the views and philosophies of current site management, or (c) up to date.

A simple demo of entities

Introduction

A usenet news thread that started in the opera.tech news group with…

 Message-ID: <1103_1010144727@news.opera.no>
 

…came to take on a rather long life and at one point a poster asked for information on how to declare and expand entities in an HTML document instance.

Surely this can be done using standard principles defined by SGML, with the caveat that none of the more popular www browsers known to me supports this method.

Nevertheless, a small demonstration can still be designed to illustrate the method, and thanks to the set of on-line SGML tools that Nick Kew makes available on his Code Valet pages, we can even get a view of how an SGML parser handles declared and referenced entities.

Source documents

First examine the source of these two documents…


 /markup/entity-demo.html
 
 /css/entity-sample.txt
 

The first document is a very simple HTML document with a SYSTEM identifier for the external HTML declaration subset, and two entity declarations given in the internal subset.

The first of these entity declarations simply defines a string of text as its expansion value, while the second uses a SYSTEM identifier to state that its expansion value is to be fetched from the system location identified by the stated URL.
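The two kinds of declaration described above can be illustrated with a minimal Python sketch. It is not an SGML parser and not the author's demo: the document text, the entity names `greeting` and `sample`, the DTD name `html.dtd`, and the `EXTERNAL` lookup table are all hypothetical stand-ins, and only the path `/css/entity-sample.txt` is taken from the text above.

```python
import re

# Hypothetical stand-in for the demo document; the real
# /markup/entity-demo.html is not reproduced here.
DOCUMENT = '''<!DOCTYPE HTML SYSTEM "html.dtd" [
  <!ENTITY greeting "Hello from an internal entity">
  <!ENTITY sample SYSTEM "/css/entity-sample.txt">
]>
<TITLE>Entity demo</TITLE>
<P>&greeting;
<P>&sample; &amp; more
'''

# Hypothetical external storage, keyed by system identifier.
EXTERNAL = {"/css/entity-sample.txt": "Text pulled in from an external entity"}

# Matches both forms: <!ENTITY name "literal"> and <!ENTITY name SYSTEM "id">
DECLARATION = re.compile(r'<!ENTITY\s+([\w.-]+)\s+(SYSTEM\s+)?"([^"]*)"\s*>')
REFERENCE = re.compile(r'&([\w.-]+);')


def collect_entities(text, external):
    """Map entity names to expansion text; the first declaration of a name wins."""
    entities = {}
    for name, is_system, literal in DECLARATION.findall(text):
        value = external[literal] if is_system else literal
        entities.setdefault(name, value)
    return entities


def expand(text, external):
    """Return the document instance with declared entity references expanded."""
    entities = collect_entities(text, external)
    _, _, instance = text.partition("]>")  # drop the prolog with its internal subset
    # References not declared in the internal subset (e.g. &amp;) are left alone.
    return REFERENCE.sub(lambda m: entities.get(m.group(1), m.group(0)), instance)


print(expand(DOCUMENT, EXTERNAL))
```

A real parser would also read the entities declared in the external DTD subset; this sketch only looks at the internal subset, which is where the two demo declarations live.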

We will now make use of these two sample documents to run two different demonstrations of how an SGML parser takes care of entity declarations and entity references.

Demonstration 1 -- nsgmls

For this first demo we will send our demo document instance to a validating SGML parser and let it generate its standard output for us to study.

For the rest of this demo sequence it may be practical to open up a second browser window where the actual tests can be shown separately while the following procedure is still kept in view.

Step by step procedure

The output generated by nsgmls will give no parser messages, indicating that a valid document instance was found. After that there is the parser output in ESIS format, which is the standard output format for nsgmls.

The ESIS format is pretty simple to read once one gets the hang of how it is designed. Basically each line is an output record where the first character on each line is a characterization of what the rest of the line contains.
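As a rough illustration of the "first character tells what the line is" idea, here is a small Python sketch that classifies ESIS records. The table covers only a handful of the record types nsgmls can emit, and the `SAMPLE` text is hand-written for illustration; it is an assumption about what such output looks like, not a capture from a real run on the demo document.

```python
# A few of the record types nsgmls writes; the table is deliberately incomplete.
RECORD_KINDS = {
    "(": "start of element",
    ")": "end of element",
    "-": "character data",
    "A": "attribute of the next element",
    "?": "processing instruction",
    "C": "document was conforming",
}


def classify(line):
    """Split one ESIS record into (kind, rest of line)."""
    code, rest = line[:1], line[1:]
    return RECORD_KINDS.get(code, "other record type"), rest


# Hand-written, simplified sample; real output would normally also
# carry attribute records for each element.
SAMPLE = """(HTML
(HEAD
(TITLE
-Entity demo
)TITLE
)HEAD
(BODY
(P
-Hello from an internal entity
)P
)BODY
)HTML
C"""

for record in SAMPLE.splitlines():
    kind, rest = classify(record)
    print(f"{kind:24} {rest}")
```

Note that the entity text shows up as ordinary character data records: by the time the stream reaches the next application, nothing indicates that it once came from an entity reference.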

A few examples

Note that nsgmls has expanded the declared entities at the points where they were referenced. That is, it has included information from both internal and external sources in the data stream that is supposed to be sent on to a next-level application for further processing.

Further processing could be handled by e.g. a rendering engine that optionally applies a style sheet on the data for formatting in a browser window.

Demonstration 2 -- SGMLNORM

For this second demonstration we will make use of another tool that comes with the 'SP' tool set originally created by James Clark; this would be SGMLNORM.

SGMLNORM is a program module that combines syntactic validation and a normalization phase to produce, as its output, a new fully marked up document instance where e.g. all entity references have been resolved and all optional tags left out of the original instance have been inferred and inserted at the places where they should have been from the start.

Remember that most popular browsers in use today really do love to see fully marked up documents, especially if we plan to create some CSS rules for them to suggest a decent rendering of the document content later on.

So let's start this section of the demo.

Step by step procedure

This time the output sent back will be a "normalized" instance of the original document instance. Normalization here means, among other things, that any missing but optional closing tag in the original source has been inserted at its inferred place (note the missing </P> in my original doc instance). The entity references have also been expanded in their correct places.

On top of that, SGMLNORM has also made use of nsgmls, in its background processing, to validate the document instance.

The output now received from SGMLNORM is a new HTML document instance that can be cut and pasted directly up on a server, to be accessed from there by popular www browsers, without them actually knowing what they got hit by.

In effect, we have just used an available on-line service as a pre-processor that allows us to declare and use entities as a method of combining bits and pieces of content into a document instance that can later be presented on the www as a usable resource.

Epilogue

We may ask ourselves why no browser vendor was ever capable of understanding and implementing even the simplest level of SGML's entity handling in their browsers. Instead we got e.g. CENTER, BLINK and FRAMES, plus a lot of other stupidity.

As of today it's probably too late to do anything about it. Some jerk(s) (I don't have any better name for her/him/those, whoever it was) came up with a totally invalid use of the <!DOCTYPE… declaration, as in DOCTYPE sniffing for switching rendering modes in browsers, and even managed to sell the idea to other jerks too.

Gosh, how RTFM challenged can some people get?

Well, I hope you have all got at least a glimpse of what we could have had; basically only imagination limits what could be done with entities.

Say! why?

 
 /css/entity-sample.txt
 
 
 

Why not?

 
 /css/entity-sample.cgi
 
 
 

Server-side created entities in different languages, based on values in HTTP Accept request headers or some other criteria. Think of it!

This method does not even need client-side scripting; it could even have been made to work in Lynx :-)
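To make the server-side idea a little more concrete, here is a minimal Python sketch of what a resource such as /css/entity-sample.cgi might do when asked for the entity's replacement text. It is an assumption throughout: the `TEXTS` table, the function names, and the choice of the Accept-Language header as the criterion are hypothetical, and the header parsing is simplified.

```python
# Hypothetical replacement texts for one entity, keyed by primary language tag.
TEXTS = {"en": "Hello", "sv": "Hej", "de": "Hallo"}


def parse_accept_language(header):
    """Return the language tags of an Accept-Language value, best q-value first."""
    ranked = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # a tag without an explicit q parameter counts as q=1
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranked.append((q, tag))
    # The sort is stable, so equally weighted tags keep their header order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for q, tag in ranked if q > 0]


def entity_text(header, texts=TEXTS, default="en"):
    """Pick the replacement text for the most preferred language we have."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "sv-SE" falls back to "sv"
        if primary in texts:
            return texts[primary]
    return texts[default]


print(entity_text("sv-SE, en;q=0.5"))
```

A parser that fetched the external entity with the reader's own request headers would then receive text in the reader's preferred language, with no scripting on the client at all.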

Jan Roland Eriksson