dtd2html is a Perl program that generates an HTML document that documents an SGML document type definition (DTD) and allows hypertext navigation of the DTD.
dtd2html generates various HTML files for hypertext
navigation of an SGML DTD. The files generated are as follows:
DTD-HOME.html
The home page of the HTML document. This file
contains the basic links to start navigating through the
DTD. The name of this file can be changed with the
-homefile option. User text may be added to this page via the
Description File.
TOP-ELEM.html
This file lists the top-most elements of the DTD and contains
links to the element pages describing each top-most element.
The name of this file can be changed with the -topfile option.
ALL-ELEM.html
This file contains a list of all elements defined in the DTD.
This page allows quick access to any individual element
description page. The name of this file can be changed with the
-allfile option.
ENTS.html (Optional)
This file contains a list of the general entities defined in the DTD.
It is only generated if the -ents option is specified during
program invocation. The name of this file can be changed with the
-entsfile option.
DTD-TREE.html (Optional)
This file contains the content hierarchy tree(s) of
the top-most element(s) in the DTD. It is only generated
if the -tree option is specified during program invocation. The
name of this file can be changed with the -treefile option.
element.html
For each element defined in the DTD, an element description file is generated with a filename of the element name suffixed by ".html". User text may be added to this page via the Description File.
element.attr.html
For each element defined in the DTD, a file is generated describing the attributes defined for the element. User text may be added to this page via the Description File.
element.cont.html
For each element defined in the DTD, a file is generated listing the content model declaration of the element as declared in the DTD.
Once all the files are generated, one only needs to create a link to the DTD-HOME page on the Web server being used.
More information on the content of each file is in the HTML File Descriptions section.
dtd2html is invoked from a command-line shell, with the
following syntax:
% dtd2html [options] filename
filename is the SGML DTD to be parsed for generating the HTML files. The following is the list of options available:
-allfile filename
Set the filename for the file listing all elements in the DTD to
filename. The default name is "ALL-ELEM.html".
-catalog filename
Use filename as the file for mapping public
identifiers and external entities to system files. If
-catalog is not specified, "catalog" is
used as the default filename.
See Resolving External Entities for more information.
-contnosort
List the base content of the element.html page as declared in the content model declaration. Normally, the elements are listed in sorted order with no group delimiters, group connectors, or occurrence indicators.
-descfile filename
Use filename as the source for element descriptions in the DTD. If this argument is not specified, no description file is used. See Description File for more information.
-docurl URL
Use URL as the location of the dtd2html documentation. The default
URL is "file:/usr/doc/perlsgml/dtd2html.html".
-dtdname string
Set the name of the DTD to string. If not specified,
dtd2html determines the name of the DTD from its filename with the
extension stripped off. If reading from standard input,
this argument should be specified; otherwise, "Unknown" is
used. The string " DTD" is appended to the name of the
DTD. If the -qref
option is specified, then the string " DTD Quick Reference"
is appended to form the title of the quick reference document.
-elemlist
Generate a blank description file to standard output. See Description File for more information.
-ents
Generate a general entities page. The general entity types listed are: replaceable character data, CDATA, SDATA, and PI (processing instruction). Note: For large DTDs, this list may be quite large and of little use to the document.
-entsfile filename
Set the filename for the general entities page to
filename. The default name is "ENTS.html".
-entslist
Generate a blank description file
to standard output containing ONLY general entity
entries. This differs from -elemlist in that -elemlist
outputs ONLY entries for elements and attributes.
See Description File for more information.
-help
Print out a terse description of all options available. No HTML files are generated and all other options are ignored when this option is specified.
-homefile filename
Set the filename for the HTML home page for the DTD to
filename. The default name is "DTD-HOME.html".
-keepold
This option is only valid if
-updateel is specified. This
option tells dtd2html to preserve any old descriptions when
updating a description file.
-level #
Set the prune level of the content hierarchy tree to
#. This option is only valid if
-tree is specified.
-modelwidth #
Set the maximum output width for content model declarations to
# for element.cont.html pages.
The default value is 65.
-nodocurl
Do not insert a hyperlink to the dtd2html documentation in the
DTD-HOME page.
-noreport
This option is only valid if
-updateel is specified. This
option tells dtd2html not to output a report when updating a
description file.
-outdir path
Set the destination of the generated HTML files to path. Defaults to the current working directory.
-qref
Output a quick reference document of the DTD. The document is
written to standard output (STDOUT). When this option is
specified, only the quick reference document is generated.
Therefore, the tree page and the -outdir option are ignored. See
Quick Reference Mode for more information on the -qref option.
-qrefdl
Output a quick reference document of the DTD using the <DL>
(definition list) HTML tag. When this option is specified,
only the quick reference document is generated. Therefore, the
tree page and the -outdir option are ignored. See
Quick Reference Mode for more information. This option overrides the
behavior of the -qref option.
-qrefhtag htag
Use htag as the header tag for the element names when the
-qref option is specified. Defaults to '<H2>'.
-reportonly
This option is only valid if -updateel is specified. It tells
dtd2html to generate only the report, without a new description file.
-topfile filename
Set the filename for the file listing the top-most elements in the
DTD to filename. The default name is "TOP-ELEM.html".
-tree
Generate the content hierarchy of the top-most elements defined in the DTD.
-treelink
Create an anchor in the HTML pages to the tree page, even if
-tree is not specified.
-treefile filename
Set the filename for the file containing the content hierarchy
tree(s) of the DTD to filename. The default name is
"DTD-TREE.html". This option is only valid if -tree is specified.
-treeonly
Create only the tree page. This option implies -tree.
-treetop string
Set the top-most elements to string. String is a comma-separated
list of elements that dtd2html should treat as the
top-most elements when printing the content hierarchy tree(s)
and when listing elements in the TOP-ELEM page.
Normally, dtd2html computes the top-most elements
of the DTD itself. This option overrides that computation.
-updateel file
Perform an update of the description file specified by file. This option allows one to update an element description file to contain any new elements/attributes that have been added to the DTD without affecting element descriptions already defined. See Updating Description File for more information.
-verbose
Print status messages to standard error describing what dtd2html is
doing. This option generates a large amount of output and is used
mainly for debugging purposes.
All HTML files/pages generated contain hypertext links at the end of the page to the DTD-HOME, TOP-ELEM, ALL-ELEM, ENTS (optional), and DTD-TREE (optional) pages, unless stated otherwise.
The DTD-HOME page is the root of the HTML document. It contains the links to the other main pages as described above.
One can add documentation to the home page via the Description File or by manually editing the file.
The TOP-ELEM page contains the list of all top-most elements defined in the DTD. A top-most element is defined as an element that cannot be contained by another element, or that can only be contained by itself.
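The definition of a top-most element can be sketched in Python. This is only an illustration of the rule as stated, not dtd2html's actual (Perl) implementation; the `content` mapping (element name to the set of elements its content model allows) and the function name `top_most_elements` are assumptions made for the example.

```python
def top_most_elements(content):
    """Return the elements that no *other* element can contain.

    `content` is a hypothetical, simplified view of a parsed DTD:
    it maps each element name to the set of element names that may
    appear in its content.
    """
    contained_by_other = set()
    for parent, children in content.items():
        for child in children:
            # Containing itself does not disqualify an element.
            if child != parent:
                contained_by_other.add(child)
    return sorted(e for e in content if e not in contained_by_other)


# Hypothetical DTD: "doc" is never contained; "note" is contained
# only by itself; "list" contains itself but is also contained by
# "section", so it is not top-most.
content = {
    "doc": {"title", "section"},
    "section": {"title", "list"},
    "list": {"list", "item"},
    "item": set(),
    "title": set(),
    "note": {"note"},
}
print(top_most_elements(content))  # prints ['doc', 'note']
```

The -treetop option described earlier bypasses this kind of computation and uses the supplied list of elements as given.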
The ALL-ELEM page contains an alphabetical list of all elements defined in the DTD.
The ENTS page contains an alphabetical list of the general entities defined in the DTD. The general entity types listed are: replaceable character data, CDATA, SDATA, and PI (processing instruction). Note: For large DTDs, this list may be quite large and of little use to the document. Also, entities are not handled when updating a description file.
The DTD-TREE page contains the content hierarchy tree(s) of the top-most
elements of the DTD. The maximum depth of the tree can be set
via the -level command-line option.
The tree shows the overall content hierarchy for an element.
Content hierarchies of descendants are also shown. An element is
pruned if it already exists at a higher (or equal) level, or if the
maximum depth has been reached. The string "..." is appended to an
element if it has been pruned due to pre-existence at a higher (or
equal) level. The content of the pruned element can be determined
by searching for the complete tree of the element (i.e., the entry
without "..."). Elements pruned because the maximum depth has been
reached will not have "..." appended.
Example:
|__section+)
|_(effect?, ...
|__title, ...
|__toc?, ...
|__epc-fig*,
| |_(effect?, ...
| |__figure,
| | |_(effect?, ...
| | |__title, ...
| | |__graphic+, ...
| | |__assoc-text?)
Pruning must be done to avoid a combinatorial explosion. It is common for DTDs to define content hierarchies of infinite depth. Even with a predefined maximum depth, the generated tree can become very large.
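The pruning rules described for the tree page can be sketched as follows. This is a minimal illustration under stated assumptions, not dtd2html's own algorithm: `content` is a hypothetical mapping from an element to the ordered list of elements in its content, plain indentation stands in for the tree-drawing characters, and `max_depth` plays the role of the -level value. Elements with no content are never marked, since there is nothing to prune.

```python
from collections import deque


def min_levels(content, root):
    """Breadth-first search giving the shallowest level of each element."""
    levels = {root: 0}
    queue = deque([root])
    while queue:
        elem = queue.popleft()
        for child in content.get(elem, []):
            if child not in levels:
                levels[child] = levels[elem] + 1
                queue.append(child)
    return levels


def tree_lines(content, root, max_depth):
    """Return the indented tree as a list of lines, pruning repeats."""
    levels = min_levels(content, root)
    expanded = set()
    lines = []

    def walk(elem, depth):
        indent = "  " * depth
        if depth >= max_depth:
            # Pruned because of the depth limit: no "..." marker.
            lines.append(indent + elem)
            return
        children = content.get(elem, [])
        if children and (elem in expanded or levels[elem] < depth):
            # The full subtree is shown at a higher or equal level.
            lines.append(indent + elem + " ...")
            return
        expanded.add(elem)
        lines.append(indent + elem)
        for child in children:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical recursive DTD: "section" may contain itself.
content = {
    "doc": ["title", "section"],
    "section": ["title", "para", "section"],
    "para": ["emph"],
    "title": ["emph"],
    "emph": [],
}
print("\n".join(tree_lines(content, "doc", 5)))
```

Without the `expanded` check, the recursive "section" element in this sample would expand until the depth limit is hit, which is the explosion the pruning is meant to avoid.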
Since the tree output is static, the inclusion and exclusion sets
of elements are treated specially. Inclusion and exclusion elements
inherited from ancestors are not propagated down to determine
which elements are printed, but special markup is presented at a
given element if inclusion or exclusion elements are inherited from
ancestors. Inclusions and exclusions are not propagated down
because of the pruning. Since an element may occur in multiple
contexts, with different ancestral inclusions and exclusions in
effect, the entry without "..." may be the only place
to see the content hierarchy of the element.
Example:
D1
| {+} idx needbegin needend newline
|
|_(head,
| | {A+} idx needbegin needend newline
| | {-} needbegin needend
| |
| |_(((#PCDATA |
| |____((acro |
| | | {A+} idx needbegin needend newline
| | | {A-} needbegin needend
| | |
| | |_(((#PCDATA |
| | |____((super | ...
| | |______sub)))*)) ...
Ignoring the lines starting with {}'s, one gets the content
hierarchy of an element as defined by the DTD without concern for where
it may occur in the overall structure. The {} lines give additional
information regarding the element with respect to its existence
within a specific context. For example, when an ACRO
element occurs within D1,HEAD, it can contain, along with its normal
content, IDX and NEWLINE
elements due to inclusions from ancestors. However, it cannot contain
NEEDBEGIN and NEEDEND regardless of its
defined content since an ancestor excludes them.
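NEEDBEGIN and NEEDEND are excluded from ACRO.
Explanation of {}'s keys:
{+}
Inclusion elements defined by the element itself; {+} is appended
to the subelement entry.
{A+}
Inclusion elements inherited from ancestors.
{-}
Exclusion elements defined by the element itself; {-} is appended to the subelement
listing.
{A-}
Exclusion elements inherited from ancestors.
The element page describes the content of element. The element page is divided into the following sections: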
The element.attr page describes the attributes of element. The element.attr page is divided into the following sections:
This page is not created if no attributes are defined for element.
The element.cont page gives the element's content model declaration as defined in the DTD. The element.cont page is divided into the following sections:
The content models are reformatted for better readability.
The maximum width to use when reformatting is set by the
-modelwidth option. Each element listed in the content model is a hyperlink
to that element's page.
Here's an example of how dtd2html formats content model declarations:
(((#PCDATA|
((acro|book|emph|location|not|parm|term|var))|
((super|sub))|
((link|xref))|
((computer|cursor|display|keycap|softkey|user))|
((footnote|ineqn|ingraphic|fillin))|
((nobreak)))*))
This page is not created if element is defined with empty content.
dtd2html supports adding documentation to the HTML files
generated from a DTD through the -descfile option. Documentation can
be added to the element pages, the attribute pages, and/or the ENTS page.
The basic syntax of the description file is as follows:
<?DTD2HTML identifier>
<P>
Description of identifier here.
</P>
<?DTD2HTML identifier>
<P>
Description of identifier here.
</P>
...
The line <?DTD2HTML identifier>
signifies the beginning of a description entry for identifier.
All text up to the next
<?DTD2HTML ...>
instruction or end-of-file is used as the identifier description.
The identifier can be one of the following formats:
element
An element name in the DTD. The following description text will go at the top of the element's page.
element*
An element in the DTD followed by a `*'. The following
description text will go at the top of the element's attribute
page.
element*attribute
An element in the DTD followed by a `*', which is followed by an attribute name of the element. The following description text will go below the attribute heading of the element's attribute page.
element+
An element in the DTD followed by a `+'. The following
description text goes after each listing of the element in
ALL-ELEM and in element pages.
Because of the context in which
the description text will appear (i.e., inside a <LI> element),
it is best to keep the description to a single sentence.
*attribute
A `*' followed by an attribute name.
The following description
text will go to any attribute named attribute, unless a
specific description is given to the attribute via
element*attribute.
This identifier allows descriptions of commonly shared attributes
to be kept in one place.
entity&
A general entity name followed by a `&'.
The following description text will go after each entity listed in
the ENTS page.
Because of the context in which
the description text will appear (i.e., inside a <LI> element),
it is best to keep the description to a single sentence.
identifier,identifier,...
A sequence of identifiers separated by commas, `,'. This allows a description to be shared among multiple identifiers. Note: there should be NO whitespace between the identifiers and the commas.
If the special element, -HOME-, is specified in the
description file, then its description text will be put on the
DTD-HOME
page.
dtd2html provides special instructions that may be
used in a description file to control how dtd2html
processes the file.
Special instructions follow a syntax similar to that of descriptive instructions:
<?DTD2HTML #instruction argument>
The following special instructions are defined:
#include argument
The include directive tells dtd2html
to treat the argument as the name of a file that contains
description entries. Example:
<?DTD2HTML #include ents.dsc>
The example instructs dtd2html to open a file called
ents.dsc and read it for description entries.
SGML comments are also supported in the description file. Comments are
skipped by dtd2html. The syntax for a comment is the following:
<!-- This is a comment -->
dtd2html can only handle a comment that
spans a single line (to
keep the parsing simple). Therefore, the following will cause
dtd2html to add the comment text beyond the first line of the
comment to an identifier's description:
<!-- This is a comment
that spans more than one line.
-->
If you want to put line breaks in the description file without them
being applied to an identifier's description, then use the SGML short
comment: <!>.
<!-- Include external descriptions -->
<!>
<?DTD2HTML #include ents.dsc>
<!>
<!-- A short description -->
<!>
<?DTD2HTML a+ >
Anchor; source and/or destination of a link
<!>
<!-- A shared description -->
<!>
<?DTD2HTML h1,h2,h3,h4,h5,h6 >
<p>
The six heading elements, <H1> through <H6>, denote section headings.
Although the order and occurrence of headings is not constrained by
the HTML DTD, documents should not skip levels (for example, from H1
to H3), as converting such documents to other representations is
often problematic.
</p>
<!>
<!-- Element and attribute descriptions -->
<!>
<?DTD2HTML a >
<p>
The <A> element indicates a hyperlink anchor.
At least one of the NAME and HREF attributes should be present.
</p>
<?DTD2HTML a* >
<?DTD2HTML a*href >
<p>
Gives the URI of the head anchor of a hyperlink.
</p>
<?DTD2HTML a*methods >
<p>
Specifies methods to be used in accessing the destination, as a
whitespace-separated list of names. The set of applicable names is a
function of the scheme of the URI in the HREF attribute. For similar
reasons as for the <a href="title.html">TITLE</a> attribute, it may
be useful to include the information in advance in the link. For
example, the HTML user agent may chose a different rendering as a
function of the methods allowed; for example, something that is
searchable may get a different icon.
</p>
dtd2html ignores element descriptions that
are empty or contain only the <P> tag.
If duplicate descriptions exist, the first one defined is used (in versions prior to 1.3.0, the last description defined was used).
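The description file rules above (entry markers, shared identifiers, single-line comments, ignored empty entries, and first-definition-wins) can be sketched in Python. This is an illustrative reading of the format, not the parser dtd2html uses; the function name `parse_descriptions` is hypothetical, and the sketch simply skips special instructions such as #include instead of following them.

```python
import re

# <?DTD2HTML identifier> or <?DTD2HTML #instruction argument>
ENTRY_RE = re.compile(r"^<\?DTD2HTML\s+([^\s>]+)\s*([^>]*?)\s*>\s*$")
# A comment that opens and closes on one line, or the short comment <!>
COMMENT_RE = re.compile(r"^\s*<!(--.*--)?\s*>\s*$")


def parse_descriptions(text):
    """Map each identifier to its description text."""
    descriptions = {}
    current_ids = []
    buffer = []

    def flush():
        body = "\n".join(buffer).strip()
        # Entries that are empty or hold only a <P> tag are ignored.
        if body and body.lower() != "<p>":
            for ident in current_ids:
                # The first description defined for an identifier wins.
                descriptions.setdefault(ident, body)

    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            flush()
            buffer = []
            name = match.group(1)
            # Special instructions (e.g. #include) start no entry here.
            current_ids = [] if name.startswith("#") else name.split(",")
        elif COMMENT_RE.match(line):
            continue
        else:
            buffer.append(line)
    flush()
    return descriptions


sample = """<!-- A short description -->
<!>
<?DTD2HTML a+ >
Anchor; source and/or destination of a link
<!>
<?DTD2HTML h1,h2 >
<p>
Heading elements.
</p>
<?DTD2HTML a* >
<?DTD2HTML a+ >
A later duplicate that is ignored.
<?DTD2HTML #include ents.dsc>
"""
for ident, body in parse_descriptions(sample).items():
    print(ident, "=>", body.splitlines()[0])
```

Because the comment pattern only matches a comment that opens and closes on the same line, the continuation lines of a multi-line comment fall through to the description text, which mirrors the limitation noted above.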
To get started with a description file for a DTD, you can use the
-elemlist option to dtd2html to generate a file with all
elements and attributes defined in the DTD with empty descriptions.
To get a list of general entities, you can use the
-entslist option to dtd2html to generate a file with the
general entities defined in the DTD with empty descriptions.
dtd2html can generate a quick reference document
of a DTD with the -qref option. The document generated is sent to
standard output (STDOUT). Therefore, one should redirect STDOUT to a
file. Example:
% dtd2html -qref html.dtd > htmlqref.html
No other output/files are generated while in quick reference mode.
The format of the quick reference document is as follows:
The title is determined by the -dtdname option (or the filename of
the DTD if the option is not specified).
Each element is listed in an <H2> tag (or the tag
specified by the -qrefhtag option) wrapped with the '<>' characters.
Any element description text follows the element heading if defined in a description file.
All elements are listed in alphabetical order.
Each element in the <H2> tag is wrapped with the <A NAME="element"> tag so one may cross-reference the element if desired. Example:
<H2><A NAME="body"><body></A></H2>.
An alternative format for the quick reference document may be
generated with the -qrefdl command-line option. The format of the
document shares the same properties as those of the -qref option, with
the following exceptions:
Each element is still wrapped in a <A NAME> statement to allow cross-referencing.
Keep element descriptions as brief as possible. The quick
reference document may get quite large for large DTDs. Care must
also be taken when using the -qrefdl option; less HTML markup is
available while in a <DL>.
Keep a separate description file just for the quick
reference. Usually, the description file used in the
normal dtd2html output would be inappropriate for a quick
reference.
The -HOME- element description identifier may be used to place
text before the list of elements. One could add a link to the
DTD-HOME page that is generated by dtd2html when the
-qref option is not used.
As a DTD changes, one can automatically update the element description
file for the DTD to reflect the changes via the
-updateel command-line option. The updated description file is sent to standard
output (STDOUT). Therefore, one should redirect STDOUT to a file.
Example:
% dtd2html -updateel html.desc html.dtd > html-new.desc
When updating a description file, a report is prepended to the new description file. The report is contained in SGML comment declaration statements. Here's an example of what the report looks like:
<!-- Element Description File Update -->
<!-- Source File: sgm/html.desc -->
<!-- Source DTD: sgm/html.2.0/html.dtd -->
<!-- Deleting Old? Yes -->
<!-- Date: Mon Jun 27 00:25:41 EDT 1994 -->
<!-- New identifiers: -->
<!-- br, dl*, dl*compact, form, form*, form*action, form*enctype, -->
<!-- form*method, img*ismap, input, input*, input*align, -->
<!-- input*checked, input*maxlength, input*name, input*size, -->
<!-- input*src, input*type, input*value, option, option*, -->
<!-- option*selected, option*value, select, select*, -->
<!-- select*multiple, select*name, select*size, strike, textarea, -->
<!-- textarea*, textarea*cols, textarea*name, textarea*rows -->
<!-- Old identifiers: -->
<!-- dir*, dir*compact, key, link*name, menu*, menu*compact, ol*, -->
<!-- ol*compact, u, ul*, ul*compact -->
<!-- -->
Entity descriptions are NOT checked, and are excluded from the output. Only elements and attributes are processed.
If the description file processed contains "#include" instructions, these instructions are not preserved in the output. The output is a merging of all description entries processed.
If "#include" instructions are used, it may be best to use the
-reportonly option. That way,
you can determine what has changed and update the description file(s)
manually.
The report will specify any new identifiers that were created, and any old identifier no longer applicable to the DTD.
By default, any old identifiers are removed from the new element
description file. This can be overridden by the -keepold option.
The report will state if old identifiers are deleted or not.
ALL non-deleted identifiers keep all the description text specified in the source (original) description file.
If you desire no report, use the -noreport option.
If all you desire is to see what changes exist without creating a
new description file, then use the -reportonly option.
This option causes only the report to be generated. This may
be used to help keep track of changes in a DTD.
Any user entered comments in the source element description file are lost in the update.
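The update behavior in the notes above amounts to a set comparison between the identifiers in the DTD and those in the existing description file. The following Python sketch shows that idea only; it is not how dtd2html is implemented, and the function name `update_descriptions` and its dictionary-based inputs are assumptions for illustration. The `keep_old` parameter mirrors the -keepold option.

```python
def update_descriptions(descriptions, dtd_ids, keep_old=False):
    """Return (updated descriptions, new identifiers, old identifiers).

    `descriptions` maps identifiers from the existing description
    file to their text; `dtd_ids` lists the element and attribute
    identifiers found in the current DTD (both hypothetical inputs).
    """
    new_ids = sorted(set(dtd_ids) - set(descriptions))
    old_ids = sorted(set(descriptions) - set(dtd_ids))

    updated = {}
    for ident, text in descriptions.items():
        # Old identifiers are dropped unless asked to keep them.
        if ident in old_ids and not keep_old:
            continue
        # Non-deleted identifiers keep their original text.
        updated[ident] = text
    for ident in new_ids:
        # New identifiers start with an empty description.
        updated[ident] = ""
    return updated, new_ids, old_ids


existing = {"a": "Anchor.", "u": "Underline.", "ul*compact": "Compact list."}
in_dtd = ["a", "br", "form"]
updated, new_ids, old_ids = update_descriptions(existing, in_dtd)
print("New identifiers:", ", ".join(new_ids))
print("Old identifiers:", ", ".join(old_ids))
```

Returning the two identifier lists separately from the merged result corresponds to the report being something one can request on its own (-reportonly) or suppress (-noreport).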
The mapping of external entities to system files
may be defined via the -catalog
command-line option. The catalog lets you map
public identifiers to system identifiers
(files) or map entity names to system identifiers.
Catalog Syntax
The syntax of a catalog is a subset of SGML catalogs (as defined in SGML Open Draft Technical Resolution 9401:1994).
A catalog contains a sequence of the following types of entries:
PUBLIC public_id system_id
This maps public_id to system_id.
ENTITY name system_id
This maps a general entity whose name is name to system_id.
ENTITY %name system_id
This maps a parameter entity whose name is name to system_id.
Syntax Notes
A system_id string cannot contain any spaces. The system_id is treated as the pathname of a file.
Any line in a catalog file that does not follow the previously mentioned entries is ignored.
In case of duplicate entries, the first entry defined is used.
Example catalog file:
-- ISO public identifiers --
PUBLIC "ISO 8879-1986//ENTITIES General Technical//EN" iso-tech.ent
PUBLIC "ISO 8879-1986//ENTITIES Publishing//EN" iso-pub.ent
PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN" iso-num.ent
PUBLIC "ISO 8879-1986//ENTITIES Greek Letters//EN" iso-grk1.ent
PUBLIC "ISO 8879-1986//ENTITIES Diacritical Marks//EN" iso-dia.ent
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" iso-lat1.ent
PUBLIC "ISO 8879-1986//ENTITIES Greek Symbols//EN" iso-grk3.ent
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 2//EN" ISOlat2
PUBLIC "ISO 8879-1986//ENTITIES Added Math Symbols: Ordinary//EN" ISOamso
-- HTML public identifiers and entities --
PUBLIC "-//IETF//DTD HTML//EN" html.dtd
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML" ISOlat1.ent
ENTITY "%html-0" html-0.dtd
ENTITY "%html-1" html-1.dtd
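The catalog rules (PUBLIC and ENTITY entries, unrecognized lines ignored, first duplicate wins) can be sketched in Python as follows. This is a simplified illustration and not the parser used by dtd2html; the function name `parse_catalog` is hypothetical, and the sketch assumes one entry per line with double-quoted or unquoted identifiers.

```python
import re

# PUBLIC "public id" system_id   or   ENTITY name system_id
CATALOG_ENTRY_RE = re.compile(
    r'^\s*(PUBLIC|ENTITY)\s+(?:"([^"]*)"|(\S+))\s+(\S+)\s*$'
)


def parse_catalog(text):
    """Return (public_map, entity_map) for the catalog text."""
    public, entity = {}, {}
    for line in text.splitlines():
        match = CATALOG_ENTRY_RE.match(line)
        if not match:
            continue  # lines that are not catalog entries are ignored
        keyword, quoted, bare, system_id = match.groups()
        key = quoted if quoted is not None else bare
        table = public if keyword == "PUBLIC" else entity
        table.setdefault(key, system_id)  # the first entry defined wins
    return public, entity


catalog_text = """\
-- HTML public identifiers and entities --
PUBLIC "-//IETF//DTD HTML//EN" html.dtd
PUBLIC "-//IETF//DTD HTML//EN" other.dtd
ENTITY "%html-0" html-0.dtd
ENTITY name name.ent
"""
public, entity = parse_catalog(catalog_text)
print(public)
print(entity)
```

Parameter entities keep their leading `%` in the key here, so a general entity and a parameter entity with the same name do not collide.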
Environment Variables
The following envariables (i.e., environment variables) are supported:
P_SGML_PATH
This is a colon (semi-colon for MSDOS users) separated list of paths for finding catalog files or system identifiers. For example, if a system identifier is not an absolute pathname, then the paths listed in P_SGML_PATH are used to find the file.
SGML_CATALOG_FILES
This envariable is a colon (semi-colon for MSDOS users) separated list of catalog files to read. If a file in the list is not an absolute path, then the file is searched for in the paths listed in P_SGML_PATH and SGML_SEARCH_PATH.
SGML_SEARCH_PATH
This is a colon (semi-colon for MSDOS users) separated list of paths for finding catalog files or system identifiers. This envariable serves the same function as P_SGML_PATH. If both are defined, paths listed in P_SGML_PATH are searched before any paths in SGML_SEARCH_PATH.
The use of P_SGML_PATH is for compatibility with earlier versions.
SGML_CATALOG_FILES and SGML_SEARCH_PATH
are supported for compatibility with James Clark's nsgmls(1).
The file specified by -catalog
is read before any files specified by SGML_CATALOG_FILES.
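The search order described for these envariables can be sketched in Python. This is an assumption-laden illustration, not dtd2html's code: the helper names `search_paths`, `resolve`, and `catalog_files` are hypothetical, and the sketch does not try the current directory for relative names, which the real program may or may not do. `os.pathsep` supplies the colon or semi-colon separator for the platform.

```python
import os


def search_paths(environ=os.environ):
    """Directories to search: P_SGML_PATH first, then SGML_SEARCH_PATH."""
    paths = []
    for name in ("P_SGML_PATH", "SGML_SEARCH_PATH"):
        value = environ.get(name, "")
        paths.extend(p for p in value.split(os.pathsep) if p)
    return paths


def resolve(system_id, environ=os.environ):
    """Return the first existing file for system_id, or None."""
    if os.path.isabs(system_id):
        return system_id if os.path.exists(system_id) else None
    for directory in search_paths(environ):
        candidate = os.path.join(directory, system_id)
        if os.path.exists(candidate):
            return candidate
    return None


def catalog_files(cli_catalog=None, environ=os.environ):
    """Catalogs in read order: the -catalog file, then SGML_CATALOG_FILES."""
    files = [cli_catalog or "catalog"]  # "catalog" is the documented default
    value = environ.get("SGML_CATALOG_FILES", "")
    files.extend(f for f in value.split(os.pathsep) if f)
    return files


env = {"SGML_CATALOG_FILES": os.pathsep.join(["iso.cat", "html.cat"])}
print(catalog_files("my.cat", env))
```

Since the first catalog entry defined is the one used, reading the -catalog file before the SGML_CATALOG_FILES list means its mappings take precedence over those in the envariable's catalogs.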
This program is part of the perlSGML package; see <URL:file:/usr/doc/perlsgml/perlSGML.html>