SP provides a generic API in addition to its native API. The generic interface is much simpler than the native interface. It is generic in the sense that it could be easily implemented using parsers other than SP. It provides all ESIS information as well as some other information about the instance that is commonly needed by applications. However, it doesn't provide access to all information available from SP; in particular, it doesn't provide information about the DTD. It is also slightly less efficient than the native interface.
The interface uses two related abstract classes. An
SGMLApplication is an object that can handle a number of
different kinds of event which correspond to information in an SGML
document. An EventGenerator is an object that can
generate a sequence of events of the kinds handled by an
SGMLApplication. The
ParserEventGeneratorKit class makes an
EventGenerator that generates events using SP.
SGMLApplication has a number of local types that are used
in several contexts:
Char
unsigned short if
SP_MULTI_BYTE is defined and unsigned char
otherwise.
CharString
Char.
It has the following members:
const Char *ptr
Chars of the string.
size_t len
Chars in the string.
Location
OpenEntityPtr
and a Position. The CharStrings in it will
remain valid as long as the OpenEntity that is pointed to
by the OpenEntityPtr that was used to construct it
remains valid.
It has the following members:
unsigned long lineNumber
(unsigned long)-1 if invalid.
unsigned long columnNumber
(unsigned long)-1 if invalid.
unsigned long byteOffset
(unsigned long)-1 if invalid.
unsigned long entityOffset
(unsigned long)-1 if invalid.
CharString entityName
CharString filename
const void *other
When a location is in an internal entity, the location of the reference to the entity will be used instead.
OpenEntity
The only use for an OpenEntity is, in conjunction with a
Position, to create a Location. An
OpenEntity is accessed using an
OpenEntityPtr.
OpenEntityPtr
OpenEntity.
Position
The meaning of a Position is completely determined by the
OpenEntity object with which it is associated. The only
use for a Position is, in conjunction with an
OpenEntity, to create a Location.
ExternalId
bool haveSystemId
CharString systemId
haveSystemId is true.
bool havePublicId
CharString publicId
havePublicId is true.
bool haveGeneratedSystemId
CharString generatedSystemId
haveGeneratedSystemId is true.
Notation
CharString name
ExternalId externalId
Entity
CharString name
Entity::DataType dataType
Entity::DataType is a local enum with the following possible
values:
Entity::sgml
Entity::cdata
Entity::sdata
Entity::ndata
Entity::subdoc
Entity::pi
Entity::DeclType declType
Entity::DeclType is a local enum with the following possible
values:
Entity::general
Entity::parameter
Entity::doctype
Entity::linktype
bool isInternal
CharString text
isInternal is true.
ExternalId externalId
isInternal is false.
const Attribute *attributes
isInternal is false.
size_t nAttributes
isInternal is false.
Notation notation
isInternal is false.
Attribute
CharString name
Attribute::Type type
Attribute::Type is a local enum with the following possible
values:
Attribute::invalid
Attribute::implied
Attribute::cdata
Attribute::tokenized
Attribute::Defaulted defaulted
Attribute::Defaulted is a local enum with the following
possible values:
Attribute::specified
Attribute::definition
Attribute::current
size_t nCdataChunks
Attribute::CdataChunks comprising the value
of the attribute. Valid only if type is
cdata.
const Attribute::CdataChunk *cdataChunks
Attribute::CdataChunks comprising the value of this attribute.
Valid only if type is cdata.
Attribute::CdataChunk is a local struct with the
following members:
bool isSdata
CharString data
CharString entityName
isSdata is true.
This is non-ESIS information.
CharString tokens
type is Attribute::tokenized.
bool isId
size_t nEntities
const Entity *entities
Notation notation
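Because a CharString is only a pointer and a length, an application usually needs a small helper to turn one into a string it owns. The following is a minimal sketch of such a helper, and of joining the CdataChunks of a cdata attribute value. The structs are hypothetical stand-ins that mirror only the members listed above, so that the sketch compiles on its own; a real application would use the types from SGMLApplication.h. The names toString and cdataValue and the "[name]" rendering of SDATA chunks are illustrative assumptions, not part of SP.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins mirroring the members documented above. A real
// application would use SGMLApplication::Char, SGMLApplication::CharString
// and SGMLApplication::Attribute::CdataChunk instead.
typedef unsigned char Char;   // assumes SP_MULTI_BYTE is not defined

struct CharString {
    const Char *ptr;   // the Chars of the string
    std::size_t len;   // the number of Chars; no terminator is assumed
};

struct CdataChunk {
    bool isSdata;
    CharString data;
    CharString entityName;   // used here only when isSdata is true
};

// Copy a CharString into an owned std::string. Narrowing each Char to
// char is an assumption that only suits single-byte text.
std::string toString(const CharString &s)
{
    std::string out;
    out.reserve(s.len);
    for (std::size_t i = 0; i < s.len; i++)
        out += static_cast<char>(s.ptr[i]);
    return out;
}

// Join the chunks of a cdata attribute value. SDATA chunks are rendered
// as "[entityName]" purely for illustration.
std::string cdataValue(const CdataChunk *chunks, std::size_t nChunks)
{
    std::string out;
    for (std::size_t i = 0; i < nChunks; i++) {
        if (chunks[i].isSdata)
            out += "[" + toString(chunks[i].entityName) + "]";
        else
            out += toString(chunks[i].data);
    }
    return out;
}
```

With the real types, cdataValue would be passed the cdataChunks and nCdataChunks members of an Attribute whose type is cdata.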
For each event xyzEvent handled by
SGMLApplication, there is a virtual function of
SGMLApplication named xyz to
handle the event, and a local struct of SGMLApplication
named XyzEvent.
Pointers within an event xyzEvent are valid
only during the call to xyz. None of the
structs in events have copy constructors or assignment operators
defined. It is up to the event handling function to make a copy of
any data that it needs to preserve after the function returns.
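As an illustration of that copying rule, here is a minimal sketch of a handler that keeps element names after the call returns by copying the characters into strings it owns. CharString and StartElementEvent below are hypothetical stand-ins that mirror only the gi member, so that the sketch is self-contained; ElementNameCollector is an illustrative name, and in a real application the class would derive from SGMLApplication.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real types are local to SGMLApplication.
typedef unsigned char Char;   // assumes SP_MULTI_BYTE is not defined
struct CharString { const Char *ptr; std::size_t len; };
struct StartElementEvent { CharString gi; };   // only gi is mirrored here

class ElementNameCollector {
public:
    // The event's pointers are valid only during this call, so the
    // characters are copied rather than the CharString itself.
    void startElement(const StartElementEvent &event) {
        std::string name;
        for (std::size_t i = 0; i < event.gi.len; i++)
            name += static_cast<char>(event.gi.ptr[i]);
        names_.push_back(name);
    }
    const std::vector<std::string> &names() const { return names_; }
private:
    std::vector<std::string> names_;
};
```

Storing the CharString itself would keep a pointer into memory that may no longer hold the name once the handler has returned.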
Except as otherwise stated, the information in events is ESIS information. All position information is non-ESIS information.
There are the following types of event:
AppinfoEvent
Position pos
bool none
CharString string
none is false.
PiEvent
Position pos
CharString data
CharString entityName
StartElementEvent
Position pos
CharString gi
Element::ContentType contentType
Element::ContentType is an enum with the following
possible values:
Element::empty
Element::cdata
Element::rcdata
Element::mixed
Element::element
bool included
size_t nAttributes
const Attribute *attributes
EndElementEvent
Position pos
CharString gi
DataEvent
Position pos
CharString data
SdataEvent
Position pos
CharString text
CharString entityName
ExternalDataEntityRefEvent
Position pos
Entity entity
SubdocEntityRefEvent
Position pos
Entity entity
StartDtdEvent
Position pos
CharString name
bool haveExternalId
ExternalId externalId
EndDtdEvent
Position pos
CharString name
EndPrologEvent
Position pos
GeneralEntityEvent
ParserEventGeneratorKit::outputGeneralEntities option is
enabled. This is non-ESIS information. The event has the following
members:
Entity entity
No event will be generated for the declaration of the
#default entity; instead, an event will be generated when
an entity reference uses the #default entity, if that is
the first time an entity with that name is used. This means
that GeneralEntityEvent can occur after the end of the
prolog.
CommentDeclEvent
ParserEventGeneratorKit::outputCommentDecls option is
enabled. This is non-ESIS information. The event has the following
members:
Position pos
size_t nComments
const CharString *comments
const CharString *seps
MarkedSectionStartEvent
ParserEventGeneratorKit::outputMarkedSections
option is enabled.
This is non-ESIS information.
The event has the following members:
Position pos
MarkedSectionStartEvent::Status status
MarkedSectionStartEvent::Status is a local enum with the
following possible values:
MarkedSectionStartEvent::include
MarkedSectionStartEvent::rcdata
MarkedSectionStartEvent::cdata
MarkedSectionStartEvent::ignore
size_t nParams
const MarkedSectionStartEvent::Param *params
Param is a local struct with the following members:
MarkedSectionStartEvent::Param::Type type
MarkedSectionStartEvent::Param::Type is a local enum with the
following possible values:
MarkedSectionStartEvent::Param::temp
MarkedSectionStartEvent::Param::include
MarkedSectionStartEvent::Param::rcdata
MarkedSectionStartEvent::Param::cdata
MarkedSectionStartEvent::Param::ignore
MarkedSectionStartEvent::Param::entityRef
CharString entityName
type is
MarkedSectionStartEvent::Param::entityRef.
MarkedSectionEndEvent
ParserEventGeneratorKit::outputMarkedSections option is
enabled. This is non-ESIS information. The event has the following
members:
Position pos
MarkedSectionEndEvent::Status status
MarkedSectionEndEvent::Status is a local enum with the
following possible values:
MarkedSectionEndEvent::include
MarkedSectionEndEvent::rcdata
MarkedSectionEndEvent::cdata
MarkedSectionEndEvent::ignore
IgnoredCharsEvent
ParserEventGeneratorKit::outputMarkedSections option is
enabled. This is non-ESIS information. The event has the following
members:
Position pos
CharString data
ErrorEvent
Position pos
ErrorEvent::Type type
ErrorEvent::Type is a local enum with the following possible
values:
ErrorEvent::quantity
ErrorEvent::idref
ErrorEvent::capacity
ErrorEvent::otherError
ErrorEvent::warning
ErrorEvent::info
CharString message
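An application may want to treat the different ErrorEvent::Type values differently. The following is a minimal sketch that tallies events by severity. ErrorEvent here is a hypothetical stand-in that mirrors only the type member, and ErrorTally is an illustrative name. Counting info and warning separately from the other four values is an assumption made for this sketch, not something this document specifies.

```cpp
// Hypothetical stand-in mirroring only the type member of
// SGMLApplication::ErrorEvent.
struct ErrorEvent {
    enum Type { quantity, idref, capacity, otherError, warning, info };
    Type type;
};

class ErrorTally {
public:
    ErrorTally() : errors_(0), warnings_(0), infos_(0) { }
    // Following the naming rule above, ErrorEvent is handled by error().
    void error(const ErrorEvent &event) {
        switch (event.type) {
        case ErrorEvent::info:
            infos_++;
            break;
        case ErrorEvent::warning:
            warnings_++;
            break;
        default:
            // quantity, idref, capacity and otherError are all counted
            // as errors here (an assumption of this sketch).
            errors_++;
            break;
        }
    }
    unsigned errors() const { return errors_; }
    unsigned warnings() const { return warnings_; }
    unsigned infos() const { return infos_; }
private:
    unsigned errors_;
    unsigned warnings_;
    unsigned infos_;
};
```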
SGMLApplication also has a virtual function
void openEntityChange(const OpenEntityPtr &);
which is similar to an event. An application that wishes to make use
of position information must maintain a variable of type
OpenEntityPtr representing the current open entity, and
must provide an implementation of the openEntityChange
function that updates this variable. It can then use the value of
this variable in conjunction with a Position to obtain a
Location; this can be relatively slow. Unlike events, an
OpenEntityPtr has copy constructors and assignment
operators defined.
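Once a Location has been obtained in this way, its numeric members may still hold the marker (unsigned long)-1 described for Location, so code that reports positions should check for it. The following is a minimal sketch of such a check. Location below is a hypothetical stand-in that mirrors only lineNumber and columnNumber; describe, field, invalidField and the "?" placeholder are illustrative assumptions, not part of SP.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in mirroring two members of SGMLApplication::Location.
struct Location {
    unsigned long lineNumber;
    unsigned long columnNumber;
};

// The marker this document gives for an invalid numeric member.
const unsigned long invalidField = (unsigned long)-1;

// Format one numeric member, printing "?" when it is invalid.
std::string field(unsigned long value)
{
    if (value == invalidField)
        return "?";
    std::ostringstream os;
    os << value;
    return os.str();
}

// Produce "line:column" for use in a diagnostic message.
std::string describe(const Location &loc)
{
    return field(loc.lineNumber) + ":" + field(loc.columnNumber);
}
```

The same check would apply to the byteOffset and entityOffset members, which use the same marker.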
The EventGenerator interface provides the following
functions:
unsigned run(SGMLApplication &app)
app for each event. Returns the number of
errors. This must not be called more than once for any
EventGenerator object.
EventGenerator *makeSubdocEventGenerator(const SGMLApplication::Char *s, size_t n)
EventGenerator for a subdocument of the
current document. s and n together specify the
system identifier of the subdocument entity. These should usually be
obtained from the generatedSystemId member of the
externalId member of the Entity object for
the subdocument entity. This function can only be called after
run has been called; the call to run need
not have returned, but the SGMLApplication
must have been passed events from the prolog or instance (i.e. the SGML
declaration must have been parsed).
void inhibitMessages(bool b)
run() is executing.
void halt()
run().
This can be called at any point during the execution of run().
It is safe to call this function from a different thread from that which
called run().
The ParserEventGeneratorKit class is used to create an
EventGenerator that generates events using SP. It
provides the following members:
EventGenerator *makeEventGenerator(int nFiles, char *const *files)
EventGenerator that will generate events
for the SGML document whose document entity is contained in the
files.
The returned EventGenerator should be deleted when it
is no longer needed.
makeEventGenerator may be called more than once.
void setOption(ParserEventGeneratorKit::Option opt)
makeEventGenerator() is called.
ParserEventGeneratorKit::Option is a local enum with the following possible
values:
ParserEventGeneratorKit::showOpenEntities
-e option of nsgmls.
ParserEventGeneratorKit::showOpenElements
-g option of nsgmls.
ParserEventGeneratorKit::outputCommentDecls
CommentDeclEvents to be generated.
ParserEventGeneratorKit::outputMarkedSections
MarkedSectionStartEvents,
MarkedSectionEndEvents
and IgnoredCharsEvents
to be generated.
ParserEventGeneratorKit::outputGeneralEntities
GeneralEntityEvents to be generated.
void setOption(ParserEventGeneratorKit::OptionWithArg opt, const char *arg)
makeEventGenerator() is called.
ParserEventGeneratorKit::OptionWithArg is a local enum with the following possible
values:
ParserEventGeneratorKit::addCatalog
-m option of nsgmls.
ParserEventGeneratorKit::includeParam
-i option of nsgmls.
ParserEventGeneratorKit::enableWarning
-w option of nsgmls.
ParserEventGeneratorKit::addSearchDir
-D option of nsgmls.
Creating an application with this interface involves the following steps:
SGMLApplication,
called, say, MyApplication.
FooEvent that the application
needs information from, define a member of MyApplication
void MyApplication::foo(const FooEvent &).
ParserEventGeneratorKit.
ParserEventGeneratorKit using
ParserEventGeneratorKit::setOption.
EventGenerator using
ParserEventGeneratorKit::makeEventGenerator.
MyApplication
(usually on the stack).
EventGenerator::run passing it a reference to
the instance of MyApplication.
EventGenerator.
The application must include the ParserEventGeneratorKit.h
file (which in turn includes EventGenerator.h and
SGMLApplication.h), which is in the generic
directory. If your compiler does not support the standard C++
bool type, you must ensure that bool is
defined as a typedef for int, before including this. One
way to do this is to include config.h and then
Boolean.h from the lib subdirectory of the SP
distribution.
On Unix, the application must be linked with the
lib/libsp.a library.
Here's a simple example of an application that uses this interface. The application prints an outline of the element structure of a document, using indentation to represent nesting.
// The next two lines are only to ensure bool gets defined appropriately.
#include "config.h"
#include "Boolean.h"
#include "ParserEventGeneratorKit.h"
#include <iostream>

// Print a CharString one character at a time, narrowing each Char to char.
std::ostream &operator<<(std::ostream &os, SGMLApplication::CharString s)
{
  for (size_t i = 0; i < s.len; i++)
    os << char(s.ptr[i]);
  return os;
}

class OutlineApplication : public SGMLApplication {
public:
  OutlineApplication() : depth_(0) { }
  // Indent by the current nesting depth, then print the element name.
  void startElement(const StartElementEvent &event) {
    for (unsigned i = 0; i < depth_; i++)
      std::cout << " ";
    std::cout << event.gi << '\n';
    depth_++;
  }
  void endElement(const EndElementEvent &) { depth_--; }
private:
  unsigned depth_;
};

int main(int argc, char **argv)
{
  ParserEventGeneratorKit parserKit;
  // Use all the arguments after argv[0] as filenames.
  EventGenerator *egp = parserKit.makeEventGenerator(argc - 1, argv + 1);
  OutlineApplication app;
  unsigned nErrors = egp->run(app);
  delete egp;
  // Exit status is 1 if any errors were reported, 0 otherwise.
  return nErrors > 0;
}
There's a bigger example in the sgmlnorm directory in the SP
distribution.
This uses the SGMLApplication interface, but it doesn't
use the ParserEventGeneratorKit interface.
James Clark