psgml-api - Online Manual Page Of Unix/Linux

  Command: man perldoc info search(apropos)

WebSearch:
Our Recommended Sites: Full-Featured Editor
 

File: psgml-api.info,  Node: Top,  Next: Types,  Prev: (dir),  Up: (dir)

PSGML Internals
***************

* Menu:

* Types::                       Types and operations
* Hooks::                       Hooks
* Implementation::              Implementation notes
* Index::                       Index

 --- The Detailed Node Listing ---

Types and operations

* element::                     The element structure
* attribute::                   Attribute Types
* parser state::                Parser state
* DTD::                         DTD
* entities::                    Entities

File: psgml-api.info,  Node: Types,  Next: Hooks,  Prev: Top,  Up: Top

Types and operations
********************

   NOTE: Names of element types, attributes and entities should be
treated as far as possible as a real type.  In versions prior to 1.0
names are represented by lisp symbols but in 1.0 they are strings.

   Perhaps I should make a `psgml-api.el' that defines some functions
to deal with names.  Then it would be possible to write code that works
in both 0.4 and 1.0.

* Menu:

* element::                     The element structure
* attribute::                   Attribute Types
* parser state::                Parser state
* DTD::                         DTD
* entities::                    Entities

File: psgml-api.info,  Node: element,  Next: attribute,  Prev: Types,  Up: Types

The element structure
=====================

 - Data type: element
     The basic data type representing the element structure is the
     Element (this happens to be a node in the parse tree).

Mapping buffer positions to elements
------------------------------------

 - Function: sgml-find-context-of pos
     Return the element current at buffer position POS.  If POS is in
     markup, `sgml-markup-type' will be a symbol identifying the markup
     type.  It will be `nil' if POS is outside markup.

 - Function: sgml-find-element-of pos
     Return the element containing the character at buffer position POS.

Functions operating on elements
-------------------------------

 - Function: sgml-element-name element
     Returns the name of the element.  (obsolete)

 - Function: sgml-element-gi element
     Return the general identifier (string) of ELEMENT.

 - Function: sgml-element-level element
     Returns the level of ELEMENT in the element structure.  The
     document element is level 1.

Structure
.........

 - Function: sgml-top-element
     Return the document element.

 - Function: sgml-off-top-p element
     True if ELEMENT is the pseudo element above the document element.

   These functions return other related elements, or possibly `nil'.

 - Function: sgml-element-content element
     First element in content of ELEMENT, or nil.

 - Function: sgml-element-next element
     Next sibling of ELEMENT.  To loop thru all sub elements of an
     element, `el', You could do like this:

          (let ((c (sgml-element-content el)))
             (while c
                  <<Do something with c>>
                  (setq c (sgml-element-next c))))

 - Function: sgml-element-parent element
     Parent of ELEMENT.

Tags
....

 - Function: sgml-element-stag-optional element
     Return true if the start-tag of ELEMENT is omissible.

 - Function: sgml-element-etag-optional element
     Return true if the end-tag of ELEMENT is omissible.

 - Function: sgml-element-stag-len element
     Return the length of the start-tag of ELEMENT.  If the start-tag
     has been omitted the length is 0.

 - Function: sgml-element-etag-len element
     Return the length of the end-tag of ELEMENT.  If the end-tag has
     been omitted the length is 0.

 - Function: sgml-element-net-enabled element
     Return true, if ELEMENT or some parent of the element has null end
     tag (NET) enabled.  Return `t', if it is ELEMENT that has NET
     enabled.

Positions
.........

   These functions relates an element to positions in the buffer.

 - Function: sgml-element-start element
     Position of start of ELEMENT.

 - Function: sgml-element-end element
     Position after ELEMENT.

 - Function: sgml-element-stag-end element
     Position after start-tag of ELEMENT.

 - Function: sgml-element-etag-start element
     Position before end-tag of ELEMENT.

Attributes
..........

 - Function: sgml-element-attlist element
     Return the attribute declaration list for ELEMENT.

 - Function: sgml-element-attribute-specification-list element
     Return the attribute specification list for ELEMENT.

 - Function: sgml-element-attval element attribute
     Return the value of the ATTRIBUTE in ELEMENT, string or nil.

Misc technical
..............

 - Function: sgml-element-data-p element
     True if ELEMENT can contain data characters.

 - Function: sgml-element-mixed element
     True if ELEMENT has mixed content.

 - Function: sgml-element-eltype element

 - Function: sgml-element-empty element
     True if ELEMENT is empty.

 - Function: sgml-element-excludes element

 - Function: sgml-element-includes element

 - Function: sgml-element-model element
     Declared content or content model of ELEMENT.

 - Function: sgml-element-context-string element
     Return string describing context of ELEMENT.

File: psgml-api.info,  Node: attribute,  Next: parser state,  Prev: element,  Up: Types

Attribute Types
===============

   Basic types for attributes are names and values.  (See note about
names in *Note Types::.) And attribute values (attval) by lisp strings.

Attribute Declaration List Type
-------------------------------

 - Data type: attlist attdecl*
     This is the result of the ATTLIST declarations in the DTD.  All
     attribute declarations for an element is the elements attlist.

 - Function: sgml-lookup-attdecl name attlist
     Return attribute declaration (attdecl) for attribute NAME in
     attribute declaration list ATTLIST.

 - Function: sgml-attribute-with-declared-value attlist declared-value
     Find the first attribute in ATTLIST that has DECLARED-VALUE.

Attribute Declaration Type
--------------------------

 - Data type: attdecl name declared-value default-value
     This is the representation of an individual attribute declaration
     contained in an ATTLIST declaration.

 - Function: sgml-make-attdecl name declared-value default-value
     Produces an attdecl.

 - Function: sgml-attdecl-name attdecl
     Returns the name of an attribute declaration.

 - Function: sgml-attdecl-declared-value attdecl
     Returns the declared-value of attribute declaration ATTDECL.

 - Function: sgml-attdecl-default-value: attdecl
     Returns the default-value of attribute declaration ATTDECL.

Declared Value Type
-------------------

 - Data type: declared-value (token-group | notation | simple)
     A declared value of an SGML attribute can be of different kinds.
     If the declared value is a token group there is an associated list
     of name tokens.  For notation there is also a list of associated
     names, the allowed notation names.  The other declared values are
     represented by the type name as a lisp symbol.

 - Function: sgml-declared-value-token-group declared-value
     Return the name token group for the DECLARED-VALUE.  This applies
     to name token groups.  For other declared values nil is returned.

 - Function: sgml-declared-value-notation declared-value
     Return the list of notation names for the DECLARED-VALUE.  This
     applies to notation declared value.  For other declared values nil
     is returned.

Default Value Type
------------------

 - Data type: default-value (required | implied | conref | specified )
     There are several kinds of default values.  The REQUIRED, IMPLIED,
     and CONREF has no associated information.  The SPECIFIED have an
     associated attribute value and can be either `fixed' or `normal'.

 - Function: sgml-make-default-value type &optional attval

 - Function: sgml-default-value-attval default-value
     Return the actual default value of the declared DEFAULT-VALUE.
     The actual value is a string. Return `nil' if no actual value.

 - Function: sgml-default-value-type-p type default-value
     Return true if DEFAULT-VALUE is of TYPE.  Where TYPE is a symbol,
     one of `required', `implied', `conref', or `fixed'.

Attribute Specification Type
----------------------------

 - Data type: attspec name attval
     This is the result of parsing an attribute specification.

 - Function: sgml-make-attspec name attval
     Create an attspec from NAME and ATTVAL.  Special case, if ATTVAL
     is `nil' this is an implied attribute.

 - Function: sgml-attspec-name attspec
     Return the name of the attribute specified by ATTSPEC.

 - Function: sgml-attspec-attval attspec
     Return the value (attval) of attribute specification ATTSPEC.  If
     ATTSPEC is `nil', `nil' is returned.

Attribute Specification List Type
---------------------------------

 - Data type: asl attspec*
     This is the result of parsing an attribute specification list.

 - Function: sgml-lookup-attspec name asl
     Return the attribute specification for attribute with NAME in the
     attribute specification list ASL.  If the attribute is unspecified
     `nil' is returned.

File: psgml-api.info,  Node: parser state,  Next: DTD,  Prev: attribute,  Up: Types

Parser state
============

   The state of the parser that needs to be kept between commands are
stored in a buffer local variable.  Some global variables are
initialised from this variable when parsing starts.

 - Variable: sgml-buffer-parse-state
     The state of the parser that is kept between commands.  The value
     of this variable is of type pstate.

 - Data type: pstate
     The parser state.

 - Function: sgml-pstate-dtd pstate
     The document type information (dtd) for the parser.

File: psgml-api.info,  Node: DTD,  Next: entities,  Prev: parser state,  Up: Types

DTD
===

 - Data type: dtd
     Represents what PSGML knows about the DTD.

 - Function: sgml-dtd-doctype dtd
     The document type name.

 - Function: sgml-dtd-eltypes dtd
     The table of element types.

 - Function: sgml-dtd-entities dtd
     The table of declared general entities (entity-table).

 - Function: sgml-dtd-parameters dtd
     The table of declared parameter entities (entity-table).

 - Function: sgml-dtd-shortmaps dtd
     The list of short reference maps.

 - Function: sgml-dtd-notations dtd
     Not yet implemented.

File: psgml-api.info,  Node: entities,  Prev: DTD,  Up: Types

Entities
========

 - Data type: entity
     An entity has the following properties:

    NAME
          The name of the entity (a string).  This is either the name
          of a declared entity (general or parameter) or the doctype
          name if it is the implicit entity referred to by the doctype
          declaration.

    TYPE
          This is a symbol.  It is `text' if it is a text entity, other
          values are `cdata', `ndata', `sdata', `sgml' or `dtd'.

    TEXT
          This is the text of the entity, either a string or an external
          identifier.

   Operations on entities

 - Function: sgml-make-entity name type text
     Create an entity.

 - Function: sgml-entity-name entity
     The name of the entity.

 - Function: sgml-entity-type entity
     The type of the entity.

 - Function: sgml-entity-text entity
     The text of the entity.

 - Function: sgml-entity-insert-text entity
     Insert the text of the entity into the current buffer at point.

 - Function: sgml-entity-data-p entity
     True if ENTITY is a data entity, that is not of type `text'.

 - Data type: entity-table
     A table of entities that can be referenced by entity name.

 - Function: sgml-lookup-entity name entity-table
     The entity with named NAME in the table ENTITY-TABLE.  If no such
     entity exists, `nil' is returned.

 - Function: sgml-entity-declare name entity-table type text
     Create an entity from NAME, TYPE and TEXT; and enter the entity
     into the table ENTITY-TABLE.

 - Function: sgml-entity-completion-table entity-table
     Make a completion table from the ENTITY-TABLE.

 - Function: sgml-map-entities fn entity-table &optional collect
     Apply the function FN to all entities in ENTITY-TABLE.  If COLLECT
     is `t', the results of the applications are collected in a list
     and returned.

File: psgml-api.info,  Node: Hooks,  Next: Implementation,  Prev: Types,  Up: Top

Hooks
*****

 - Variable: sgml-open-element-hook
     The hook run by `sgml-open-element'.  Theses functions are called
     with two arguments, the first argument is the opened element and
     the second argument is the attribute specification list.  It is
     probably best not to refer to the content or the end-tag of the
     element.

 - Variable: sgml-close-element-hook
     The hook run by `sgml-close-element'.  These functions are invoked
     with `sgml-current-tree' bound to the element just parsed.

 - Variable: sgml-doctype-parsed-hook
     This hook is called after the doctype has been parsed.  It can be
     used to load any additional information into the DTD structure.

 - Variable: sgml-sysid-resolve-functions
     This variable should contain a list of functions.  Each function
     should take one argument, the system identifier of an entity.  If
     the function can handle that identifier, it should insert the text
     of the entity into the current buffer at point and return t.  If
     the system identifier is not handled the function should return
     nil.

 - Variable: sgml-doctype-parsed-hook
     This hook is caled after the doctype has been parsed.  It can be
     used to load any additional information into the DTD structure.

 - Variable: sgml-close-element-hook
     The hook run by `sgml-close-element'.  These functions are invoked
     with `sgml-current-tree' bound to the element just parsed.

   *** sgml-new-attribute-list-function This hook is run when a new
element is inserted to construct the attribute specification list. The
default function prompts for the required attributes.

File: psgml-api.info,  Node: Implementation,  Next: Index,  Prev: Hooks,  Up: Top

Implementation notes
********************

Data Types and Operations
=========================

Element Type
------------

 - Data type: eltype
     Data type representing the information about an element type.  An
     `eltype' has information from `ELEMENT' and `ATTLIST'
     declarations.  It can also store data for the application.

   The element types are symbols in a special oblist.  The oblist is the
table of element types.  The symbols name is the GI, its value is used
to store three flags and the function definition holds the content
model.  Other information about the element type is stored on the
property list.

 - Function: sgml-eltype-name et
     The name (a string) of the element type ET.

 - Function: sgml-eltype-appdata et prop
     Get application data from element type ET with name PROP.  PROP
     should be a symbol, reserved names are: flags, model, attlist,
     includes, excludes, conref-regexp, mixed, stag-optional,
     etag-optional.

     This function can be used as a place in `setf', `push' and other
     functions from the CL library.

 - Function: sgml-eltype-all-miscdata eltype
     A list of all data properties for eltype except for flags, model,
     includes and excludes.  This function filters the property list of
     ELTYPE.  Used when saving the parsed DTD.

 - Function: sgml-eltype-set-all-miscdata eltype miscdata
     Append the MISCDATA data properties to the properties of ELTYPE.

 - Function: sgml-eltype-attlist et
     The attribute specification list for the element type ET.

 - Function: sgml-eltype-completion-table eltypes
     Make a completion table from a list, ELTYPES, of element types.

 - Function: sgml-eltype-stag-optional et
     True if the element type ET has optional start-tag.

 - Function: sgml-eltype-etag-optional et
     True if the element type ET has optional end-tag.

 - Function: sgml-eltype-excludes et
     The list of excluded element types for element type ET.

 - Function: sgml-eltype-includes et
     The list of included element types for element type ET.

 - Function: sgml-eltype-flags et
     Contains three flags as a number.  The flags are stag-optional,
     etag-optional and mixed.

 - Function: sgml-eltype-mixed et
     True if element type ET has mixed content.

 - Function: sgml-eltype-model et
     The content model of element type ET.  The content model is either
     the start state in the DFA for the content model or a symbol
     identifying a declared content.

 - Function: sgml-eltype-shortmap et
     The name of the shortmap associated with element type ET.  This
     can also be the symbol `empty' (if declared with a `<!USEMAP gi
     #EMPTY>' or `nil' (if no associated map).

 - Function: sgml-eltype-token et
     Return a token for the element type ET.

 - Function: sgml-eltypes-in-state state tree
     List of element types valid in STATE and TREE.

DTD
---

   The DTD data type is realised as a lisp vector using `defstruct'.

   There are two additional fields for internal use: dependencies and
merged.

 - Function: sgml-dtd-dependencies dtd
     The list of files used to create this DTD.

 - Function: sgml-dtd-merged dtd
     The pair (FILE . MERGED-DTD), if the DTD has had a precompiled dtd
     merged into it.  FILE is the file containing the compiled DTD and
     MERGED-DTD is the DTD loaded from that file.

Element and Tree
----------------

 - Data Type: tree
     This is the data type for the nodes in the tree build by the
     parser.

   The tree nodes are represented as lisp vectors, using `defstruct' to
define basic operations.

   The Element data type is a view of the tree built by the parser.

Parsing model
=============

   PSGML uses finite state machines and a stack to parse SGML.  Every
element type has an associated DFA (deterministic finite automaton).
This DFA is constructed from the content model.

   SGML restricts the allowed content models in such a way that it is
easy to directly construct a DFA.

   To be able to determine when a start-tag can be omitted the DFA need
to contain some more information than the traditional DFA.  In PSGML a
DFA has a set of states and two sets of edges.  The edges are associated
with tokens (corresponding to SGML's primitive content tokens).  I call
these moves.  One set of moves, the "optional moves", represents
optional tokens.  I call the other set "required moves".  The
correspondence to SGML definitions are: if there is precisely one
required move from one state, then the associated token is required.  A
state is final if there is not required move from that state.

   The SGML construct `(...&...&...)' ("AND-group") is another problem.
There is a simple translation to sequence- and or-connectors.  For
example `(a & b & c)' is can be translated to:

     ((a, ((c, b) | (b, c))) |
      (b, ((a, c) | (c, a))) |
      (c, ((a, b) | (b, a))) )

   But this grows too fast to be of direct practical use.  PSGML
represents an AND-group with one DFA for every (SGML) token in the
group.  During parsing of an AND-group there is a pointer to a state in
one of the group's DFAs, and a list of the DFAs for the tokens not yet
satisfied.  Most of this is hidden by the primitives for the state
type.  The parser only sees states in a DFA and moves.

Entity manager
==============

 - Function: sgml-push-to-entity entity &optional ref-start type
     Set current buffer to a buffer containing the entity ENTITY.
     ENTITY can also be a file name. Optional argument REF-START should
     be the start point of the entity reference. Optional argument
     TYPE, overrides the entity type in entity look up.

 - Function: sgml-pop-entity
     Should be called after a `sgml-push-to-entity' (or similar).
     Restore the current buffer to the buffer that was current when the
     push to this buffer was made.

 - Function: sgml-push-to-string string
     Create an entity from STRING and push it on the top of the entity
     stack. After this the current buffer will be a scratch buffer
     containing the text of the new entity with point at the first
     character.

     Use `sgml-pop-entity' to exit from this buffer.

Parser functions
================

 - Function: sgml-need-dtd
     This makes sure that the buffer has a DTD and set global variables
     needed by parsing routines. One global variable is `sgml-dtd-info'
     which contain the DTD (type dtd).

 - Function: sgml-parse-to goal &optional extra-cond quiet
     This is the low level interface to the parser.

     Parse until (at least) GOAL, a buffer position. Optional argument
     EXTRA-COND should be a function. This function is called in the
     parser loop, and the loop is exited if the function returns t. If
     third argument QUIT is non-`nil', no "`Parsing...'" message will
     be displayed.

 - Function: sgml-reparse-buffer shortref-fun
     Reparse the buffer and let SHORTREF-FUN take care of short
     references.  SHORTREF-FUN is called with the entity as argument
     and `sgml-markup-start' pointing to start of short reference and
     point pointing to the end.

Saved DTD Format
================

File =         Comment,
               File version,
               S-expression --dependencies--,
               Parameter entites,
               Document type name,
               Elements,
               General entities,
               S-expression --shortref maps--,
               S-expression --notations--

Elements =     Counted Sequence of S-expression --element type name--,
               Counted Sequence of Element type description

File version = "(sgml-saved-dtd-version 5)
"

Comment =      (";",
                (CASE
                  OF [0-9]
                  OF [11-255])*,
                [10] --end of line marker--)*

Element type description = S-expression --Misc info--,
               CASE
                OF [0-7] --Flags 1:stag-opt, 2:etag-opt, 4:mixed--,
                    Content specification,
                    Token list --includes--,
                    Token list --excludes--
                OF [128] --Flag undefined element--

Content specification = CASE
                OF [0] --cdata--
                OF [1] --rcdata--
                OF [2] --empty--
                OF [3] --any--
                OF [4] --undefined--
                OF [128] --model follows--,
                    Model --nodes in the finite state automaton--

Model =        Counted Sequence of Node

Node =         CASE
                OF Normal State
                OF And Node

Normal State = Moves --moves for optional tokens--,
               Moves --moves for required tokens--

Moves =        Counted Sequence of (Token,
                    OCTET --state #--)

And Node =     [255] --signals an AND node--,
               Number --next state (node number)--,
               Counted Sequence of Model --set of models--

Token =        Number --index in list of elements--

Number =       CASE
                OF [0-250] --Small number 0--250--
                OF [251-255] --Big number, first octet--,
                    OCTET --Big number, second octet--

Token list =   Counted Sequence of Token

Parameter entites = S-expression --internal representation of parameter entities--

General entities = S-expression --internal representation of general entities--

Document type name = S-expression --name of document type as a string--

S-expression = OTHER

Counted Sequence = Number_a --length of sequence--,
               (ARG_1)^a

File: psgml-api.info,  Node: Index,  Prev: Implementation,  Up: Top

Index
*****

   Types

* Menu:

* asl:                                   attribute.
* attdecl:                               attribute.
* attlist:                               attribute.
* attspec:                               attribute.
* declared-value:                        attribute.
* default-value:                         attribute.
* dtd:                                   DTD.
* element:                               element.
* eltype:                                Implementation.
* entity:                                entities.
* entity-table:                          entities.
* pstate:                                parser state.
* tree:                                  Implementation.