Driver file for TEI P2, Segmentation and Analysis Workpapers of TR3 and AI1

Segmentation and Alignment

This chapter and the following two propose a number of ways in which encoders may represent analyses of the structure of a text which are not simply linear or hierarchic. In particular, mechanisms are provided for the following common requirements:

  the ability to encode the fact that distinct textual segments correspond, for example because one is a translation of the other;
  the ability to show that distinct textual segments should be aggregated, for example because they form a discontinuous segment of some kind;
  the ability to show that one segment is an echo or copy of another;
  the ability to associate segments of texts with an abstract interpretation or analysis of their significance.

These mechanisms are all implemented using the same basic set of techniques, all of which depend on the ability to refer to a segment by some form of identifier. The most convenient such identifier, and the one recommended by these Guidelines wherever possible, is provided by the global id attribute, as defined in section . In addition, for segments which are located in different SGML documents, or to which identifiers cannot be attached (perhaps because they are held on read-only media), an additional TEI extended pointer mechanism is defined (see section ).

This chapter therefore begins with a discussion of the ways in which textual elements of any kind can be referred to within TEI documents (section ), together with the definition of the TEI extended pointer mechanism. The facilities described in this section provide for very general linkages, of the type commonly described as hypertextual; this is followed by a discussion of two specific kinds of linkage, representing correspondence (section ), and aggregation (section ).

Other chapters also describe some topics relevant to linking and alignment. Chapter discusses ways of representing simple analyses of text and of linking them with textual segments; it also discusses the third kind of link mentioned above, where one segment is regarded as a copy or echo of another (in section ). Chapter discusses in detail the most fully articulated way of representing arbitrary analyses proposed in these Guidelines, as feature structures; such analyses may be linked with a text with the same mechanisms described in this chapter.

In each case a choice of mechanisms is offered between simple but less general methods depending on the use of attribute values only, and more general but more complex methods depending on the use of specialized elements.

The following DTD fragment shows the overall organization of the class of analytic elements discussed in the remainder of this chapter, and declares the set of global analytic attributes available when this tag set is used.

This tag set is made available by the mechanisms described in section ; in a document which uses the markup described in this chapter, the document type declaration should contain the following declaration of the entity TEI.sa, or an equivalent one: The entire document type declaration for a document using this additional tag set might look like this:

Pointers and Links

A pointer is a special element, the function of which is to represent an association of some kind between one location in a document (at which the element is placed) and one or more others, known collectively as its target. One familiar kind of pointer, discussed in section , is a cross-reference. Another, typical of many hypertext systems, is a button or other device used to allow the user to control the non-sequential processing which characterizes such systems. A link (in the terminology of these Guidelines) is a special element, similar to a pointer, which represents an association between two (or more) locations by specifying each location explicitly. Its own location is irrelevant to the intended linkage.

In the simplest case, pointers and links are encoded using the elements ptr, ref, link, and linkGrp. (For the more general case, see section .)

  ptr: defines a pointer to another location in the current document. Attributes include:
    target: specifies the destination of the pointer as one or more SGML identifiers.
  ref: defines a reference to another location in the current document, in terms of one or more identifiable elements, possibly modified by additional text or comment. Attributes include:
    target: specifies the destination of the reference as one or more SGML identifiers.
  link: defines an association or hypertextual link among elements or passages, of some type not more precisely specifiable by other elements. Attributes include:
    targets: specifies the SGML identifiers of the elements or passages to be linked or associated.
  linkGrp: defines a collection of associations or hypertextual links. Attributes include:
    type: categorizes the group of associations in some respect, using any convenient set of categories.

The ptr and ref elements bear a target attribute (in the singular), because they point, conceptually, at a single target, even if that target may be discontinuous in the document. The link element bears a targets attribute, with a plural name, because it points, conceptually, to at least two targets, each of which is a unitary object.

The ptr, ref, and link elements are all members of the class pointer, and share a common set of attributes:

  type: categorizes the pointer in some respect, using any convenient set of categories.
  specifies the creator of the pointer.
  specifies when the pointer was created.
  targType: specifies the kinds of elements to which this pointer may point.
  targOrder: where more than one identifier is supplied as the value of the target attribute, this attribute specifies whether the order in which they are supplied is significant. Sample values include:
    Yes: the order in which IDREFs are specified as the value of a target attribute should be followed when combining the targeted elements.
    No: the order in which IDREFs are specified as the value of a target attribute has no significance when combining the targeted elements.
    Unspecified: the order in which IDREFs are specified as the value of a target attribute may or may not be significant.
  evaluate: specifies the intended meaning when the target of a pointer is itself a pointer. Sample values include:
    if the element pointed to is itself a pointer, then the target of that pointer will be taken, and so on, until an element is found which is not a pointer.
    if the element pointed to is itself a pointer, then its target (whether a pointer or not) is taken as the target of this pointer.
    none: no further evaluation of targets is carried out beyond that needed to find the element specified in the pointer's target.

The elements ptr and ref, which are used to implement pointers within a single TEI document, are discussed in section . As noted there, these elements specify their target or targets by supplying a list of identifiers as the value of their target attribute. The position of ref and ptr elements within a document is, in general, determined by their meaning: the elements occur where the cross-reference or hypertext link occurs in the text.

The element link may be used to implement general purpose links among elements in a TEI document; it is intended for use where none of the more specific kinds of linkage discussed in the remainder of this chapter and the two subsequent ones is appropriate. The targets of a link are represented by its targets attribute, which takes a list of SGML identifiers as its value, much like the target attribute of ref and ptr, but with slightly different meaning, as described below. Unlike that of ptr and ref, the location of link elements within the SGML document is arbitrary.

The element linkGrp may be used to group links together in a single part of the document; such a collection represents what is often referred to in the hypertextual literature as a web, a term introduced by the Brown University FRESS project in 1969. Typical software might hide a web entirely and merely use it as a source of information about links, which are displayed independently at their referenced locations. Alternatively, software might provide a direct view of the link collection, along with added functions for manipulating the collection, such as filtering and sorting. (Such processing is not specified by these Guidelines.) The linkGrp element also provides a convenient way of establishing a default for the type attribute on a group of links of the same type: by default, the type attribute on a link element has the same value as that given for type on the enclosing linkGrp.
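The defaulting rule for the type attribute can be illustrated with a minimal sketch. The Link and LinkGrp classes below are a hypothetical in-memory model, not part of the tag set, and the identifiers N2.79 and L2.79 and the type value remark are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    targets: List[str]
    type: Optional[str] = None  # None: no type given on the link itself


@dataclass
class LinkGrp:
    type: Optional[str] = None
    links: List[Link] = field(default_factory=list)


def effective_type(link: Link, group: LinkGrp) -> Optional[str]:
    """Return the link's own type if present, else the enclosing group's type."""
    return link.type if link.type is not None else group.type


group = LinkGrp(type="imitation", links=[
    Link(targets=["N3.284", "L3.284"]),                 # inherits "imitation"
    Link(targets=["N2.79", "L2.79"], type="remark"),    # overrides the default
])
resolved = [effective_type(link, group) for link in group.links]
```

Under these assumptions a processor needs only the enclosing group to determine the type of any link that does not state one.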

All elements pointed to or linked by these elements must be identifiable using the global id attribute. This implies that they must be present in the same document, and that they must bear unique id values. Pointing or linking to external documents and pointing or linking where SGML identifiers are not available is implemented by the external pointing mechanisms discussed in section . Using those mechanisms, pointers may be represented using the xptr and xref elements. External links and links to elements without identifiers do not require a special element; they may be represented using the standard link element, but an intermediate xptr element must be provided within the current document, to bear the id attribute used in the target of the link. The syntax and semantics of the link and linkGrp elements are identical, whether their targets are within the current document or not.

As an example of the use of these elements, consider the practice (common in 18th century English verse and elsewhere) of providing footnotes citing parallel passages from classical authors. In the usual case, such footnotes might of course simply be encoded using the general purpose note element, embedded within the text, as follows:

    (Diff'rent our parties, but with equal grace
    The Goddess smiles on Whig and Tory race,
        Virg. Æn. 10.
        Tros Rutulusve fuat; nullo discrimine habebo.
        —— Rex Jupiter omnibus idem.
    'Tis the same rope at sev'ral ends they twist,
    To Dulness, Ridpath is as dear as Mist)

The type attribute on the note is used to classify the notes using the typology established in the Advertisement to the work: "The Imitations of the Ancients are added, to gratify those who either never read, or may have forgotten them; together with some of the Parodies, and Allusions to the most excellent of the Moderns." In the source text, the text of the poem shares the page with two sets of notes, one headed Remarks and the other Imitations.

For this simple type of text-note link, it is unlikely that any mechanism more complex than that shown in the example above would be needed. For completeness, however, and to facilitate comparison of the various techniques proposed by these Guidelines, we now present three alternative methods of linking the note with the text annotated: pointing from text to note, pointing from note to text, and linking note and text with a link.

All of these methods allow the annotation to be transcribed separately from the text itself. If for example the annotation occurs as an end-note (or in a separate commentary) rather than a footnote, it may be desired to transcribe it where it occurs rather than embedding it within the text. In some cases, especially when the annotations are numerous, it might be felt more convenient to encode them separately even if they do appear as footnotes. The techniques used in the following examples may thus be useful in some cases where the simple tagging shown above cannot be used.

First, a ptr element might be placed at an appropriate point within the text to link it with the annotation:

    (Diff'rent our parties, but with equal grace
    The Goddess smiles on Whig and Tory race,
    'Tis the same rope at sev'ral ends they twist,
    To Dulness, Ridpath is as dear as Mist)
    ...
        Virg. Æn. 10.
        Tros Rutulusve fuat; nullo discrimine habebo.
        —— Rex Jupiter omnibus idem.

The note element has been given an arbitrary identifier (N3.284) to enable it to be specified as the target of the pointer element. Because there is no marker in the text to signal the existence of the annotation, the rend attribute has been given the value unmarked.

Secondly, the target attribute of the note element can be used to point at its associated text:

    (Diff'rent our parties, but with equal grace
    The Goddess smiles on Whig and Tory race,
    'Tis the same rope at sev'ral ends they twist,
    To Dulness, Ridpath is as dear as Mist)
    ...
        Virg. Æn. 10.
        Tros Rutulusve fuat; nullo discrimine habebo.
        —— Rex Jupiter omnibus idem.

An SGML identifier has been given to each line, for convenience in transcribing other commentaries, even though in this example the note points only at line L3.284.

If the note, as frequently occurs, begins with an explicit reference to the line annotated, a ref element may be used to encode the reference. In order to facilitate hypertext jumps from text to annotation, one might also prefer to encode both a pointer from note to text and a pointer from text to note:

    (Diff'rent our parties, but with equal grace
    The Goddess smiles on Whig and Tory race,
    'Tis the same rope at sev'ral ends they twist,
    To Dulness, Ridpath is as dear as Mist)
    ...
        Verse 283-84. ——. With equal grace Our Goddess smiles on Whig and Tory race.]
        Virg. Æn. 10.
        Tros Rutulusve fuat; nullo discrimine habebo.
        —— Rex Jupiter omnibus idem.

In the third method, identifiers are supplied for both verse line and annotation, and a link element is used to associate the two:

    (Diff'rent our parties, but with equal grace
    The Goddess smiles on Whig and Tory race,
    'Tis the same rope at sev'ral ends they twist,
    To Dulness, Ridpath is as dear as Mist)
    ...
        Verse 283-84. ——. With equal grace Our Goddess smiles on Whig and Tory race.]
        Virg. Æn. 10.
        Tros Rutulusve fuat; nullo discrimine habebo.
        —— Rex Jupiter omnibus idem.
    ...

The targets attribute of the link element here bears two identifiers. In general, it is the responsibility of an application to determine how the two indicated elements are associated, although encoders may provide a hint, as here, by the use of the type attribute. The values for this attribute are not specified in these Guidelines, but should be documented in the TEI Header, in the encodingDesc element. These Guidelines do however provide for more specific encoding of some very commonly occurring kinds of association, for example, correspondence (see section ) or aggregation (see section ); when these more specialized tags apply, they should normally be preferred to the more general purpose link element.

These elements are formally defined as follows:

Multi-headed Pointers

The sample encodings given in the previous section do not represent the fact that the annotation is associated with both lines of the couplet rather than an arbitrary point within the second line. This is partly because there is no single element in the text containing the two lines. One simple method of overcoming the problem would be to introduce such an element, either explicitly (for example, an lg element, as defined in section ) or implicitly, using the join virtual element discussed in section . In either case the new element's id attribute could be used to provide the value for the target attribute of the note element or of the ref element embedded within the note. Alternatively, if the number of elements to be combined in this way is not too large, it may be simpler to use a multi-headed pointer from the note into the text, supplying identifiers for each of the two lines as the target for the note and for the cross-reference.

    (Diff'rent our parties, but with equal grace
    The Goddess smiles on Whig and Tory race,
    ...
        Verse 283-84. ——. With equal grace Our Goddess smiles on Whig and Tory race.]
        Virg. Æn. 10.
        Tros Rutulusve fuat; nullo discrimine habebo.
        —— Rex Jupiter omnibus idem.

When the target attribute of a ptr or ref element specifies more than one element, the indicated elements are always understood to be combined or aggregated in some way to produce the object of the pointer. The attribute targOrder should be used, as here, to indicate whether or not the sequence in which constituents of the target list are specified should be followed when combining them. An additional check on the constituents of the list may be specified by the targType attribute: if the ref element in the above example (the reference reading Verse 283-84.) also bore a targType attribute naming the element types l and ptr, then an application could require that the elements indicated by identifiers L3.283 and L3.284 were instances of element types l or ptr only.
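The targType check described above might be sketched as follows. This is an illustration only: the function name and the gi_of table (mapping each identifier to the element type of the element that bears it) are hypothetical, and the sketch assumes that both target and targType are whitespace-separated lists compared by exact string match.

```python
from typing import Dict, List


def check_targ_type(target: str, targ_type: str, gi_of: Dict[str, str]) -> List[str]:
    """Return the identifiers in target whose element type is not allowed.

    target    -- whitespace-separated identifiers, as in a target attribute
    targ_type -- whitespace-separated element type names, as in targType
    gi_of     -- hypothetical lookup from identifier to element type name
    """
    allowed = set(targ_type.split())
    return [ident for ident in target.split() if gi_of[ident] not in allowed]


# Element types of the identified elements in the running example.
gi_of = {"L3.283": "l", "L3.284": "l", "N3.284": "note"}

ok = check_targ_type("L3.283 L3.284", "l ptr", gi_of)
bad = check_targ_type("L3.283 N3.284", "l ptr", gi_of)
```

An empty result means every target satisfies the constraint; a non-empty result lists the offending identifiers, which an application could report as an error.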

This use of multi-headed pointing should be carefully distinguished from that used by the targets attribute of the link element. As the attribute name suggests, in the latter case, the indicated elements are not simply combined together as if they were a single target, but associated in some respect, specifically, that indicated by the type attribute. Crucially, the intended relationship between every item in a list of identifiers must be identical. To represent the link discussed above by a single link element whose targets attribute lists all three identifiers (N3.284, L3.283, and L3.284) would be erroneous. This is not correct, because the association between N3.284 and L3.284 (the note and one line of text) is not of the same kind as that between L3.283 and L3.284 (the two lines of text). A processor cannot be expected to process this kind of target list correctly. To represent the link correctly, an intermediate pointer or a virtual element must be introduced. The link then associates the note with a pointer element, rather than with the lines of verse directly. This pointer element in turn points to the two lines. Indirect pointing of this kind is of particular importance when the extended pointer mechanism is used (see section ) but is essential wherever links are made among targets, some of which are themselves multi-headed. The attribute evaluate should be used to require that any pointer encountered as the target of a pointer is itself evaluated, where this is the intended action. (If evaluate has the value none, the link target will be the pointer itself, rather than the objects it points to; this makes it possible to make links between links, and to point at pointers.)
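The three behaviours described for the evaluate attribute can be sketched as a small resolver. The sketch rests on several assumptions: pointers are modelled as a plain table from a pointer's identifier to the identifiers it targets, the identifiers P1 and P2 are invented, and the mode names all and one are placeholders for the first two behaviours (only the value none is named in the text above). The cycle check is an added safeguard, not something this chapter specifies.

```python
from typing import Dict, FrozenSet, List


def resolve(ids: List[str], mode: str, pointers: Dict[str, List[str]],
            _seen: FrozenSet[str] = frozenset()) -> List[str]:
    """Resolve a target list under an evaluate-like mode.

    'none' -- return the targets as given, even if they are pointers
    'one'  -- replace each pointer by its own targets, one step only
    'all'  -- keep following pointers until no pointer remains
    """
    if mode not in ("all", "one", "none"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "none":
        return list(ids)
    out: List[str] = []
    for ident in ids:
        if ident not in pointers:          # an ordinary element
            out.append(ident)
        elif mode == "one":                # a single step of indirection
            out.extend(pointers[ident])
        else:                              # 'all': recurse, guarding against cycles
            if ident in _seen:
                raise ValueError(f"circular pointer chain at {ident}")
            out.extend(resolve(pointers[ident], "all", pointers, _seen | {ident}))
    return out


# P1 points at the two verse lines; P2 points at P1 (both hypothetical).
pointers = {"P1": ["L3.283", "L3.284"], "P2": ["P1"]}
link_all = resolve(["N3.284", "P1"], "all", pointers)
```

With full evaluation the link between the note and the intermediate pointer resolves to the note plus the two lines, while with none it remains a link between the note and the pointer itself.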

Finally, where a number of links share the same type attribute value, we can avoid the need to specify the type attribute on each link by defining a linkGrp element to contain all such links. A link group might be used to hold all the links of a particular type (e.g. imitation). For example, assuming the existence elsewhere in the document of the following other notes, of a kind similar to that already discussed,

    A place there is, betwixt earth, air and seas
    Where from Ambrosia, Jove retires for ease.
    ...
    Sign'd with that Ichor which from Gods distills.
    ...
    (Diff'rent our parties, but with equal grace
    The Goddess smiles on Whig and Tory race,
    'Tis the same rope at sev'ral ends they twist,
    To Dulness, Ridpath is as dear as Mist)
        Ovid Met. 12.
        Orbe locus media est, inter terrasq; fretumq;
        Cœlestesq; plagas —
        Alludes to Homer, Iliad 5....
        Virg. Æn. 10.
        Tros Rutulusve fuat; nullo discrimine habebo.
        —— Rex Jupiter omnibus idem.
    ...

then it might be convenient to group all the links involving notes of type imitation at a single place in the encoded text, as follows:

External Pointers and References

Where the object of a link or pointer element is not contained within the current document, or where it does not bear an id attribute, it is not possible to point at it with a ptr or ref element, nor to link it directly with a link element, because no IDREF value can be supplied for the target or targets attribute of these elements. In such cases, the encoder must indicate the intended element indirectly by means of the elements discussed in this section.

  xptr: defines a pointer to another location in the current document or an external document. Attributes include:
    doc: specifies the document within which the required location is to be found.
    from: specifies the start of the destination of the pointer as an expression in the TEI extended pointer notation.
    to: specifies the endpoint of the destination of the pointer as an expression in the TEI extended pointer notation.
  xref: defines a reference to another location in the current document, or an external document, using an extended pointer notation, possibly modified by additional text or comment. Attributes include:
    doc: specifies the document within which the required location is to be found.
    from: specifies the start of the destination of the pointer as an expression in the TEI extended pointer notation.
    to: specifies the endpoint of the destination of the pointer as an expression in the TEI extended pointer notation.

As members of the element class pointer, these elements share the following set of attributes:

  type: categorizes the pointer in some respect, using any convenient set of categories.
  specifies the creator of the pointer.
  specifies when the pointer was created.
  targType: specifies the kinds of elements to which this pointer may point.
  targOrder: where more than one identifier is supplied as the value of the target attribute, this attribute specifies whether the order in which they are supplied is significant. Sample values include:
    Yes: the order in which IDREFs are specified as the value of a target attribute should be followed when combining the targeted elements.
    No: the order in which IDREFs are specified as the value of a target attribute has no significance when combining the targeted elements.
    Unspecified: the order in which IDREFs are specified as the value of a target attribute may or may not be significant.
  evaluate: specifies the intended meaning when the target of a pointer is itself a pointer. Sample values include:
    if the element pointed to is itself a pointer, then the target of that pointer will be taken, and so on, until an element is found which is not a pointer.
    if the element pointed to is itself a pointer, then its target (whether a pointer or not) is taken as the target of this pointer.
    none: no further evaluation of targets is carried out beyond that needed to find the element specified in the pointer's target.

Unlike the pointer elements discussed in the previous section, these elements do not specify their target by means of a target attribute. Instead they use one or both of the attributes from and to to delimit a portion of some document. In other respects, these elements correspond with the elements ptr and ref discussed in sections , and . Note that there is no element xlink corresponding with the link element; this makes it possible to make links both within and between documents using the same syntax, as further discussed below.

The values of the from and to attributes on the xptr and xref elements indicate the point or passage being referred to by showing how to locate it, using one or more special keywords, as defined below in section . Examples are given there.

The xptr and xref elements are formally defined as follows:

TEI Extended Pointer Syntax

The elements xptr and xref are used to represent a link between their own location (the link origin) and some other location (the destination), which may or may not be in the same document. Software supporting intra- and inter-document links (e.g. hypertext systems) should provide access from the location of such an element to the destination.

This section defines the allowable values for the target attributes (from, to, and doc) of the xptr and xref elements:

An xptr or xref element with no attributes at all is, by definition, a link to the root (i.e. the document element; by default, this is the TEI.2 element) of the document in which it appears.

The doc attribute value must be the name of an entity declared in the SGML document type declaration. If only the doc attribute is given a value, then by definition the destination is the entire entity named by the doc value. A more specific location within another entity must be specified with the from and the to attributes, as described below.

The from and the to attributes indicate the specific location pointed at, within the entity named by the doc attribute (or within the current document, if no doc value is given). Their values are referred to below as location pointer specifications. When both attributes are specified, the span pointed at by the element runs from the starting point of the span indicated by from to the ending point of the span indicated by to. If the latter precedes the former in the document, then the pointer is in error and fails. If only the from attribute is specified, the to attribute defaults to the same value; the effect is that the element as a whole points to the span indicated by the from attribute. It is a semantic error to specify a value for to but not for from.

Location Ladders
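As background for what follows, the defaulting and error rules just stated for the from and to attributes can be sketched as below. The sketch assumes that a location pointer specification has already been resolved to a pair of character offsets by some application-supplied function (here the hypothetical locate parameter), and that whole stands for the span of the entire destination entity.

```python
from typing import Callable, Optional, Tuple

Span = Tuple[int, int]  # (start offset, end offset) within the destination entity


def pointer_span(frm: Optional[str], to: Optional[str],
                 locate: Callable[[str], Span], whole: Span) -> Span:
    """Combine from/to values into the span pointed at, or raise ValueError."""
    if frm is None and to is None:
        return whole                       # no location given: the entire entity
    if frm is None:
        raise ValueError("to specified without from")
    start_span = locate(frm)
    # If only from is given, to defaults to the same value.
    end_span = locate(to) if to is not None else start_span
    if end_span[1] < start_span[0]:
        raise ValueError("end of the to span precedes start of the from span")
    return (start_span[0], end_span[1])


# Hypothetical pre-resolved specifications, for demonstration only.
spans = {"a": (10, 20), "b": (30, 40)}
both = pointer_span("a", "b", spans.__getitem__, (0, 100))
only_from = pointer_span("a", None, spans.__getitem__, (0, 100))
```

The two error branches correspond to the two failure cases named in the text: a to value with no from value, and a to span that ends before the from span begins.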

Each location pointer specification consists of a sequence of location terms, each of which consists of a keyword specifying a location type followed by one or more parenthesized parameter lists, each of which specifies a location value via a list of parameters. Location types and values, and the parameters within a location value, must be separated by white space characters.
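The shape of a location pointer specification, as just described, can be illustrated with a small parser. This is a sketch under stated assumptions, not the formal grammar: it lowercases keywords, accepts a keyword with no parameter list at all (the text calls for one or more), does not handle parentheses nested inside a parameter list, and does not check keywords against any fixed vocabulary. The sample specifications are invented.

```python
import re
from typing import List, Tuple

# A term is (keyword, [parameter list, ...]); each parameter list is a list of strings.
Term = Tuple[str, List[List[str]]]

# Either a parenthesized parameter list or a bare keyword.
_TOKEN = re.compile(r"\(([^()]*)\)|([^\s()]+)")


def parse_ladder(spec: str) -> List[Term]:
    """Split a specification into location terms, or raise ValueError."""
    terms: List[Term] = []
    pos = 0
    for match in _TOKEN.finditer(spec):
        if spec[pos:match.start()].strip():
            raise ValueError(f"unexpected text at offset {pos}")
        pos = match.end()
        if match.group(1) is not None:              # a parameter list
            if not terms:
                raise ValueError("parameter list before any keyword")
            terms[-1][1].append(match.group(1).split())
        else:                                       # a keyword opens a new term
            terms.append((match.group(2).lower(), []))
    if spec[pos:].strip():
        raise ValueError(f"unexpected text at offset {pos}")
    return terms


parsed = parse_ladder("ID (a27) child (2 p)")
```

Each returned term can then be applied in order, the result of one term serving as the starting point for the next.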

Using terms borrowed from HyTime, we say that each TEI location term in a specification provides the location source for the next, and the entire specification is equivalent to a location ladder. By specifying the entire ladder in a single attribute value, the TEI extended pointer mechanism greatly reduces the syntactic and processing complexity of hypertextual pointers. In formal terms:

Note: HyTime is an international standard (ISO 10744) built on SGML. It provides facilities for representing both static and dynamic information for processing and interchange by hypertext and multimedia applications. See ISO/IEC 10744 Information Technology - Hypermedia/Time-based Structuring Language (HyTime) (Geneva: International Organization for Standardization, 1992). For discussion of the relation between TEI proposals and HyTime, see chapter .

Note: The formal grammar specified in this section is reproduced as a unit in the appendix. The notation used for this formal grammar is that defined in chapter .

Location Terms

The keywords used in location terms are these; references to the tree mean the tree representing the SGML document hierarchy.

  root: points at the root of the target document
  here: points at the location of the pointer
  id: points at an ID within the target document
  gives a canonical reference to a location in the target document
  child: indicates an element found by descending one level in the tree
  descendant: indicates an element found by descending one or more levels in the tree
  ancestor: indicates an element found by ascending one or more levels in the tree
  previous: indicates an element found by traversing the older siblings of the current location source
  next: indicates an element found by traversing the younger siblings of the current location source
  specifies a regular expression to be located within the existing location source
  points at one or more tokens in the character content of the location source
  points at one or more characters in the character content of the location source
  space: points at a location using coordinates in some (application-defined) n-dimensional space
  foreign: points at a location using some non-SGML method, and gives the name of the method
  HyQ: points at a location using the HyQ query language defined by ISO 10744 (HyTime)
  (in the to attribute only) points at the same span as was indicated by the from attribute

In formal terms:

Note that the keywords, though shown here quoted in uppercase, are not case sensitive.

Each location term specifies a location in the target document; this location may be a single point or, more often, a span of text (often the span of a single element) within the target document. The location ladder as a whole is interpreted from left to right, and each location term specifies a location relative to the location specified by the sequence prior to that point (i.e. to its location source). Unless here or id is specified as the first location term, the beginning location source is always root. An empty location sequence thus is the same as root and specifies the entire destination entity.

In general, the search for the location specified by a location term will be conducted only within its location source (i.e. within the location already identified by preceding location terms). There are however several exceptions. The terms root, here, and id all ignore the location source defined by any preceding terms and therefore make sense only as the first items in the ladder. The terms ancestor, next, and previous do not ignore the location source, but select a new span from the adjacent or enclosing portions of the text, and not from within the location source. Finally the location terms foreign, space, and HyQ are not defined fully here; they may or may not ignore the existing location source.
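The left-to-right structure of a location ladder can be sketched in a few lines of Python. This is only an illustration, not part of the formal grammar: the function name parse_ladder and the returned data shape are assumptions made for this sketch, and quoted attribute values containing parentheses are not handled.

```python
import re

# Keywords taken from the list of location terms; matching is case-insensitive.
KEYWORDS = {"ROOT", "HERE", "ID", "REF", "CHILD", "DESCENDANT", "ANCESTOR",
            "PREVIOUS", "NEXT", "PATTERN", "TOKEN", "STR", "SPACE",
            "FOREIGN", "HYQ", "DITTO"}

def parse_ladder(ladder):
    """Split a ladder into (KEYWORD, [parameter-list, ...]) terms, in order."""
    terms = []
    pos = 0
    while pos < len(ladder):
        ch = ladder[pos]
        if ch.isspace():
            pos += 1
        elif ch == "(":
            if not terms:
                raise ValueError("parameter list before any keyword")
            # Scan to the matching close parenthesis; lists may nest
            # (for example, a parenthesized regular expression).
            start, depth = pos, 0
            while pos < len(ladder):
                depth += {"(": 1, ")": -1}.get(ladder[pos], 0)
                pos += 1
                if depth == 0:
                    break
            if depth != 0:
                raise ValueError("unbalanced parentheses")
            terms[-1][1].append(ladder[start + 1:pos - 1])
        else:
            word = re.match(r"[^\s()]*", ladder[pos:]).group().upper()
            if word not in KEYWORDS:
                raise ValueError(f"unknown keyword at offset {pos}")
            terms.append((word, []))
            pos += len(word)
    return terms

print(parse_ladder("HERE ANCESTOR (1 P) PREVIOUS (1 P)"))
```

An empty ladder yields an empty list of terms, which a processor would treat as equivalent to root.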

Some of the location terms make sense only in SGML documents; these are id, child, ancestor, descendant, previous, and next. The last five involve traversing the tree representing the SGML document hierarchy and are most easily understood when their location source is a single SGML element. If the location source is not a single SGML element, the tree-traversal keywords operate upon its beginning end-point, its front end (in English, this will be the leftmost point of the location source; in Arabic or Hebrew it will be the rightmost point). In this case child and descendant have no meaning, since character data has no descendants in the document tree; the first ancestor of such a location source is the element immediately containing the character data in question, and the siblings referred to by next and previous are the other children of that immediately containing element.

The details of each keyword are given below, along with definitions of their syntax and of the semantics of their results. Examples are also provided. It is strongly recommended that when IDs are available, they should be used in preference to the other methods for pointing defined here.

For all keywords, the description assumes that the target document does in fact contain a span or element which matches the description; otherwise, the location term has no referent and is said to fail. If any location term fails, the entire pointer fails. No backtracking or retrying is performed (and indeed for the most part the location terms are defined as having only one matching location, so backtracking would in most cases lead to no better result).

The ROOT Keyword

The location term root selects the root of the destination document tree; in SGML terms, this is the document element. Since it ignores any existing location source, the root keyword makes sense only as the first location term in the ladder. Since root is assumed as the implicit first term in any ladder, the following two location ladders have the same meaning:

The HERE Keyword

The keyword here designates the location at which the pointer element itself is situated; it allows extended pointers to select items like the paragraph immediately preceding the one within which this pointer occurs. Since it ignores any existing location source, this keyword typically makes sense only as the first location term in a location specification.

To designate the paragraph preceding the current one, the following location ladder could be used: HERE ANCESTOR (1 P) PREVIOUS (1 P) (See below for descriptions of the keywords ancestor and previous.)

The ID Keyword

The resulting location is the element within the destination entity whose ID attribute has the value specified as the location value. The ID location type typically makes sense only as the first location pair in a location specification, but there is no syntactic requirement that it be so.

For example, the location specification ID (a27) chooses the necessarily unique element of the destination entity which has an attribute of declared value of type ID, whose value is a27.

The REF Keyword

The resulting location is an element which can be found by interpreting the location value in accordance with document-specific rules for a canonical reference. Such reference systems, particularly common in documents of interest to classical and biblical scholars, must also be defined in the TEI header, using the refsDecl element (see section ). If more than one element matches the canonical reference, the first one encountered is chosen.

For example, the location specification REF (MT.2.1) chooses the first element of the destination entity which is identified by the canonical reference MT.2.1.

The CHILD Keyword

The child location type specifies an element or span of character data in the document hierarchy using a location value which functions as a domain-style address. The value is a series of parenthesized steps, separated by white space. Each such step represents one level of the hierarchy within the location source. Each step may contain up to four parameters separated by white space and interpreted as follows: (1) a signed or unsigned instance number; (2) an expression matching an SGML generic identifier; (3) an expression matching an SGML attribute name; (4) an expression matching an SGML attribute value. In formal terms, the location value of child is a series of steps: ]]>

Location values of the same form are also used by the keywords descendant, ancestor, previous, and next; details of the interpretation may vary from keyword to keyword.

If an instance number alone is specified, it selects the nth child of the location source. If specified with following parameters, it selects the nth among those children of the location source which satisfy the other parameters. If a negative number is given, the nth child is counted from the last child of the location source to the first. The location source must contain at least n children (strictly speaking, |n| children, where |n| is the absolute value of n); if it does not, the child term fails. In formal terms, the first parameter of a step is a signed integer: ]]>

If a second parameter is given, it is interpreted as an SGML generic identifier, and only elements of the type indicated will be selected. For example, the location specification CHILD (3 DIV1) (4 DIV2) (29 P) chooses the 29th paragraph of the fourth sub-division of the third major division of the initial location source. The location specification CHILD (3 DIV1) (4 DIV2) (-2 P) chooses the next-to-last paragraph of the fourth div2 of the third div1 in the location source.
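As a rough illustration of instance numbers and generic identifiers, the following Python sketch applies a single child step to an element parsed with the standard xml.etree.ElementTree module. The function name child and the sample document are assumptions made for this example; XML stands in for SGML, and names are compared without regard to case, as in the default SGML declaration.

```python
import xml.etree.ElementTree as ET

def child(source, n, gi="*"):
    """Apply one CHILD step: return the nth child element of `source`
    (optionally only those named `gi`), or None if the term fails."""
    candidates = [c for c in source
                  if gi == "*" or c.tag.upper() == gi.upper()]
    if n == 0 or abs(n) > len(candidates):
        return None                      # fewer than |n| matching children
    # Positive numbers count from the first child, negative from the last.
    return candidates[n - 1] if n > 0 else candidates[n]

doc = ET.fromstring(
    "<body>"
    "<div1><p>a</p><p>b</p><p>c</p></div1>"
    "<div1><p>d</p></div1>"
    "</body>")

# Equivalent of the ladder CHILD (1 DIV1) (-2 P)
first_div = child(doc, 1, "DIV1")
print(child(first_div, -2, "P").text)    # next-to-last paragraph
print(child(doc, 3, "DIV1"))             # fails: only two div1 children
```

Note that this sketch sees only element children; character-data children, which the child keyword can also select, are not modelled here.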

Constraint by generic identifier is strongly recommended, because it makes links more perspicuous and more robust. It is perspicuous because humans typically refer to things by type: as the second section, the third paragraph, etc. It is robust because it increases the chance of detecting breakage if (due to document editing) the target originally pointed at no longer exists.

The generic identifier may be specified as a normal SGML name, as a (parenthesized) regular expression, or using the reserved values #CDATA or *. Regular expressions take the form described below; the location term CHILD (3 (DIV[123])) matches the third element which has a generic identifier of div1, div2, or div3. If the generic identifier is specified as *, any generic identifier is matched; this means that CHILD (2 *) is synonymous with CHILD (2). If the second parameter is #CDATA, the location term selects only untagged sub-portions of an element having SGML mixed content.

The location ladder CHILD (3 #CDATA) thus chooses the third span of character data directly contained by the current location source. If the location source is a paragraph containing, in order: a sentence (A); an embedded quotation, marked as a q; another sentence (B); an embedded note, marked as a note; another sentence (C); and a second embedded quotation, marked as a q; where the three sentences A, B, and C are character data enclosed by no element smaller than the paragraph itself, then CHILD (3 #CDATA) selects sentence C, while CHILD (3) selects sentence B.
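The difference between counting all children and counting only character-data children can be checked with a small model of the paragraph just described. The representation of children as (kind, content) pairs and the function name nth_child are assumptions made for this sketch.

```python
# Children of the example paragraph, in document order.
# "#CDATA" marks untagged character data; other kinds are element names.
paragraph = [
    ("#CDATA", "sentence A"),
    ("q", "first quotation"),
    ("#CDATA", "sentence B"),
    ("note", "a note"),
    ("#CDATA", "sentence C"),
    ("q", "second quotation"),
]

def nth_child(children, n, gi="*"):
    """Return the nth child whose kind matches `gi` ("*" matches anything)."""
    matches = [c for c in children if gi == "*" or c[0] == gi]
    if n == 0 or abs(n) > len(matches):
        return None                      # the term fails
    return matches[n - 1] if n > 0 else matches[n]

print(nth_child(paragraph, 3, "#CDATA"))  # third span of character data
print(nth_child(paragraph, 3))            # third child of any kind
```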

If specified as a name (i.e. without parentheses), the generic identifier is case sensitive if and only if the SGML declaration specifies that generic identifiers are case sensitive (by default they are not). If specified as a regular expression, the expression given is always case sensitive; in the usual case this means the regular expression should be in uppercase, as in the examples here. In formal terms the second parameter of a step is defined thus: [Editorial note: If regular expressions need not be parenthesized, then this grammar and the accompanying text must be changed. Remove this note before publication. -MSM] ]]>

If a third parameter is specified, it is interpreted as an attribute name, and only elements bearing that attribute and satisfying the other constraints will be selected. An element is held to bear an attribute if (a) the attribute is defined for that element type and (b) the value of the attribute is not IMPLIED. Like the preceding, this parameter may be specified as * in the (unlikely) event that an attribute value constitutes a constraint regardless of what attribute name it is a value for. The parameter may also be specified as a parenthesized regular expression.

For example, the location term CHILD (1 * TARGET) selects the first child of the location source for which the attribute target has a value. The location term CHILD (1 * (TARGET(S?))) will select the first child of the location source for which an attribute called either target or targets has a value.

As with generic identifiers, attribute names are case sensitive if and only if the SGML declaration says they are; regular expressions are always case sensitive and should usually be uppercased, as shown here. In formal terms, the third parameter of a tree-traversal step is defined thus: ]]>

If a fourth parameter is specified, it is interpreted as an attribute value, and only elements satisfying the other constraints and also bearing an attribute of the specified name and value will be selected. The attribute value may be specified exactly as in an SGML document; as a consequence, if the attribute value to be specified contains white space characters, it must be enclosed in quotation marks. The attribute value may also be specified as a regular expression, enclosed in parentheses.

For example, the location specification CHILD (1 * N 2) (1 * N 1) chooses an element using the global n attribute. Beginning at the location source, the first child (whatever kind of element it is) with an n attribute having the value 2 is chosen; then that element's first direct sub-element having the value 1 for the same attribute is chosen.
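The attribute-name and attribute-value constraints can be sketched in the same way. The helper names matches and child and the sample document below are assumptions for illustration only; XML parsed by xml.etree.ElementTree stands in for SGML, names are compared exactly as written, and regular-expression parameters are not covered.

```python
import xml.etree.ElementTree as ET

def matches(elem, gi="*", attname="*", attval=None):
    """Test one element against the last three parameters of a step."""
    if gi != "*" and elem.tag != gi:
        return False
    if attname == "*":
        values = list(elem.attrib.values())      # any attribute will do
    else:
        values = [elem.attrib[attname]] if attname in elem.attrib else []
    if attval is None:
        # No value constraint: with a name, the attribute must be present.
        return attname == "*" or bool(values)
    return attval in values

def child(source, n, gi="*", attname="*", attval=None):
    """Return the nth matching child element, or None if the term fails."""
    found = [c for c in source if matches(c, gi, attname, attval)]
    if n == 0 or abs(n) > len(found):
        return None
    return found[n - 1] if n > 0 else found[n]

doc = ET.fromstring(
    "<text>"
    "<div n='1'><p n='1'>x</p></div>"
    "<div n='2'><p n='2'>y</p><p n='1'>z</p></div>"
    "</text>")

# Equivalent of the ladder CHILD (1 * N 2) (1 * N 1)
step1 = child(doc, 1, "*", "n", "2")
step2 = child(step1, 1, "*", "n", "1")
print(step2.text)
```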

The location specification CHILD (1 FS RESP ((lanc|LANC)(s|S|ashire|ASHIRE))) selects the first child of the location source which is an fs element bearing a resp attribute with the value lancs, lancashire, LANCS, or LANCASHIRE (as well as other possible combinations which are left to the reader's ingenuity). If specified with quotation marks or as a regular expression, the attribute-value parameter is case-sensitive; otherwise not. In formal terms, the fourth parameter of a tree-traversal step is defined thus: ]]>

The DESCENDANT Keyword

If the descendant keyword is used, the location term selects an element or character-data string which is a descendant of the current location source. Like child, descendant takes as a value a series of one or more parenthesized steps, which may contain the same four parameters described above. The set of elements and strings which may be selected, however, is the set of all descendants of the location source (i.e. the set of all elements contained by it), rather than only the set of immediate children.

The location specification ID (a23) DESCENDANT (2 TERM LANG DE) thus selects the second term element with a lang of de occurring within the element with an id of a23. The search for matching elements occurs in the same order as the SGML data stream; in terms of the document tree, this amounts to a depth-first left-to-right search.
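A minimal Python sketch of the descendant search follows, assuming XML parsed with xml.etree.ElementTree as a stand-in for SGML. Element.iter() visits elements in document order, which corresponds to the depth-first left-to-right search described above; a negative instance number is treated here as counting back from the end of that same order. The function name descendant and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def descendant(source, n, gi="*", attname=None, attval=None):
    """Return the nth matching descendant element in document order."""
    found = []
    for elem in source.iter():           # depth-first, document order
        if elem is source:
            continue                     # the source is not its own descendant
        if gi != "*" and elem.tag != gi:
            continue
        if attname is not None and elem.get(attname) is None:
            continue
        if attval is not None and elem.get(attname) != attval:
            continue
        found.append(elem)
    if n == 0 or abs(n) > len(found):
        return None                      # the term fails
    return found[n - 1] if n > 0 else found[n]

a23 = ET.fromstring(
    "<div id='a23'>"
    "<p><term lang='de'>Haus</term> <term lang='fr'>maison</term></p>"
    "<p><term lang='de'>Baum</term></p>"
    "</div>")

# Equivalent of DESCENDANT (2 TERM LANG DE) applied to the element a23
print(descendant(a23, 2, "term", "lang", "de").text)
```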

If the instance number is negative, the search is a depth-first right-to-left search, in which the right-most, deepest matching element is numbered -1, etc. The location specification DESCENDANT (-1 NOTE) thus chooses the last note element in the document.

The ANCESTOR Keyword

The ancestor location term selects an element from among the direct ancestors of the location source in the document hierarchy. The location value is of the same form as defined for the child and descendant location types. However, the ancestor keyword selects elements from the list of containing elements or ancestors of the location source, counting upwards from the parent of the location source (which is ancestor number 1) to the root of the document instance (which is ancestor number -1).

The location source must have at least as many ancestors as the absolute value of the instance number specified as the first parameter of the step. The ancestor type thus may not be specified as the first component of a location specification, because the initial location source in effect at that point is the root, which has no ancestors.
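The numbering of ancestors can be illustrated with a short Python sketch. ElementTree elements do not record their parents, so the sketch first builds a child-to-parent map; the function names ancestors and ancestor and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def ancestors(root, target):
    """List the ancestors of `target`, from its parent up to the root."""
    parent = {c: p for p in root.iter() for c in p}
    chain = []
    node = target
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain          # chain[0] is ancestor 1, chain[-1] is ancestor -1

def ancestor(root, target, n, gi="*"):
    """Return the nth ancestor named `gi`, or None if the term fails."""
    found = [a for a in ancestors(root, target) if gi == "*" or a.tag == gi]
    if n == 0 or abs(n) > len(found):
        return None
    return found[n - 1] if n > 0 else found[n]

doc = ET.fromstring("<text><div1><div2><p>x</p></div2></div1></text>")
p = doc.find(".//p")

print(ancestor(doc, p, 1).tag)      # immediate parent
print(ancestor(doc, p, -1).tag)     # root of the document
print(ancestor(doc, doc, 1))        # the root has no ancestors: term fails
```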

For example, the location term ANCESTOR (1 * N 1) (1 DIV) first chooses the smallest element properly containing the location source and having attribute n with value 1; and then the smallest div element properly containing it. The location term ANCESTOR (1) chooses the immediate parent of the location source, regardless of its type or attributes. The location term ANCESTOR (1 * LANG fr) selects the smallest ancestor for which the lang attribute has the value fr. The term ANCESTOR (-1 * LANG fr) selects the largest ancestor for which the lang attribute has the value fr. Finally, the term ANCESTOR (1 (DIV[0123456789]?)) chooses the smallest div element of any level which contains the location source.

The PREVIOUS Keyword

The previous keyword selects an element or character-data string from among those which precede the location source within the same containing element. We speak of the elements and character-data strings contained by the same parent element as siblings; those which precede a given element or string in the document are its elder siblings; those which follow it are its younger siblings.

The instance number in the location value of a previous term designates the nth elder sibling of the location source, counting from most recent to less recent. The location ladder ID (a23) PREVIOUS (1) thus designates the element immediately preceding the element with an id of a23. Negative instance numbers also designate elder siblings, counting from the eldest sibling to the youngest. The location source must have at least as many elder siblings as the absolute value of the instance number. If the location source has at least one elder sibling, then the location term PREVIOUS (-1) designates its eldest sibling and is thus synonymous with the ladder ANCESTOR (1) CHILD (1).

The NEXT Keyword

The keyword next behaves like previous, but selects from the younger siblings of the location source, not the elder siblings. The location ladder ID (a23) NEXT (1) thus designates the element or string immediately following the element which has an id of a23. Negative instance numbers also designate younger siblings, counting from the youngest sibling back towards the location source. The location source must have at least as many younger siblings as the absolute value of the instance number. If the location source has at least one younger sibling, then the location term NEXT (-1) designates its youngest sibling and is thus synonymous with the ladder ANCESTOR (1) CHILD (-1).

The PATTERN Keyword

The pattern keyword selects the first place within the location source which matches a pattern-matching expression included as the location value. If more than one location matches that expression, there is no error, but the second and later matches are ignored.

Matching is defined to be case-sensitive, i.e. abc is not the same as ABC. The pattern is expressed as a regular expression in which the following characters have special meanings, similar to those of many Unix programs (such as grep) which handle regular expressions:
. : match any single character (including white space characters).
[ ] : match any character from the set enclosed within the brackets. If, however, the first enclosed character is ^, then match any character not from the set enclosed within the brackets. For example, [^aeiou] would match any character except a, e, i, o, or u.
\ : If the next character is a, d, n, or s, the expression matches any character from a pre-defined group, as shown below; otherwise, the next character is to be taken literally, even if it would otherwise have a special meaning. The special character classes are: Note that although \n for newline is provided, its use is discouraged.
* : match zero or more occurrences of the previous regular expression.
+ : match one or more occurrences of the preceding regular expression.
? : match zero or one occurrences of the preceding regular expression.
^ : match the following regular expression only at the beginning of the location source.
$ : match the preceding regular expression only at the end of the location source.
| : match either the regular expression on the left, or the one on the right.
( ) : match the regular expression within the parentheses. (Parentheses are used to control application of the *, ?, +, and | operators, etc.)

For example, the location specification PATTERN (Chapter.8) chooses the first instance of the content string Chapter which is followed by any single character and then the digit 8, within the location source. Various elements which contain that location could be selected by following the pattern location term with one or more of other types such as ancestor (see above).
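The first-match behaviour can be approximated with Python's standard re module, as in the sketch below. This is an approximation under stated assumptions: Python's regular-expression syntax is similar to, but not identical with, the syntax defined here (for instance, \a has a different meaning in Python), and the function name pattern is hypothetical.

```python
import re

def pattern(source_text, expr):
    """Return the (start, end) offsets of the first match of `expr`
    within `source_text`, or None if the term fails."""
    # DOTALL makes "." match any character, including line breaks.
    m = re.search(expr, source_text, flags=re.DOTALL)
    return None if m is None else m.span()

text = "See Chapter 8 and Chapter 18"
span = pattern(text, "Chapter.8")
print(span, repr(text[span[0]:span[1]]))   # later matches are ignored
print(pattern(text, "CHAPTER"))            # matching is case-sensitive
```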

It is recommended practice to use structure-oriented location types to specify the destination element as narrowly as possible, and then to specify a pattern only within that element context. If element boundaries are encountered within the location source, however, they are ignored and have no effect on the pattern matching operation. In formal terms, the location value of the pattern keyword is defined thus: ]]>

The TOKEN Keyword

The token keyword selects a sequence of one or more tokens chosen from within the character content of the location source, where tokens are counted exactly as for the corresponding HyTime tokenloc form. The location value must be either a single positive integer, or a pair of positive integers separated by white space, representing the first and the last token numbers to be included in the resulting location. If two integers are specified, the second must not be less than the first. The location source must contain at least as many tokens as are specified in the location value.

This location type should not be used to count across element boundaries. It is recommended practice to use structure-oriented location types to specify the destination element as narrowly as possible, and then to specify a token location only within that element context. If element boundaries are encountered within the location source, they are ignored.

This location type behaves intuitively only for strings containing an alternating sequence of SGML name-characters and white space; this is the type of string found, for example, in SGML attribute values of type IDREFS, such as a21 z a13. For compatibility with the HyTime standard, all characters not included in the class of name characters by the current SGML declaration (by default this includes all punctuation other than the hyphen and full stop) are treated as white space characters.
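The token-counting rule can be sketched as follows, assuming the default SGML name characters (letters, digits, hyphen, and full stop) and treating every other character as a separator. The function name token and the regular expression used to find tokens are assumptions made for this illustration; they are not taken from the HyTime standard.

```python
import re

# Runs of default SGML name characters; everything else acts as white space.
NAME_TOKEN = re.compile(r"[A-Za-z0-9.\-]+")

def token(text, first, last=None):
    """Return the span of `text` running from token `first` to token `last`
    (1-based, inclusive), or None if the term fails."""
    last = first if last is None else last
    spans = [m.span() for m in NAME_TOKEN.finditer(text)]
    if first < 1 or last < first or last > len(spans):
        return None
    return text[spans[first - 1][0]:spans[last - 1][1]]

print(token("a21 z a13", 2))
print(token("This is _not_ a very good idea", 3, 5))
```

In the second call the underscores count as separators, so the selected span begins at the n of not and retains the underscore that follows it.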

For example, the location specification ID (a27) TOKEN (3 5) chooses the 3rd, 4th, and 5th tokens from the content of the element whose identifier is a27. If this element contained the string This is _not_ a very good idea, the target selected would be not_ a very. In formal terms the location value of the token and str keywords is defined as a range: ]]>

The STR Keyword

The str keyword identifies a sequence of one or more characters chosen from within the character content of the location source, where characters are counted exactly as for the HyTime strloc form, which has a corresponding meaning and usage. The location value must be either a single positive integer, or a pair of positive integers separated by white space, indicating the first and the last characters to be included in the resulting location. If two integers are specified, the second must not be less than the first. The location source must have at least as many characters as are specified in the larger of the integers.

This location type should not be used to count across element boundaries. The recommended practice is to use structure-oriented location types to specify the destination element, and then to specify a character location only within that element context. If element boundaries are encountered within the location source, however, they have no effect.

Character offsets in an SGML document must be counted not from the original source file, but from the output of the SGML parser (the element structure information set or ESIS). This is because the rules of SGML allow certain characters to be deleted or expanded transparently.

For example, the location specification ID (a27) STR (3 5) chooses the 3rd, 4th, and 5th characters of the content of the element having identifier a27. If this element contained the string This is an even worse idea, the result would be the string is (i, s and a space).
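Character counting can be sketched in Python as below. The function name str_loc is hypothetical. The second part of the sketch is an assumption-laden illustration of how counts depend on encoding: it uses Unicode normalization forms (NFC for precomposed letters, NFD for letter plus combining diacritic) and a hypothetical range of 1 to 15 applied to the word Götterdämmerung.

```python
import unicodedata

def str_loc(text, first, last=None):
    """Return characters `first` through `last` (1-based, inclusive),
    or None if the term fails."""
    last = first if last is None else last
    if first < 1 or last < first or last > len(text):
        return None
    return text[first - 1:last]

print(repr(str_loc("This is an even worse idea", 3, 5)))

word = "G\u00f6tterd\u00e4mmerung"
composed = unicodedata.normalize("NFC", word)     # one code point per letter
decomposed = unicodedata.normalize("NFD", word)   # diacritics counted separately

print(len(composed), len(decomposed))
print(str_loc(composed, 1, 15))
# The same range over the decomposed form stops two letters short.
print(unicodedata.normalize("NFC", str_loc(decomposed, 1, 15)))
```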

In multi-byte character sets it is characters which are counted, not bytes. However, in the case of diacritics coded by sequences of bit combinations rather than having separate code points for every combination of letter and diacritic, the diacritics are counted. This means that the following location ladder may retrieve different strings, depending on the system character set in use and on the entity declarations in effect: In some character sets, where ö and ä are encoded as single characters, it will select the string Götterdämmerung; in others, where they are encoded with distinct characters for umlaut, a, and o, it will select the string Götterdämmeru, truncating the last two letters. If a system-dependent definition is used (containing e.g. a printer escape sequence), the results are even less predictable. For this reason, the str keyword must be used with caution and should be avoided where possible.

The SPACE Keyword

The space location term applies to entities which represent graphical or spatio-temporal data; typically such entities are not encoded in SGML, but in one of many specialized graphical formats. SGML provides standard mechanisms (the NOTATION declaration and related constructs) for specifying what format such an entity uses.

The location value for space consists of two or three parenthesized parameter lists. The first contains the name of the co-ordinate space in use. The second and third each consist of any number of signed integers. The numbers in a parameter list represent locations along each dimension of a Cartesian co-ordinate space with all axes orthogonal; the length of the list equals the number of dimensions/axes of the space (usually, but not inevitably, 2, 3, or 4).

If the third parameter list is not specified, the location is the single point in the co-ordinate space specified by the second parameter list. If all three parameter lists are specified, the location is the rectangular prism defined by treating corresponding items of the second and third lists as inclusive bounds along each dimension in turn.
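The interpretation of the two co-ordinate lists as a point or as a rectangular prism with inclusive bounds can be sketched as a simple containment test. The function name space_contains and the tuple representation of parameter lists are assumptions made for this illustration; the co-ordinate space name is omitted because its meaning is application-defined.

```python
def space_contains(lower, upper, point):
    """Test whether `point` lies in the location given by one co-ordinate
    list (a single point) or two (a rectangular prism, bounds inclusive)."""
    if upper is None:
        upper = lower                    # only one list: a single point
    if not (len(lower) == len(upper) == len(point)):
        raise ValueError("all lists must have one number per dimension")
    return all(min(a, b) <= p <= max(a, b)
               for a, b, p in zip(lower, upper, point))

# A two-dimensional location with co-ordinate lists (0 0) and (1 1)
print(space_contains((0, 0), (1, 1), (1, 0)))    # on the boundary
print(space_contains((0, 0), (1, 1), (2, 0)))    # outside
print(space_contains((3, 4), None, (3, 4)))      # the single point itself
```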

The mapping from co-ordinates to physical or display space, and the meaning and ordering of the axes, are not defined by these guidelines. They should be specified in the TEI header unless they can be determined by definition from the format in which the referenced entity is known to be encoded (for example, many graphics formats can only encode locations in units of pixels, counted in a 3 dimensional left-handed co-ordinate space).

Time may be construed as an axis in addition to any others; when it is, it is TEI recommended practice that it be positioned last. The units used must be defined in the TEI header; it is acceptable in certain media (such as videodiscs) to use frame numbers as a surrogate axis for time.

For example, SPACE (2D) (0 0) (1 1) specifies the location of the unit square tangent to the origin in quadrant 1 of a common graph. The location value for a space location term is a NAME enclosed in parentheses, followed by a point pair: ]]>

The FOREIGN Keyword

The foreign keyword takes any number of parenthesized parameter lists, and is terminated by the end of the attribute value, or by the next non-parenthesized token, whichever comes first.

The meaning of the foreign location term is not defined by these Guidelines. It is intended for use in pointing to special kinds of non-SGML, non-coordinate space data. That is, it should be used for making links to data which cannot be specified using the other mechanisms. The meaning of any foreign location types must be specified in the TEI header, in the encodingDesc element. If more than one such type is used, it is TEI recommended practice that the first parameter list to foreign be a name associated with the particular type by documentation in the TEI header.

For example, assume that some program uses a proprietary data format called XFORM, and that the program has supplied an identifier 06286208998 for some piece of data it owns. Then the location specification FOREIGN (XFORM) (06286208998) would be one way of expressing a link to that piece of data.

The HYQ Keyword

The HyQ keyword takes a single parenthesized parameter list, which contains an expression in the HyQ query language defined by the HyTime standard. See documentation on HyTime and HyQ for definitions of HyQ expressions.

The DITTO Keyword

The ditto keyword is valid only as the first location term in a ladder, and only within the to attribute of an extended pointer element. It designates the location result of the from attribute on the same element. Thus in the pointer ]]> the from attribute designates the first occurrence of the string Wagnerian in the div containing the element with an id of a23. The to attribute designates the first occurrence of the string Liebestod which occurs after Wagnerian, within the same div. Without the ditto keyword, it would be necessary to repeat the entire location ladder of the from attribute in the to attribute, which would be error-prone for complex expressions.

Using Extended Pointers

As noted above, when only the from attribute is specified, the xref or xptr element points at the span indicated by from. When both from and to are specified, the element points at the span running from the beginning of the span indicated by the former to the end of the span indicated by the latter. To point at the second, third, and fourth paragraphs of the second chapter (div1) in the body of the current document, therefore, one may specify either of the following: ]]>

To point to the term occurring in the current termEntry with attribute n = 2, only the from attribute would be required: ]]>

The following example demonstrates how elements from two different documents may be combined ]]> The first xptr indicates the element in doc1 which has identifier d1.1. The second indicates the second subelement of the element in doc2 which has identifier d2.1. These two elements are pointed to as a single item by the ptr element and given the identifier p1. This aggregation, finally, is linked with two other elements both in the current document, with identifiers s1 and s2.

An extended pointer, as described above, may specify as its target only a single destination. Where the intended destination of a link is an aggregation or alignment of destinations, possibly in separate documents, a ptr or join element must be used to combine them into a single pointer, as described elsewhere in this chapter. Like any other element, an xref or xptr element may be given a unique id within the document that contains it. This id value can then be supplied as one of the target values for a multi-headed ptr or link element, to represent aggregation or linkage respectively.

For example, a modern commentary on an older text must frequently refer to that text, which might well be encoded in a separate SGML document. Some discussions will refer to a set of discrete passages in the older text, and will thus require multi-headed pointers. In such a case, the document type declaration must contain a declaration for an SGML entity containing the older text, which might look something like this: ]]> In the commentary itself, reference will be made to this external document, using xptr and xref elements. When the commentary refers to aggregates of discontiguous passages, xptr elements are used to point to the individual passages, and a ref element may refer to these passages as a group by pointing to the xptrs: ...

In the references to Theobald, Pope's satire characteristically ...

]]> If the same discontiguous target is to be referred to repeatedly, it may be convenient to give it a single identifier, thus: ...

In the references to Theobald, Pope's satire characteristically ...

]]>

A hypertext web might associate passages of the text and notes with the individuals mentioned, the ancient authors imitated, or thematic content, thus: ...

Individuals Named in the Text A bookseller and publisher ... ... Attorney, active also as editor and reviewer ... ... ...
Ancient Authors Imitated in the Text Virgil Homer Ovid ... ... ... ... ]]>

Correspondence

A common problem of text encoding is that of indicating that two or more passages correspond to each other in some way. Provided that corresponding elements bear an id attribute, their alignment may be expressed using either the corresp attribute described at the beginning of this chapter, or the corresp and correspGrp elements:

correspGrp groups a number of corresp elements which have common properties or function. Attributes include:
    indicates how the elements are aligned. Sample values include:
        The beginnings of the elements are aligned.
        The ends of the elements are aligned.
        The middles of the elements are aligned.
        The whole elements are aligned.
        The aligned elements overlap.

corresp represents an alignment or correspondence among a group of elements or passages. Attributes include:
    specifies the elements or passages to be aligned, by giving a set of ID values associated with them.
    indicates how the passages are aligned. Sample values include:
        The beginnings of the passages are aligned.
        The ends of the passages are aligned.
        The middles of the passages are aligned.
        The whole passages are aligned.
        The aligned passages overlap.

Like other pointing elements, these elements are members of the class pointer and share the following additional attributes:
    categorizes the pointer in some respect, using any convenient set of categories.
    specifies the creator of the pointer.
    specifies when the pointer was created.
    specifies the kinds of elements to which this pointer may point.
    where more than one identifier is supplied as the value of the target attribute, this attribute specifies whether the order in which they are supplied is significant. Sample values include:
        Yes: the order in which IDREFs are specified as the value of a target attribute should be followed when combining the targeted elements.
        No: the order in which IDREFs are specified as the value of a target attribute has no significance when combining the targeted elements.
        Unspecified: the order in which IDREFs are specified as the value of a target attribute may or may not be significant.
    specifies the intended meaning when the target of a pointer is itself a pointer. Sample values include:
        if the element pointed to is itself a pointer, then the target of that pointer will be taken, and so on, until an element is found which is not a pointer.
        if the element pointed to is itself a pointer, then its target (whether a pointer or not) is taken as the target of this pointer.
        no further evaluation of targets is carried out beyond that needed to find the element specified in the pointer's target.
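The three behaviours just described for a pointer whose target is itself a pointer can be illustrated with a small resolution routine. This is only a sketch under stated assumptions: the mode names "all", "one" and "none", the function name resolve, and the representation of pointers as a dictionary from pointer identifiers to target identifiers are all hypothetical and are not taken from the declarations.

```python
def resolve(target_id, pointers, mode="all"):
    """Return the identifier finally addressed by a pointer aimed at target_id.

    pointers maps the identifier of each pointer element to the identifier
    it points at; any identifier absent from the map is an ordinary element.
    The mode names are illustrative assumptions, not attribute values
    confirmed by the text.
    """
    if mode not in ("all", "one", "none"):
        raise ValueError(f"unknown mode: {mode!r}")
    # "none": stop at the element named in the pointer's own target.
    if mode == "none" or target_id not in pointers:
        return target_id
    # "one": take the target of the intermediate pointer, whatever it is.
    if mode == "one":
        return pointers[target_id]
    # "all": keep following pointers until a non-pointer is reached.
    seen = set()
    current = target_id
    while current in pointers:
        if current in seen:
            raise ValueError(f"pointer cycle through {current!r}")
        seen.add(current)
        current = pointers[current]
    return current
```

With a hypothetical chain in which p1 points at p2 and p2 points at s1, resolving p1 gives s1 in the first mode, p2 in the second, and p1 itself in the third.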

The corresp attribute may be specified for any element, provided that the analytic attribute set has been enabled, as described in section . It takes as its value a string of SGML identifiers, indicating the element or elements which are regarded as corresponding in some sense. As a simple example of its use, consider the following sentence: Shirley, which made its Friday night debut only a month ago, was not listed on NBC's new schedule, although the network says the show still is being considered. ]]> Here the anaphoric phrases the network and the show have been associated directly with the elements to which they refer by means of a corresp attribute. This mechanism is simple to apply, but has two drawbacks: it is to a large extent arbitrary as to which of the associated elements should bear the corresp attribute, and it is not possible to specify more exactly what kind of correspondence is intended. Where this attribute is used, therefore, encoders are encouraged to specify their intent in the associated encoding declarations in the TEI Header.

These drawbacks do not apply to the empty corresp element, and its associated grouping element correspGrp. The same example could be encoded using the corresp element as follows: Shirley which made its Friday night debut only a month ago was not listed on NBC's new schedule, although the network says the show still is being considered. ... ]]>

We consider correspondence to be an equivalence relation, so that all of the following encodings, and some others as well, are semantically equivalent. The first four of these are redundancy free; the fifth one is maximally redundant. This example shows even more clearly the indeterminate placement of the corresp attribute. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ]]>
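The claim that differently placed corresp attributes are semantically equivalent can be checked mechanically by computing the equivalence classes each encoding induces. The following is a minimal sketch using a union-find structure; the function name, the dictionary representation of attribute values, and the identifiers a, b and c used in the note after it are hypothetical.

```python
def correspondence_classes(corresp):
    """Compute the equivalence classes induced by corresp attribute values.

    corresp maps an element identifier to the list of identifiers given in
    its corresp attribute. Correspondence is treated as reflexive, symmetric
    and transitive, as the text assumes.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for source, targets in corresp.items():
        find(source)  # register elements that list no targets
        for target in targets:
            parent[find(source)] = find(target)

    classes = {}
    for ident in list(parent):
        classes.setdefault(find(ident), set()).add(ident)
    return {frozenset(members) for members in classes.values()}
```

Under these assumptions, an encoding in which a points to b and b points to c, one in which a points to both b and c, and a maximally redundant one in which each element points to the other two all yield the single class containing a, b and c.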

The redundancy and indeterminacy allowed by the corresp attribute are eliminated by the corresp element. All of the hypothetical encodings given above using the corresp attribute are equivalent to the following single encoding using the corresp and correspGrp elements. ... ... ... ]]>

The corresp attribute also has the drawback of providing no method of indicating whether the correspondence indicated is true of the whole of the elements being aligned, their starting points, or their ending points. By contrast, the corresp element has a how attribute to specify whether the beginnings of the affected elements are aligned, the ends, the middles, or the whole elements; or whether the alignments merely overlap in some unspecified way.

The element correspGrp may be used to group together a number of corresp elements having common values for their type or how attributes, to avoid the need to specify them repeatedly.

Here are the formal declarations of the correspGrp and corresp elements. ]]> A Detailed Example

The corresp element may be used to align alternative or parallel versions or translations of the same underlying material. We illustrate by considering three English versions of St. John's Gospel: those in The Holy Bible: Revised Standard Version (RSV), in J.B. Phillips, The New Testament in Modern English (PNT), and in The New English Bible (NEB).

We assume that the text of RSV is broken down into div0 segments containing the Old and New Testaments, div1 segments containing the various books of the Bible, and div2 segments containing the chapters of those books. We assume that s tags mark verses and orthographic sentences, that all elements other than headings are provided with id attributes, and that chapter and verse numbers are recorded as values of the n attribute on appropriate elements. Our markup of this version of St. John's Gospel appears schematically as follows (white space has been added liberally to aid legibility). The New Testament The Gospel According to John

In the beginning was the Word, and the Word was with God, and the Word was God. He was in the beginning with God: all things were made through him, and without him was not anything made that was made. ]]>

PNT is divided into four parts; we assume that these are encoded as div0 segments. These are made up of one or more books, which we represent as div1 segments. Books are divided into sections, for which PNT provides descriptive heads. Chapter and verse numbers are recorded at the beginnings of those sections; these are the only chapter and verse numbers indicated in PNT. We represent sections as div2 segments, and assign the chapter and verse numbers as the value of the n attribute of those elements. Sections are made up of paragraphs, which are in turn made up of orthographic sentences. Our schematic markup of this version of St. John's Gospel is as follows. The Gospels The Gospel of John Prologue

At the beginning God expressed himself. That personal expression, that word, was with God and was God, and he existed with God from the beginning. All creation took place through him, and none took place without him. ]]>

Finally, NEB is divided into four parts corresponding exactly to those in PNT; we suppose that these are also encoded as div0 segments. They are also made up of one or more books, which correspond to those in PNT, and we represent them also as div1 segments. Again, as in PNT, books are divided into headed sections, which we represent as div2 segments. However, these are further divided into subsections (without heads), which we represent as div3 segments; these, in turn, are divided into paragraphs and orthographic sentences. Chapter and verse numbers are recorded in the margin; we indicate these as values of the n attributes on selected elements. Our schematic markup of this version of St. John's Gospel is as follows. The Gospel The Gospel According to John The Coming of Christ

When all things began, the Word already was. The Word dwelt with God, and what God was, the Word was. The Word, then, was with God at the beginning, and through him all things came to be; no single thing was created without him. ]]>

Although the identity of the n values specified here indicates (implicitly) the correspondences between these three versions, this implicit method suffers from the twin defects that it cannot be validated or processed by an SGML processor and that it is restricted in its granularity. Although all the elements for which the n attribute has the value c1v1 begin at the same relative points in their respective versions, they all end in different places.

To represent correspondences among these three versions explicitly, the analytic corresp attribute may be used. For example, to show that the div1 element with identifier JN corresponds with both that with identifier pJN and that with identifier nJN, its start-tag can be rewritten as ]]>

In the same way, to show the relation between the first segment in the RSV fragment and corresponding segments in the PNT and NEB fragments, we might rewrite the tag for the RSV segment as follows: ]]>

However, the correspondence indicated by this encoding differs slightly from the others in that it is only at their starting points that the three segments are properly aligned. The content of the element with identifier JN.1 in fact corresponds not only to the elements indicated, but to one additional element in each of the other two versions (in PNT to pjn.1.1.2, in NEB to njn1.1.1.2). As noted above, the corresp attribute provides no simple way of indicating whether the correspondence it indicates is true of the whole of the elements being aligned, their starts, or their ends. Thus, the two alignments discussed above may be encoded as follows, to indicate more precisely what aligns: ]]>

These alignments, however, are far from capturing the parallels among the three texts. Because of the structural differences among the translations, there are only a few low-level structural units which correspond as wholes to each other:

PNT PJN1.1.1.1 At the beginning, God expressed himself corresponds to NEB NJN1.1.1.1 When all things began, the Word already was.
RSV JN1.1 In the beginning ... was God corresponds to NEB NJN1.1.1.1 and NJN1.1.1.2 When all things began ... the Word was.
RSV JN1.2 and JN1.3 He was ... was made correspond to NEB NJN1.1.1.3 The Word, then, ... without him.

These may be recorded using corresp thus; note that the RSV / NEB alignment requires two elements of NEB to be joined into a virtual unit; this can be done with ptr, as here, or with join, as described in section . ]]>

To capture all the overlaps of the three versions, the correspGrp element can set how to overlap: ]]> Alignment Using External Pointers

The preceding encoding of the alignment of parallel passages from three texts requires that those texts and the alignment all be part of the same SGML document. If the texts are in separate documents, then additional xptr elements must be supplied, as discussed in section . These external pointers may appear anywhere within the document, but if they are created solely for use in encoding correspondences, they may for convenience be grouped within the correspGrp element that uses them.

To demonstrate this facility, we consider how we might encode the alignments in an extract from Comenius' Orbis Sensualium Pictus. Each topic covered in this work has three parts: a picture, a prose text in Latin describing the topic, and a carefully-aligned translation of the Latin into English, German or some other vernacular. Key terms in the two texts are typographically distinct, and are linked to the picture by numbers, which appear in the two texts and within the picture as well.

Our example uses the English translation of Charles Hoole (1659), and is taken from John E. Sadler, ed., John Amos Comenius Orbis Pictus: a facsimile of the first English edition of 1659 (Oxford: Oxford University Press, 1968) (The Juvenile Library).

First, we present the text portions. The English and Latin portions have been encoded as distinct div elements. Identifiers have been attached to each typographic line, but no other encoding added, to simplify the example.

The Study The Study is a place where a Student, a part from men, sitteth alone, addicted to his Studies, whilst he readeth Books, ...
Muséum Museum est locus ubi Studiosus, secretus ab hominibus, solus sedet, Studiis deditus, dum lectitat Libros, ...
]]>

Next we assume that we have stored a digitized image of the picture itself in some external entity we will call com98 (for further discussion of the handling of external images and graphics, see section ). We further assume that we can address portions of this image as a two-dimensional co-ordinate space. The SPACE location method of the xptr element (discussed in section above) can now be used to point to the whole picture and to two portions of it, one containing the picture of a student and the other of a book, as follows: ]]> Note that each external pointer has its own unique identifier, in addition to the n attribute, which last holds the visible label (or explainer) used for this image portion in the original.

As printed, the text exhibits three kinds of alignment. The English and Latin portions are printed in two parallel columns, with corresponding phrases (represented above by s elements) more or less next to each other. Particular words or phrases are marked as terms in the two languages by a change of rendition: the English text, which otherwise uses black letter type throughout, has the words The Study, a Student, Studies and Books in a roman font; in the Latin text, which is printed in roman, the corresponding words (Museum, Studiosus, Studiis and Libros) are all in italic. Numbered labels appear within the text portions, linking keywords to each other and to sections of the picture. These labels, which have been left out of the above encoding, are attached to the first, third and last segments in each language quoted below, and also appear (rather indistinctly) within the picture itself. If it is desired to transcribe them in the text, they might be encoded as ref elements, anchor elements, or xptrs to the picture; the number itself would be transcribed as the value of the n attribute (or as the content of the ref).

The first kind of alignment might be represented by using the corresp attribute on the s element. The second kind might be represented by using the gloss and term mechanism described in section . The third kind of alignment might be represented using pointers embedded within the texts, although this would involve some duplication. We choose however to use the corresp element, since this provides an efficient way of representing the three-way alignment between English, Latin and picture without redundancy. ]]>

This map, of course, only aligns whole segments and image portions, since these are the only parts of our encoding which bear identifiers and can therefore be pointed to. To add to it the correspondence between the typographically distinct words mentioned above, new elements must be defined, either within the text itself or externally by using the extended pointer mechanism. Encoding these word pairs as term and gloss, although intuitively obvious, requires a non-trivial decision as to whether the Latin text is glossing the English, or vice-versa. Tagging all the marked words as term avoids the difficult decision, but might be thought by some encoders to convey the wrong information about the words in question. Simply tagging them as additional embedded s elements with identifiers that can be aligned like the others is also a possibility. All of these require the addition of further markup to the text. This may pose no problems, or it may be infeasible (e.g. if the text is held on a read-only medium). If it is not feasible to add more markup to the original text, the extended pointer mechanism is likely to be the best choice. For example, to indicate that the words Studies and Studiis correspond, two external pointers might be defined and aligned as follows: ]]>

With the extended-pointer methods just demonstrated, we can provide an even more precise clause-by-clause alignment of the example in section . We can use xptr elements to identify the relevant portions of each s, thus providing SGML identifiers for the fine-grained parallels. In the following example, we assume that the three versions are transcribed in documents with system identifiers 'rsv.tei', 'pnt.tei', and 'neb.tei'. The document type declaration should declare entities with these system identifiers: ]]>

Next, we declare extended pointers to the sub-parts of the structural units of the text; the text of RSV is given in comments, to aid the reader in following the example. ]]>

With the pointers defined, we are in a position to record the correspondences at a fairly fine granularity: ]]> Further Example

As a final example, here is an alignment by wholes of three English versions of the same biblical passage (John 4:6-8). First the RSV fragment: Jacob's well was there, and so Jesus, wearied as he was with his journey, sat down beside the well. It was about the sixth hour.

There came a woman of Samar&sm;ia to draw water. Jesus said to her, Give me a drink. For his disciples had gone away into the city to buy food. ]]> Next, the corresponding fragment from PNT: Jesus, tired with the journey, sat down beside it, just as he was. The time was about midday. Presently a Samaritan woman arrived to draw some water.

Please give me a drink, Jesus said to her, for his disciples had gone away to the town to buy food. ]]> The corresponding fragment from NEB: It was about noon, and Jesus, tired after his journey, sat down by the well.

The disciples had gone away to the town to buy food. Meanwhile a Samaritan woman came to draw water. Jesus said to her, Give me a drink. ]]> And finally, the list of correspondences: ]]>

If, as is probably the case, the three separate versions are kept in three distinct SGML documents, the alignment cannot be done directly in terms of the identifiers used in each document, since these may not be unique. A set of xptr elements may be used to address the required elements indirectly as follows, again embedded within the correspGrp element for convenience: ]]> Aggregation and Virtual Elements

Because of the strict hierarchical organization of an SGML document, it is not always possible to enclose all and only the parts of a fragmented text segment within a single element. In section we introduced the notion of the multi-headed pointer as a general-purpose method of pointing to discontinuous segments of this kind. In this section we discuss two methods of representing specific kinds of aggregation, using either a set of special-purpose analytic attributes (enabled as described in section ), or the special-purpose join element. Both mechanisms require that the elements to be aggregated all bear an id attribute, or can be accessed indirectly using the extended pointer mechanism described in section .

The analytic attributes used for aggregation are next, prev, link, and partof: specifies the identifier of an element which may be linked with the current element to form an aggregate, and which immediately follows it in the aggregate. specifies the identifier of an element which may be linked with the current element to form an aggregate, and which immediately precedes it in the aggregate. specifies the identifiers of elements that may be linked with this one to form an aggregate. specifies the identifier of a join element in the current document representing an aggregate of which this element is a component part.
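How a processor might reassemble aggregates from the paired next and prev attributes can be sketched as follows. This is an illustration under assumptions, not a prescribed algorithm: the function name, the dictionary representation of elements, and the requirement that every next be mirrored by a prev (and vice versa) are choices made for the sketch.

```python
def assemble_chains(elements):
    """Group element identifiers into aggregates by following next pointers.

    elements maps an identifier to a dict that may carry "next" and "prev"
    keys naming other identifiers. The sketch insists that the two
    attributes are paired, which the text does not itself require.
    """
    for ident, elem in elements.items():
        nxt = elem.get("next")
        if nxt is not None and elements[nxt].get("prev") != ident:
            raise ValueError(f"{ident} -> {nxt} lacks a matching prev")
        prv = elem.get("prev")
        if prv is not None and elements[prv].get("next") != ident:
            raise ValueError(f"{prv} -> {ident} lacks a matching next")

    chains = []
    for ident, elem in elements.items():
        if elem.get("prev") is not None:
            continue  # not the first component of an aggregate
        chain = []
        current = ident
        while current is not None:
            chain.append(current)
            current = elements[current].get("next")
        chains.append(chain)
    return chains
```

An element carrying neither attribute simply forms an aggregate of one. Because the pairing is verified first, a chain started from an element without prev cannot loop back on itself.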

The join element is also used to specify aggregates; it can of course carry more information than can the attributes just named. identifies a possibly fragmented segment. Attributes include: specifies the SGML identifiers of the elements or passages to be aggregated. specifies the kind of element which this aggregation may be understood to represent.

As a simple example of the use of these mechanisms, consider the following short passage: Owl, said Rabbit shortly, You and I have brains. The others have fluff. If there is any thinking to be done in this forest &dash; and when I say thinking I mean thinking &dash; you and I must do it. ]]> This example exhibits at least two discontinuities. The two halves of Rabbit's speech are interrupted by the reporting phrase (said Rabbit shortly) and his final sentence contains an embedded sentence. The simplest method of resolving the first of these is to use the link attribute as follows: Owl, said Rabbit shortly, You and I have brains. The others have fluff. .... ]]> Alternatively, to avoid the ambiguity as to whether the first half of the speech should be linked to the second or the second to the first, we might use the paired next and prev attributes, as follows: Owl, said Rabbit shortly, You and I have brains. The others have fluff. ... ]]> The same mechanism could be used to indicate a simple segmentation of the rest of the speech: If there is any thinking to be done in this forest &dash; and when I say thinking I mean thinking you and I must do it. ]]>

A second method of indicating that the s elements with identifiers s1 and s3 in the above example may be regarded as forming a single virtual element would be to use the join element as follows: If there is any thinking to be done in this forest &dash; and when I say thinking I mean thinking you and I must do it. ... ]]> The same mechanism may, mutatis mutandis, be used to indicate that the two q elements in the first example above may be regarded as forming a single virtual q.
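The effect of a join on the s elements s1 and s3 can be pictured with a short routine that builds the virtual element from its parts. This is a sketch only: the key names parts and type mirror the attributes of the join element, but the dictionary representation, the function name, the assumed division of the sentence into s1, s2 and s3, and the choice to concatenate fragments with a single space are all assumptions made for illustration.

```python
def build_virtual_element(join, elements):
    """Assemble the virtual element that a join is understood to represent.

    join is a dict with a "parts" list of identifiers and an optional
    "type" naming the kind of element represented; elements maps each
    identifier to a dict with a "text" entry.
    """
    missing = [part for part in join["parts"] if part not in elements]
    if missing:
        raise KeyError(f"unresolved parts: {missing}")
    return {
        "gi": join.get("type"),
        # Joining with a space is an illustrative choice, not a rule.
        "text": " ".join(elements[part]["text"] for part in join["parts"]),
    }


# Hypothetical segmentation of Rabbit's final sentence.
segments = {
    "s1": {"text": "If there is any thinking to be done in this forest"},
    "s2": {"text": "and when I say thinking I mean thinking"},
    "s3": {"text": "you and I must do it."},
}
virtual = build_virtual_element({"parts": ["s1", "s3"], "type": "s"}, segments)
```

Here virtual carries the text of the outer sentence with the embedded sentence s2 left out, which is the reading the join is meant to make available.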

Here is the formal declaration of the join element. ]]> Extended Example

We now present a more detailed example of the use of these mechanisms to represent an analysis of a passage from J.D. Salinger, Franny and Zooey, in which fragments of the two voices of the character Zui-Gan are aggregated in a number of different ways: Zui-Gan called out to himself every day, Master.

Then he answered himself, Yes, sir.

And then he added, Become sober.

And after that, he continued, do not be deceived by others.

Yes, sir; yes, sir, he replied.

]]> The link attribute can also be used for this purpose; for example, the beginning of the passage can be marked up as follows. Zui-Gan called out to himself every day, Master.

Then he answered himself, Yes, sir.

]]>

To create a virtual element that unites the fragmented segments, the join tag may be used. This element requires a parts attribute, which points to the elements which make up such a segment, and may also have a type attribute, whose value is the name (in SGML terms, the generic identifier) of the element which the join can be construed to represent. The value of this attribute may be considered to represent the type of element which the join would be if it could be encoded directly.

The next example is a re-analysis, using the join tag, of the preceding markups of the two sets of disconnected quoted elements. Zui-Gan called out to himself every day, Master.

Then he answered himself, Yes, sir.

And then he added, Become sober.

And after that, he continued, do not be deceived by others.

Yes, sir; yes, sir, he replied.

]]>

The markup in the preceding example sets pointers from the join tags to the various q tags which are their parts. One may also set pointers from the individual fragments to the join tags by means of the partof attribute, thus providing for two-way pointing, as follows. Zui-Gan called out to himself every day, Master.

Then he answered himself, Yes, sir.

And then he added, Become sober.

And after that, he continued, do not be deceived by others.

Yes, sir; yes, sir, he replied.

]]>
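When pointers run in both directions, an encoder may wish to verify that the two sets agree. The following minimal sketch reports every disagreement between the parts listed on join elements and the partof values on the fragments; the function name, the data layout and the identifiers in the note after it are hypothetical.

```python
def check_two_way(joins, partof):
    """Report mismatches between join parts and partof attributes.

    joins maps a join identifier to the list of identifiers it aggregates;
    partof maps a fragment identifier to the join identifier it names.
    Returns a sorted list of (join_id, part_id) pairs that are asserted in
    one direction only.
    """
    problems = set()
    # Every part named by a join should point back at that join.
    for join_id, parts in joins.items():
        for part in parts:
            if partof.get(part) != join_id:
                problems.add((join_id, part))
    # Every partof value should name a join that lists the fragment.
    for part, join_id in partof.items():
        if part not in joins.get(join_id, []):
            problems.add((join_id, part))
    return sorted(problems)
```

For instance, if a hypothetical join j1 lists fragments q1 and q3 but only q1 carries a partof value of j1, the routine returns the single pair (j1, q3); an empty list means the two directions agree.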

Next, suppose that id attributes, for whatever reasons, are not provided for text elements. Then xptr elements may be created using any of the methods described in section . These tags, in turn, may be provided with id attributes which the parts attribute on the join elements can point to, as in the following example. Zui-Gan called out to himself every day, Master.

Then he answered himself, Yes, sir.

And then he added, Become sober.

And after that, he continued, do not be deceived by others.

Yes, sir; yes, sir, he replied. ]]> The join and xptr elements need not, of course, be in the same SGML document as the text; indeed if, for example, the text is held on a read-only medium, this may not be possible. In such a case, they may be stored in some other document, and point into the document containing the text simply by specifying its entity name as the value of the doc attribute on each of the xptr elements: ... ... ]]>

The next example is a possible reconstruction of the fragment of the diary that the character Winston Smith is writing throughout the first chapter of Nineteen Eighty-Four, by George Orwell. one

It was a bright cold day in April, and the clocks were striking thirteen. In small clumsy letters he wrote: April 4th, 1984.

He sat back. A sense of complete helplessness had descended upon him.

Suddenly he began writing in sheer panic, only imperfectly aware of what he was setting down. His small but childish handwriting straggled up and down the page, shedding first its capital letters and finally even its full stops: April 4th, 1984. Last night to the flicks. then there was a wonderful shot of a childs arm going up up up right up into the air typical prole reaction they never

Winston stopped writing, partly because he was suffering from cramp.

For a moment he was seized by a kind of hysteria. He began writing in a hurried untidy scrawl: theyll shoot me i dont care theyll shoot me in the back of the neck i dont care down with big brother they always shoot you in the back of the neck i dont care down with big brother

He sat back in his chair, slightly ashamed of himself, and laid down the pen. With the feeling that he was speaking to O'Brien, and also that he was setting forth an important axiom, he wrote: Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. viii two ]]>