Feature System Declaration

The Feature System Declaration (FSD) is an auxiliary file used in conjunction with a TEI-conforming text that makes use of fs (that is, feature structure) elements. The FSD serves three purposes.

First, it provides a mechanism by which the encoder can list all of the feature names and feature values and give a prose description of what each represents.

Second, it provides a mechanism by which the encoder can define constraints on what it means to be a well-formed feature structure. These may constrain the range of a feature value, restrict which features are valid within certain types of feature structures, or prevent the co-occurrence of certain feature-value pairs.

Third, it provides a mechanism by which the encoder can define the intended interpretation of underspecified feature structures. This involves defining default values (whether literal or computed) for missing features.

As a component of the interchange standard for encoded text, the FSD serves an important function in documenting precisely what the encoder intended by the system of feature structure markup used in the encoded text. As application software is developed which makes use of encoded texts, the FSD will also become an important resource that will allow software to validate the feature structure markup in a text and to infer the full interpretation of underspecified feature structures.

This chapter begins by describing how the encoded text uses header information to make links to any associated FSDs. The second through fourth sections describe the overall structure of an FSD and give details of how to encode its parts. The final section offers a full example. For a fuller discussion of the reasoning behind FSDs and for another complete example, see A rationale for the TEI recommendations for feature-structure markup, by D. Terence Langendoen and Gary F. Simons (to appear in the special TEI issue of Computers and the Humanities).

Linking a TEI Text to Feature System Declarations

In order for application software to use feature system declarations to aid in the automatic interpretation of encoded texts, or even for human readers to find the appropriate declarations which document the feature system used in markup, there must be a formal link from the encoded texts to the declarations. As it turns out, the mechanism for linking texts to FSDs parallels the mechanism for linking texts to writing system declarations (WSDs).

The linkage is made in two places. First, in the DTD subset of the document type declaration at the beginning of the text file, an external entity is defined for each FSD that is associated with the encoded text. That entity declaration associates an entity name with the name of a file on the host system, and it ends with the SUBDOC keyword to tell the processor that the named file is a self-contained SGML document. See the example below for details of syntax.
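As a minimal sketch of this first half of the linkage, the Python function below generates one external entity declaration per FSD file for the DTD subset. It assumes a SYSTEM identifier that names the file directly; the function name entity_declarations and the entity and file names in the usage line (lexfsd, synfsd, syntax.fsd) are hypothetical, chosen only for illustration.

```python
def entity_declarations(fsd_files: dict[str, str]) -> str:
    """Return DTD-subset entity declarations, one per FSD file.

    Each entity is declared with a SYSTEM identifier (an assumption of this
    sketch) and the SUBDOC keyword, so that an SGML processor treats the
    named file as a self-contained SGML document.
    """
    return "\n".join(
        f'<!ENTITY {name} SYSTEM "{path}" SUBDOC>'
        for name, path in sorted(fsd_files.items())
    )


# Hypothetical entity names; lex.fsd echoes the example file named in the text.
print(entity_declarations({"lexfsd": "lex.fsd", "synfsd": "syntax.fsd"}))
```

Sorting by entity name only makes the output deterministic; SGML does not require any particular order of entity declarations.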

The second place in which the linkage from text to FSDs is made is the TEI header, as mentioned in section . Within the encodingDesc element, an fsdDecl element is used for each distinct feature structure type, as follows:

fsdDecl: identifies the feature system declaration which contains definitions for a particular type of feature structure. Attributes include:
  type: identifies the type of feature structure documented in the FSD; this will be the value of the type attribute on at least one feature structure.
  fsd: specifies the external entity containing the feature system declaration; an entity declaration in the document's DTD subset must associate the entity name with a file on the system.

Note that one fsdDecl element must be specified for each distinct type of feature structure used in the markup. Its fsd attribute supplies the name of the external entity containing the actual declaration for that type of feature structure.

There may be multiple fsdDecl elements for a given FSD, one for each type of feature structure it defines. For instance, in the following example, the file lex.fsd contains an FSD with definitions of feature structures for both lexical entries (fs type=entry) and lexical subentries (fs type=subentry).

The following example shows the markup for linking a TEI document to two WSDs and two FSDs. The linkage to both WSDs and FSDs is shown in order to illustrate the parallel nature of the linking mechanisms for both kinds of auxiliary files.
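The requirement that every feature structure type used in the markup have its own fsdDecl, and that every fsd attribute name a declared entity, lends itself to a mechanical check. The sketch below assumes the header and DTD subset have already been parsed into plain Python values; the FsdDecl class, the check_fsd_links function, and the entity name lexfsd are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FsdDecl:
    """One fsdDecl from the header: an fs type and the entity holding its FSD."""
    type: str
    fsd: str


def check_fsd_links(decls, declared_entities, fs_types_used):
    """Return a list of problems with the text-to-FSD linkage.

    - every fs type used in the text needs an fsdDecl of its own;
    - every fsd attribute must name an entity declared in the DTD subset.
    """
    problems = []
    covered = {d.type for d in decls}
    for fs_type in sorted(fs_types_used - covered):
        problems.append(f"no fsdDecl for fs type {fs_type!r}")
    for d in decls:
        if d.fsd not in declared_entities:
            problems.append(f"fsdDecl type={d.type!r} names undeclared entity {d.fsd!r}")
    return problems


# Two fsdDecl elements may point at the same FSD, as lex.fsd does above.
decls = [FsdDecl("entry", "lexfsd"), FsdDecl("subentry", "lexfsd")]
print(check_fsd_links(decls, {"lexfsd"}, {"entry", "subentry", "sense"}))
# ["no fsdDecl for fs type 'sense'"]
```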

The auxiliary tag set for feature system declarations is contained in the file teifsd2.dtd. In outline, it invokes the parameter entities %TEI.elementNames;, %TEI.keywords.ent;, %TEI.elementClasses;, %TEI.fs.dtd;, %TEI.header.dtd;, and %TEI.core.dtd;, in that order.

The Overall Structure of a Feature System Declaration

A feature system declaration is encoded as a document of type teiFsd2. It has two parts: an obligatory header (which provides bibliographic information for the file) and a set of feature structure declarations (each of which defines one type of feature structure). Each feature structure declaration in turn has three parts: an optional description (which gives a prose comment on what that type of feature structure encodes), an obligatory set of feature declarations (which specify range constraints and default values for the features in that type of structure), and optional feature structure constraints (which specify co-occurrence restrictions on feature values). The header is encoded as a teiHeader, just as for any TEI.2 document; see chapter . The other components listed above are unique to feature system declarations. Thus, the following new elements are involved:

teiFsd2: contains a feature system declaration.
fsDecl: declares one type of feature structure. Attributes include:
  type: gives a name for the type of feature structure being declared.
  baseType: gives the name of the feature structure type from which this type inherits features and constraints. If this type declares a feature with the same name as a feature of the base type, the definition within this fsDecl overrides the inherited definition. The fsConstraints are inherited only if this fsDecl does not specify any; otherwise the constraints in this fsDecl override them. When no baseType is specified, no features or constraints are inherited.
fsDescr: describes in prose what is represented by the type of feature structure declared in the enclosing fsDecl.
fDecl: declares a single feature, specifying its name, organization, range of allowed values, and optionally its default value.
fsConstraints: specifies constraints on the content of well-formed feature structures.

Feature declarations and feature structure constraints are described in the next two sections of this chapter. Note that the specification of similar fsDecl elements can be simplified by devising an inheritance hierarchy for the feature structure types. Each fsDecl may name a baseType from which it inherits feature declarations and constraints. For instance, suppose that fsDecl type=Basic contains fDecl name=One and fDecl name=Two, and that fsDecl type=Derived baseType=Basic contains just fDecl name=Three. Then any instance of fs type=Derived may include all three features. This is because fsDecl type=Derived inherits the two feature declarations from fsDecl type=Basic when it specifies a baseType of Basic.
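The inheritance rules for baseType can be made concrete with a short resolver. In this sketch an fsDecl is reduced to a hypothetical FsDecl record holding its feature declarations (keyed by feature name) and its constraints; the resolve function applies the three rules stated in the element definitions above, and assumes the baseType chain contains no cycles.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FsDecl:
    """Minimal stand-in for an fsDecl element (hypothetical representation)."""
    type: str
    features: dict                                   # fDecl name -> declaration
    constraints: list = field(default_factory=list)  # contents of fsConstraints
    base_type: Optional[str] = None                  # the baseType attribute


def resolve(fs_type, decls):
    """Return the (features, constraints) in effect for fs_type.

    - no baseType: nothing is inherited;
    - a local fDecl overrides an inherited fDecl with the same name;
    - inherited constraints apply only if this fsDecl declares none.
    Assumes the baseType chain is acyclic.
    """
    decl = decls[fs_type]
    if decl.base_type is None:
        return dict(decl.features), list(decl.constraints)
    base_features, base_constraints = resolve(decl.base_type, decls)
    features = {**base_features, **decl.features}  # local declarations win
    constraints = list(decl.constraints) if decl.constraints else base_constraints
    return features, constraints


decls = {
    "Basic": FsDecl("Basic", {"One": "fDecl One", "Two": "fDecl Two"}, ["c1"]),
    "Derived": FsDecl("Derived", {"Three": "fDecl Three"}, base_type="Basic"),
}
features, constraints = resolve("Derived", decls)
print(sorted(features), constraints)  # ['One', 'Three', 'Two'] ['c1']
```

This mirrors the Basic/Derived example: fs type=Derived ends up with all three features and, having no constraints of its own, inherits those of Basic.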

The following sample shows the overall structure of a complete FSD. Note that as a stand-alone document it begins with a DOCTYPE declaration which identifies the associated DTD.

The formal definition of teiFsd2 and feature structure declarations is as follows:

Feature Declarations

Each feature is declared in an fDecl element whose name attribute identifies the feature being declared; this matches the name attribute of the f elements it declares. An fDecl also has an org attribute which declares the organizing principle for the values of the f elements it declares. That is, the value may be a unit (a single value), a set (in which the order is not significant and there are no duplicates), a bag (in which the order is not significant but duplicates are allowed), or a list (in which the order is significant). (See definition of org attribute of f in section .) An fDecl has three parts: an optional prose description (which should explain what the feature and its values represent), an obligatory range specification (which declares what values the feature is allowed to have), and an optional default specification (which declares what default value should be supplied when the named feature does not appear in an fs). A single unconditional default value may be specified, or multiple conditional values. If no default is specified, or if none of the conditions is met, then the default value is none; in other words, the feature is not applicable (see section for a discussion of the none element).
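The practical effect of the org attribute is to fix when two values for the same feature count as the same. The sketch below normalizes a feature's values under each organizing principle so that equal normalized forms mean equal values; the normalize function is hypothetical, and for set it silently collapses duplicates rather than rejecting them, which a stricter validator might choose to do instead.

```python
from collections import Counter


def normalize(values, org):
    """Normalize a feature's values according to the fDecl org attribute.

    unit: exactly one value; set: order and duplicates ignored;
    bag: order ignored, duplicates counted; list: order and duplicates kept.
    """
    values = list(values)
    if org == "unit":
        if len(values) != 1:
            raise ValueError("org=unit requires exactly one value")
        return values[0]
    if org == "set":
        return frozenset(values)  # duplicates collapse (an assumption here)
    if org == "bag":
        return frozenset(Counter(values).items())  # hashable multiset
    if org == "list":
        return tuple(values)
    raise ValueError(f"unknown org value: {org!r}")


print(normalize(["a", "b", "a"], "set") == normalize(["b", "a"], "set"))  # True
print(normalize(["a", "b", "a"], "bag") == normalize(["b", "a"], "bag"))  # False
print(normalize(["a", "b"], "list") == normalize(["b", "a"], "list"))     # False
```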

The tags used in feature declarations are the following:

fDecl: declares a single feature, specifying its name, organization, range of allowed values, and optionally its default value. Attributes include:
  name: indicates the name of the feature being declared; matches the name attribute of f elements in the text.
  org: specifies the organizing discipline of the feature value. Legal values are:
    unit: unitary atomic value
    set: set value (unordered, no duplicates)
    bag: bag value (unordered, may have duplicates)
    list: list value (ordered, may have duplicates)
fDescr: describes in prose what is represented by the feature being declared and its values.
vRange: defines the range of allowed values for a feature, in the form of an fs, vAlt, or primitive value. For the value of an f to be valid, it must be subsumed by the specified range; if the f contains multiple values (as sanctioned by the org attribute), then each value must be subsumed by the vRange.
vDefault: declares the default value to be supplied when a feature structure does not contain an instance of f for this name. If unconditional, it is specified as one (or, depending on the value of the org attribute of the enclosing fDecl, more) fs elements or primitive values; if conditional, it is specified as one or more if elements. If no default is specified, or no condition matches, the value none is assumed.
if: defines a conditional default value for a feature; the condition is specified as a feature structure, and is met if it subsumes the feature structure in the text for which a default value is sought.
then: separates the condition from the default in an if, or the antecedent and the consequent in a cond element.

The logic for validating feature values and for matching the conditions for supplying default values is based on the operation of subsumption, a standard operation in feature-structure-based formalisms. Informally, a feature structure fs subsumes all feature structures that are at least as informative as itself; that is, all feature structures that specify at least the features of fs, with values at least as informative as those given in fs (Pereira 1987:6; see also Shieber 1986:14-16).

A more formal definition requires that we first define the notion of the domain of a feature structure. A feature structure can be viewed as a partial function that maps features onto values; viewed in this way, the domain of a feature structure is the set of top-level features it contains (that is, excluding features in embedded feature structures). We can now offer a more precise definition: fs subsumes fs′ if both are identical primitive values, or if the domain of fs is a subset of the domain of fs′ and, for every feature f in the domain of fs, the value of f in fs subsumes the value of f in fs′.

Works cited: Fernando C. N. Pereira, Grammars and logics of partial information, SRI International Technical Note 420 (Menlo Park, CA: SRI International, 1987); Stuart Shieber, An Introduction to Unification-based Approaches to Grammar, CSLI Lecture Notes 4 (Palo Alto, CA: Center for the Study of Language and Information, 1986).
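The recursive definition of subsumption translates almost line for line into code. In this sketch feature structures are modeled as Python dicts from feature names to values and anything else is a primitive value; this representation, the function name subsumes, and the feature values in the usage lines are assumptions for illustration only.

```python
def subsumes(fs, other):
    """Return True if fs subsumes other, following the definition above.

    Feature structures are modeled as dicts from feature names to values
    (a hypothetical representation); anything else is a primitive value.
    """
    if isinstance(fs, dict) and isinstance(other, dict):
        # dom(fs) must be a subset of dom(other), and each value in fs
        # must subsume the corresponding value in other
        return fs.keys() <= other.keys() and all(
            subsumes(value, other[name]) for name, value in fs.items()
        )
    if isinstance(fs, dict) or isinstance(other, dict):
        return False  # in this sketch, an fs and a primitive never subsume each other
    return fs == other  # primitive values must be identical


general = {"VFORM": "INF", "AGR": {"NUM": "sg"}}
specific = {"VFORM": "INF", "SUBJ": "+", "AGR": {"PERS": "3", "NUM": "sg"}}
print(subsumes(general, specific))  # True
print(subsumes(specific, general))  # False
```

One consequence worth noting: the empty feature structure has an empty domain and therefore subsumes every feature structure.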

Following the spirit of the informal definition above, we can extend subsumption in a straightforward way to cover alternation, negation, special primitive values, and the use of attributes in the SGML markup. For instance, a vAlt containing the value v subsumes v. The negation (rel=ne) of value v subsumes any value that is not v. The value unknown subsumes any value. The value any subsumes any value that is in the range of a feature. An empty fs with type=X subsumes any feature structure whose type is X. An nbr with rel=ge and value=0 subsumes any nbr whose value is greater than or equal to zero.
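The extensions for primitive values can be sketched in the same style. Here vAlt and rel-bearing values are modeled by hypothetical VAlt and Rel classes, and unknown and any by sentinel objects. The set of rel values beyond ne and ge is an assumption of this sketch, and typed fs matching is left out to keep the example short.

```python
import operator
from dataclasses import dataclass

UNKNOWN = object()  # hypothetical sentinel for the special value unknown
ANY = object()      # hypothetical sentinel for the special value any


@dataclass(frozen=True)
class VAlt:
    """An alternation (vAlt); it subsumes each of its alternatives."""
    alternatives: tuple


@dataclass(frozen=True)
class Rel:
    """A primitive value carrying a rel attribute, e.g. nbr rel=ge value=0."""
    rel: str
    value: object


# Assumed rel values; only ne and ge are mentioned in the text above.
_RELS = {"eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
         "le": operator.le, "gt": operator.gt, "ge": operator.ge}


def value_subsumes(general, specific, feature_range=None):
    """Return True if the (possibly special) value `general` subsumes `specific`."""
    if general is UNKNOWN:
        return True  # unknown subsumes any value
    if general is ANY:
        # any subsumes whatever the feature's declared range subsumes
        return feature_range is not None and value_subsumes(feature_range, specific)
    if isinstance(general, VAlt):
        return any(value_subsumes(alt, specific) for alt in general.alternatives)
    if isinstance(general, Rel):
        try:
            return _RELS[general.rel](specific, general.value)
        except TypeError:  # e.g. comparing a string with a number
            return False
    return general == specific  # ordinary primitive values must be identical


plus_minus = VAlt(("+", "-"))
print(value_subsumes(plus_minus, "-"))       # True
print(value_subsumes(Rel("ge", 0), -1))      # False
print(value_subsumes(ANY, "+", plus_minus))  # True
```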

As an example of feature declarations, consider the following extract from Generalized Phrase Structure Grammar by Gerald Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan Sag (Harvard University Press, 1985). In the appendix to their book (pages 245-247), they propose a feature system for English, of which this is just a sampling:

Feature   Value range
INV       {+, -}
CONJ      {and, both, but, either, neither, nor, or, NIL}
COMP      {for, that, whether, if, NIL}
AGR       CAT
PFORM     {to, by, for, ...}

Feature specification defaults:
FSD 1: [-INV]
FSD 2: ~[CONJ]
FSD 9: [INF, +SUBJ] → [COMP for]

The INV feature, which encodes whether or not a sentence is inverted, allows only the values plus (+) and minus (-). If the feature is not specified, then the default rule (FSD 1 above) says that a value of minus is always assumed. The feature declaration for this feature would be encoded as follows:

The value range is specified as an alternation (more precisely, an exclusive disjunction) of plus and minus. That is, the value must be one or the other, but not both or neither.

The CONJ feature indicates the surface form of the conjunction used in a construction. The ~ in the default rule (see FSD 2 above) represents negation: by default the feature is not applicable; in other words, no conjunction is taking place. This corresponds to the simple value none; see section . Note that this is distinct from the NIL value allowed in the value range. In their analysis, NIL means that conjunction is taking place but there is no explicit conjunction in the surface form of the sentence. The feature declaration for this feature would be encoded as follows:

Note that the vDefault is not strictly necessary in this case, since none is the value assumed in the absence of a default specification.
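The two unconditional defaults just described, INV and CONJ, can be sketched as range checks plus default filling. In this sketch a hypothetical FDecl record holds a closed range and an optional default, and Python None stands in for the TEI value none, so an absent CONJ becomes None while an explicit NIL stays a real value. The complete function and the BAR feature in the usage lines are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FDecl:
    """Minimal stand-in for an fDecl with a closed range and an unconditional default."""
    name: str
    allowed: frozenset             # the vRange, here a vAlt of symbols
    default: Optional[str] = None  # Python None stands in for TEI none


INV = FDecl("INV", frozenset({"+", "-"}), default="-")            # FSD 1: [-INV]
CONJ = FDecl("CONJ", frozenset({"and", "both", "but", "either",
                                "neither", "nor", "or", "NIL"}))  # FSD 2: ~[CONJ]


def complete(fs, fdecls):
    """Check each declared feature in fs against its range and fill in defaults."""
    result = dict(fs)
    for d in fdecls:
        if d.name not in result:
            result[d.name] = d.default  # may be None, i.e. not applicable
        elif result[d.name] not in d.allowed:
            raise ValueError(f"{d.name}={result[d.name]!r} is outside its range")
    return result


print(complete({"BAR": 2}, [INV, CONJ]))
# {'BAR': 2, 'INV': '-', 'CONJ': None}
print(complete({"BAR": 2, "CONJ": "NIL"}, [INV, CONJ]))
# {'BAR': 2, 'CONJ': 'NIL', 'INV': '-'}
```

Because CONJ declares no default, it falls back to none, which is exactly why the vDefault for CONJ is optional.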

The COMP feature indicates the surface form of the complementizer used in a construction. Its value range is analogous to that of CONJ. However, its default rule (see FSD 9 above) is conditional. It says that if the verb form is infinitival (the VFORM feature is not mentioned in the rule, since it is the only feature that can take INF as a value) and the construction has a subject, then the complementizer for must be used. For instance, to make John the subject of the infinitive in "It is necessary to go", the complementizer for must be used, as in "It is necessary for John to go". The feature declaration for this feature would be encoded as follows:
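As a rough Python illustration (not the TEI encoding itself), a conditional default is a list of (condition, value) pairs, where a condition is met when it subsumes the feature structure in the text. This sketch uses flat dicts and a simplified subsumption test, and it assumes that when several conditions match, the first one wins, a point the text does not settle. The names default_for and COMP_DEFAULTS are hypothetical.

```python
def subsumes(condition, fs):
    """Flat subsumption: every feature in condition appears in fs with the same value."""
    return all(name in fs and fs[name] == value for name, value in condition.items())


# Conditional vDefault for COMP, after FSD 9: [INF, +SUBJ] -> [COMP for].
# VFORM is spelled out here; the GPSG rule leaves it implicit.
COMP_DEFAULTS = [({"VFORM": "INF", "SUBJ": "+"}, "for")]


def default_for(fs, conditional_defaults):
    """Return the first default whose condition subsumes fs, else None (TEI none)."""
    for condition, value in conditional_defaults:
        if subsumes(condition, fs):
            return value
    return None


print(default_for({"VFORM": "INF", "SUBJ": "+", "BAR": 2}, COMP_DEFAULTS))  # for
print(default_for({"VFORM": "FIN", "SUBJ": "+"}, COMP_DEFAULTS))            # None
```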

The AGR feature stores the features relevant to subject-verb agreement. Gazdar et al. specify the range of this feature as CAT, meaning that the value is a category, which is their term for a feature structure. This is actually too weak a statement: not just any feature structure is allowable here; it must be a feature structure for agreement (which is defined in the complete example at the end of the chapter to contain the features of person and number). The following feature declaration encodes this constraint on the value range:

That is, the value must be a feature structure of type Agreement. The complete example at the end of this chapter includes the fsDecl type=Agreement, which includes fDecl name=PERS and fDecl name=NUM.
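The difference between a range of "any category" and a range of "an fs of type Agreement" can be seen in a small check. The sketch below models a typed feature structure with a hypothetical FS class; beyond the type test that the range itself imposes, it also confirms that the value uses only the features declared for Agreement (PERS and NUM). The feature values "3", "sg", and "nom" are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FS:
    """A feature structure with a type attribute, as in fs type=..."""
    type: str
    features: dict = field(default_factory=dict)


# Declared feature names per fs type, following fsDecl type=Agreement.
FS_DECLS = {"Agreement": {"PERS", "NUM"}}


def agr_value_ok(value):
    """True if value is an fs of type Agreement using only declared features."""
    return (
        isinstance(value, FS)
        and value.type == "Agreement"
        and set(value.features) <= FS_DECLS["Agreement"]
    )


print(agr_value_ok(FS("Agreement", {"PERS": "3", "NUM": "sg"})))  # True
print(agr_value_ok(FS("Case", {"CASE": "nom"})))                   # False
print(agr_value_ok("sg"))                                          # False
```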

The PFORM feature indicates the surface form of the preposition used in a construction. Since PFORM is specified above as an open set, str is used in the range specification below rather than sym.

This example makes use of a negation: an empty str with rel=ne subsumes any string that is not the empty string.
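The effect of using a negated empty str instead of a closed list of sym values can be shown in a few lines. This sketch models the range with a hypothetical Str class supporting only rel=eq and rel=ne; CLOSED_SYM_RANGE is an illustrative closed alternative, not part of the GPSG system, and "with" is just a sample preposition outside it.

```python
import operator
from dataclasses import dataclass

_RELS = {"eq": operator.eq, "ne": operator.ne}  # only the relations needed here


@dataclass(frozen=True)
class Str:
    """A str value in a vRange; rel=ne turns it into a negation."""
    value: str = ""
    rel: str = "eq"

    def subsumes(self, other):
        """True if other is a string standing in the given relation to value."""
        return isinstance(other, str) and _RELS[self.rel](other, self.value)


PFORM_RANGE = Str("", rel="ne")                    # empty str with rel=ne
CLOSED_SYM_RANGE = frozenset({"to", "by", "for"})  # a hypothetical closed range

print(PFORM_RANGE.subsumes("with"), "with" in CLOSED_SYM_RANGE)  # True False
print(PFORM_RANGE.subsumes(""))                                  # False
```

The open str range accepts any non-empty preposition, while a closed range would need to be extended for every new word form.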

The formal definition for feature declarations follows. Note that the class featureVal includes all possible single feature values, including a vAlt.

Feature Structure Constraints

Ensuring the validity of feature structures may require much more than simply specifying the range of allowed values for each feature. There may be constraints on the co-occurrence of one feature value with the value of another feature in the same feature structure or in an embedded feature structure.

Such constraints on valid feature structures are expressed as a series of conditional and biconditional tests in the fsConstraints part of an fsDecl. A particular feature structure is valid only if it meets all the constraints. The cond element encodes the conventional if-then conditional of Boolean logic, which succeeds when both the antecedent and the consequent are true, or whenever the antecedent is false. The bicond element encodes the biconditional (if and only if) operation of Boolean logic; it succeeds only when both antecedent and consequent are true, or both are false. In feature structure constraints the antecedent and consequent are expressed as feature structures; they are considered true if they subsume the target feature structure (see the discussion of subsumption under Feature Declarations above). The following elements make up the fsConstraints part of an FSD:

fsConstraints: specifies constraints on the content of well-formed feature structures.
cond: defines a conditional feature-structure constraint; the antecedent and the consequent are specified as feature structures or feature-structure groups; the constraint is satisfied if both the antecedent and the consequent subsume a given feature structure, or if the antecedent does not.
bicond: defines a biconditional feature-structure constraint; both antecedent and consequent are specified as feature structures or groups of feature structures; the constraint is satisfied if both subsume a given feature structure, or if both do not.
then: separates the condition from the default in an if, or the antecedent and the consequent in a cond element.
iff: separates the condition from the consequence in a bicond element.

For an example of feature structure constraints, consider the following feature co-occurrence restrictions extracted from the feature system for English proposed by Gazdar, Klein, Pullum, and Sag (1985:246-247):

FCR 1: [+INV] → [+AUX, FIN]
FCR 7: [BAR 0] ≡ [N] & [V] & [SUBCAT]
FCR 8: [BAR 1] → ~[SUBCAT]

The first constraint says that if a construction is inverted, it must also have an auxiliary and a finite verb form. That is:

The second constraint says that if a construction has a BAR value of zero (i.e., it is a lexical category), then it must have values for the features N, V, and SUBCAT. Conversely, because it is a biconditional, if it has values for N, V, and SUBCAT, it must have BAR 0. That is:

The final constraint says that if a construction has a BAR value of 1 (i.e., it is an intermediate phrasal category), then the SUBCAT feature is irrelevant (~). This is not a biconditional, since there are other cases in which the SUBCAT feature is irrelevant. That is:
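The three restrictions can be checked mechanically once cond and bicond are read as Boolean tests over subsumption. This sketch works on flat feature structures (dicts) and approximates [F] ("F has some value") and ~[F] ("F is not applicable") with two hypothetical markers, ANY and NONE, treating ~[F] simply as the absence of F; the FCRS table and the violations function are illustrative names, not TEI constructs.

```python
ANY = object()   # hypothetical marker: feature present with some value, as in [N]
NONE = object()  # hypothetical marker: feature not applicable, as in ~[SUBCAT]


def subsumes(general, fs):
    """Flat subsumption extended with the ANY and NONE markers."""
    for name, value in general.items():
        if value is NONE:
            if name in fs:          # ~[F]: F must not be specified
                return False
        elif value is ANY:
            if name not in fs:      # [F]: F must have some value
                return False
        elif name not in fs or fs[name] != value:
            return False
    return True


def cond(antecedent, consequent):
    """Satisfied unless the antecedent subsumes fs and the consequent does not."""
    return lambda fs: not subsumes(antecedent, fs) or subsumes(consequent, fs)


def bicond(antecedent, consequent):
    """Satisfied when both subsume fs or neither does."""
    return lambda fs: subsumes(antecedent, fs) == subsumes(consequent, fs)


FCRS = {
    "FCR 1": cond({"INV": "+"}, {"AUX": "+", "VFORM": "FIN"}),
    "FCR 7": bicond({"BAR": 0}, {"N": ANY, "V": ANY, "SUBCAT": ANY}),
    "FCR 8": cond({"BAR": 1}, {"SUBCAT": NONE}),
}


def violations(fs):
    """Names of the restrictions that fs fails."""
    return [name for name, check in FCRS.items() if not check(fs)]


print(violations({"BAR": 0, "N": "+", "V": "-", "SUBCAT": 1}))  # []
print(violations({"BAR": 1, "N": "+", "V": "-", "SUBCAT": 1}))  # ['FCR 7', 'FCR 8']
```

The second structure fails FCR 7 because it has values for N, V, and SUBCAT without BAR 0, and it fails FCR 8 because SUBCAT is specified at BAR 1.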

The DTD fragment for feature structure constraints is as follows. Note that cond and bicond use the empty tags then and iff, respectively, to separate the antecedent and consequent; these serve primarily to enhance human readability.

A Complete Example

To summarize this chapter, the complete FSD for the example that has run through the chapter is reproduced below:

Title: A sample FSD based on an extract from Gazdar et al.'s GPSG feature system for English
Encoded by: Gary F. Simons

Notes: This sample was first encoded by Gary F. Simons (Summer Institute of Linguistics, Dallas, TX) on January 28, 1991. Revised April 8, 1993 to match the specification of FSDs in version P2 of the TEI Guidelines.

This sample FSD does not describe a complete feature system. It is based on extracts from the feature system for English presented in the appendix (pages 245-247) of Generalized Phrase Structure Grammar, by Gazdar, Klein, Pullum, and Sag (Harvard University Press, 1985).

First feature structure declaration: "Encodes a feature structure for the GPSG analysis of English (after Gazdar, Klein, Pullum, and Sag)". Its feature declarations are described as: inverted sentence (INV); surface form of the conjunction (CONJ); surface form of the complementizer (COMP); agreement for person and number (AGR); word form of a preposition (PFORM).

Feature structure declaration of type Agreement: "This type of feature structure encodes the features for subject-verb agreement in English". Its feature declarations are described as: person (first, second, or third) (PERS); number (singular or plural) (NUM).