The Feature System Declaration (FSD) is an auxiliary file used in
conjunction with a TEI-conforming text that makes use of fs
(that is, feature structure) elements.
The FSD serves three purposes:
It provides a mechanism by which the encoder can list all of the
feature names and feature values and give a prose description of what
each represents.
It provides a mechanism by which the encoder can define
constraints on what it means to be a well-formed feature structure.
These constraints may involve constraints on the range of a feature
value, constraints on what features are valid within certain types of
feature structures, or constraints that prevent the co-occurrence of
certain feature-value pairs.
It provides a mechanism by which the encoder can define the
intended interpretation of underspecified feature structures. This
involves defining default values (whether literal or computed) for
missing features.
As a component of the interchange standard for encoded text, the FSD
serves an important function in documenting precisely what the encoder
intended by the system of feature structure markup used in the encoded
text. As application software is developed which makes use of encoded
texts, the FSD will also become an important resource that will allow
software to validate the feature structure markup in a text and to infer
the full interpretation of underspecified feature structures.
This chapter begins by describing how the encoded text uses header
information to make links to any associated FSDs. The second through
fourth sections describe the overall structure of an FSD and give
details of how to encode its parts. The final section offers a full
example.
For a fuller discussion of the reasoning behind FSDs
and for another complete example, see A rationale for the TEI
recommendations for feature-structure markup, by D. Terence
Langendoen and Gary F. Simons (to appear in the special TEI issue of
Computers and the Humanities).
Linking a TEI Text to Feature System Declarations
In order for application software to use feature system declarations
to aid in the automatic interpretation of encoded texts, or even for
human readers to find the appropriate declarations which document the
feature system used in markup, there must be a formal link from the
encoded texts to the declarations. As it turns out, the mechanism for
linking texts to FSDs parallels the mechanism for linking texts to
writing system declarations (WSDs).
The linkage is made in two places. First, in the DTD subset of the
document type declaration at the beginning of the text file, an external
entity is defined for each FSD that is associated with the encoded text.
That entity declaration associates an entity name with the name of a
file on the host system. It appends the SUBDOC keyword to tell the
processor that the named file is a self-contained SGML document. See
the example below for details of syntax.
The second place in which the linkage from text to FSDs is made is in
the TEI header, as mentioned in section . Within the
encodingDesc element, a special fsdDecl element may be
used for each distinct feature structure type, as follows:
fsdDecl: identifies the feature system declaration which contains
definitions for a particular type of feature structure.
Attributes include:
  type: identifies the type of feature structure documented in the
  FSD; this will be the value of the type
  attribute on at least one feature structure.
  fsd: specifies the external entity containing the feature system
  declaration; an entity declaration in the document's DTD
  subset must associate the entity name with a file on the
  system.
Note that one fsdDecl element must be specified for each
distinct type of feature structure used in the markup. The
fsd attribute supplies the name of the external entity
containing the actual declaration for that type of feature structure.
There may be multiple fsdDecl elements for a given FSD, one
for each type of feature structure it defines. For instance, in the
following example, the file lex.fsd contains an FSD that contains definitions
of feature structures for both lexical entries (fs
type=entry) and lexical subentries (fs type=subentry).
The following example shows the markup for linking a TEI document to
two WSDs and two FSDs. The linkage to both WSDs and FSDs is shown in
order to illustrate the parallel nature of the linking mechanisms for
both kinds of auxiliary files.
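[SGML example not preserved in this copy: a document type declaration
whose DTD subset declares the WSD and FSD entities, followed by a TEI
header containing the fsdDecl elements and the language declarations
for English and French.]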
The auxiliary tag set for feature system declarations is
contained in the file teifsd2.dtd, which has
the overall structure shown below:
%TEI.elementNames;
%TEI.keywords.ent;
%TEI.elementClasses;
%TEI.fs.dtd;
%TEI.header.dtd;
%TEI.core.dtd;
The Overall Structure of a Feature System Declaration
A feature system declaration is encoded as a document of type
teiFsd2. It has two parts: an obligatory header (which
provides bibliographic information for the file) and a set of feature
structure declarations (each of which defines one type of feature
structure). Each feature structure declaration in turn has three parts:
an optional description (which gives a prose comment on what that type
of feature structure encodes), an obligatory set of feature declarations
(which specify range constraints and default values for the features in
that type of structure), and optional feature structure constraints
(which specify co-occurrence restrictions on feature values). The
header is encoded as a teiHeader, just as for any TEI.2
document; see chapter . The other components listed
above are unique to feature system declarations. Thus, the following
new elements are involved:
teiFsd2: contains a feature system declaration.
fsDecl: declares one type of feature structure.
Attributes include:
  type: gives a name for the type of feature structure being
  declared.
  baseType: gives the name of the feature structure type from which
  this type inherits features and constraints; if this type
  declares a feature with the same name as a feature of the
  base type, the definition within this fsDecl
  overrides the inherited definition. The
  fsConstraints are inherited only if this
  fsDecl does not specify any; otherwise the
  constraints in this fsDecl override. When no
  baseType is specified, no features or
  constraints are inherited.
fsDescr: describes in prose what is represented by the type of
feature structure declared in the enclosing fsDecl.
fDecl: declares a single feature, specifying its name, organization,
range of allowed values, and optionally its default value.
fsConstraints: specifies constraints on the content of well-formed
feature structures.
Feature declarations and feature structure constraints are described
in the next two sections of this chapter. Note that the specification
of similar fsDecl elements can be simplified by devising an
inheritance hierarchy for the feature structure types. Each
fsDecl may name a baseType from which it inherits
feature declarations and constraints. For instance, suppose that
fsDecl type=Basic contains fDecl name=One and
fDecl name=Two, and that fsDecl type=Derived
baseType=Basic contains just fDecl name=Three. Then
any instance of fs type=Derived may include all three
features. This is because fsDecl type=Derived inherits the
two feature declarations from fsDecl type=Basic when it
specifies a baseType of Basic.
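The inheritance rule for feature declarations can be sketched in Python. This is only an illustration under stated assumptions: the FsDecl class and the effective_features function are hypothetical names, feature declarations are reduced to a name and a descriptive string, and the sketch assumes the baseType chain contains no cycles. It does not model the inheritance of fsConstraints.

```python
from typing import Dict, Optional


class FsDecl:
    """Hypothetical in-memory form of an fsDecl: a type name, an optional
    baseType, and the feature declarations made directly in this fsDecl."""

    def __init__(self, type_name: str, features: Dict[str, str],
                 base_type: Optional[str] = None) -> None:
        self.type_name = type_name
        self.features = features
        self.base_type = base_type


def effective_features(type_name: str, decls: Dict[str, FsDecl]) -> Dict[str, str]:
    """Collect the feature declarations that apply to a type.

    Declarations are inherited from the baseType chain; a feature declared
    locally overrides an inherited feature with the same name.
    """
    decl = decls[type_name]
    inherited = effective_features(decl.base_type, decls) if decl.base_type else {}
    return {**inherited, **decl.features}


decls = {
    "Basic": FsDecl("Basic", {"One": "declared in Basic", "Two": "declared in Basic"}),
    "Derived": FsDecl("Derived", {"Three": "declared in Derived"}, base_type="Basic"),
}
print(sorted(effective_features("Derived", decls)))  # ['One', 'Three', 'Two']
```

With these sample declarations, an fs of type Derived may carry all three features, while an fs of type Basic may carry only One and Two.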
The following sample shows the overall structure of a complete FSD.
Note that as a stand-alone document it begins with a DOCTYPE declaration
which identifies the associated DTD.
[SGML example not preserved in this copy; the one surviving fragment is
the description text "Describes what this type of fs represents".]
The formal definition of teiFsd2 and feature structure
declarations is as follows:
[DTD fragment not preserved in this copy.]
Feature Declarations
Each feature is declared in an fDecl element whose
name attribute identifies the feature being declared; this
matches the name attribute of the f elements it
declares. An fDecl also has an org attribute which
declares the organizing principle for the values of the f
elements it declares. That is, the value may be a unit (a single value), a set
(in which the order is not significant and there are no duplicates), a
bag (in which the order is not significant but
duplicates are allowed), or a list (in which the
order is significant). (See definition of org attribute of
f in section .) An fDecl has three
parts: an optional prose description (which should explain what the
feature and its values represent), an obligatory range specification
(which declares what values the feature is allowed to have), and an
optional default specification (which declares what default value should
be supplied when the named feature does not appear in an fs).
A single unconditional default value may be specified, or multiple
conditional values. If no default is specified, or if none of the
conditions is met, then the default value is none; in other
words, the feature is not applicable (see section for
a discussion of the none element).
The tags used in feature declarations are the following:
fDecl: declares a single feature, specifying its name, organization,
range of allowed values, and optionally its default value.
Attributes include:
  name: indicates the name of the feature being declared; matches
  the name attribute of f elements in the text.
  org: specifies the organizing discipline of the feature value.
  Legal values are:
    unit: unitary atomic value
    set: set value (unordered, no duplicates)
    bag: bag value (unordered, may have duplicates)
    list: list value (ordered, may have duplicates)
fDescr: describes in prose what is represented by the feature being
declared and its values.
vRange: defines the range of allowed values for a feature, in the
form of an fs, vAlt, or primitive value; for the value of an f to be
valid, it must be subsumed by the specified range; if the f contains
multiple values (as sanctioned by the org attribute), then each value
must be subsumed by the vRange.
vDefault: declares the default value to be supplied when a feature
structure does not contain an instance of f for this name; if
unconditional, it is specified as one (or, depending on the value of
the org attribute of the enclosing fDecl, more) fs elements or
primitive values; if conditional, it is specified as one or more if
elements; if no default is specified, or no condition matches, the
value none is assumed.
if: defines a conditional default value for a feature; the condition
is specified as a feature structure, and is met if it subsumes the
feature structure in the text for which a default value is sought.
then: separates the condition from the default in an if, or the
antecedent and the consequent in a cond element.
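The interplay of the org attribute and the value range can be sketched in Python. This is a minimal illustration, not part of the FSD tag set: the function names check_org, check_feature, and same_value are hypothetical, and the range is simplified to a flat collection of allowed symbols instead of a full subsumption test.

```python
from collections import Counter
from typing import Sequence


def check_org(org: str, values: Sequence[str]) -> bool:
    """Check that the values of one feature fit its declared organization."""
    if org == "unit":
        return len(values) == 1                  # exactly one value
    if org == "set":
        return len(set(values)) == len(values)   # duplicates are not allowed
    if org in ("bag", "list"):
        return True                              # duplicates are allowed
    raise ValueError(f"unknown org value: {org}")


def check_feature(org: str, allowed: Sequence[str], values: Sequence[str]) -> bool:
    """Each value must fall within the range, and the values must fit org."""
    return check_org(org, values) and all(value in allowed for value in values)


def same_value(org: str, left: Sequence[str], right: Sequence[str]) -> bool:
    """Compare two feature values; order matters only for org=list."""
    if org == "list":
        return list(left) == list(right)
    if org == "bag":
        return Counter(left) == Counter(right)
    return set(left) == set(right)


print(check_feature("unit", ["+", "-"], ["+"]))               # True
print(check_feature("unit", ["+", "-"], ["+", "-"]))          # False
print(same_value("bag", ["a", "b", "a"], ["a", "a", "b"]))    # True
print(same_value("list", ["a", "b"], ["b", "a"]))             # False
```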
The logic for validating feature values and for matching the
conditions for supplying default values is based on the operation of
subsumption. Subsumption is a standard operation in
feature-structure-based formalisms. Informally, a feature structure
fs subsumes all feature structures that are at least as
informative as itself; that is, all feature structures that specify at
least as many features as fs with values at least as
informative as those given in fs (Pereira 1987:6; see also
Shieber 1986:14-16).
Fernando C. N. Pereira, Grammars and logics of partial
information, SRI International Technical Note 420 (Menlo
Park, CA: SRI International, 1987), and
Stuart Shieber, An Introduction to Unification-based
Approaches to Grammar, CSLI Lecture Notes 4 (Palo Alto,
CA: Center for the Study of Language and Information,
1986).
A more formal definition requires that we first define the notion of
domain of a feature structure. A feature structure can be viewed
as a partial function that maps features onto values; when viewed in
this way, the domain of a feature structure is the set of top-level
features it contains (that is, excluding features in embedded feature
structures). We can now offer a more precise definition:
fs subsumes fs′ if both are
identical primitive values, or if the domain of fs
is a subset of the domain of fs′, and for every
feature f in the domain of fs, the
value of f in fs subsumes the value
of f in fs′.
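The definition just given translates almost directly into a recursive function. The following Python sketch assumes a hypothetical representation in which a feature structure is a dictionary from feature names to values and a primitive value is a string or an integer; the function name subsumes is likewise chosen only for illustration.

```python
from typing import Dict, Union

# A value is a primitive (str or int) or a nested feature structure (dict).
Value = Union[str, int, Dict[str, "Value"]]


def subsumes(general: Value, specific: Value) -> bool:
    """Return True if `general` subsumes `specific`."""
    general_is_fs = isinstance(general, dict)
    specific_is_fs = isinstance(specific, dict)
    if not general_is_fs and not specific_is_fs:
        # Two primitive values: they must be identical.
        return general == specific
    if general_is_fs != specific_is_fs:
        # In this sketch a primitive and a feature structure never match.
        return False
    # The domain of `general` must be a subset of the domain of `specific`,
    # and each value in `general` must subsume the corresponding value.
    return all(
        name in specific and subsumes(value, specific[name])
        for name, value in general.items()
    )


agreement = {"PERS": "3", "NUM": "sg"}
print(subsumes({"NUM": "sg"}, agreement))    # True
print(subsumes(agreement, {"NUM": "sg"}))    # False
print(subsumes({"AGR": {"NUM": "sg"}}, {"INV": "-", "AGR": agreement}))  # True
```

Note that the empty feature structure subsumes every feature structure under this definition, since its domain is a subset of any other domain.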
Following the spirit of the informal definition above, we can extend
subsumption in a straightforward way to cover alternation, negation,
special primitive values, and the use of attributes in the SGML markup.
For instance, a vAlt containing the value v subsumes v. The negation
(rel=ne) of a value v
subsumes any value that is not v. The value
unknown subsumes any value. The value any subsumes
any value that is in the range of a feature. An empty fs element with
type=X subsumes any feature structure with type=X.
nbr rel=ge value=0 subsumes any nbr with a value
greater than or equal to zero.
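Three of these extensions (alternation, negation, and a numeric lower bound) can be sketched in Python for primitive values. The classes Alt, Not, and AtLeast are hypothetical stand-ins for a vAlt, a value marked rel=ne, and nbr rel=ge; they are not part of any TEI software, and the sketch deliberately leaves out nested feature structures and the special values unknown and any.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Alt:
    """Hypothetical stand-in for a vAlt: a group of alternative values."""
    options: Tuple[Any, ...]


@dataclass(frozen=True)
class Not:
    """Hypothetical stand-in for a value marked rel=ne."""
    value: Any


@dataclass(frozen=True)
class AtLeast:
    """Hypothetical stand-in for nbr rel=ge value=N."""
    bound: int


def subsumes_value(general: Any, specific: Any) -> bool:
    """Extended subsumption over primitive values only."""
    if isinstance(general, Alt):
        # An alternation subsumes whatever any one of its members subsumes.
        return any(subsumes_value(option, specific) for option in general.options)
    if isinstance(general, Not):
        # A negated value subsumes everything the plain value does not.
        return not subsumes_value(general.value, specific)
    if isinstance(general, AtLeast):
        return isinstance(specific, int) and specific >= general.bound
    return general == specific


print(subsumes_value(Alt(("+", "-")), "+"))   # True
print(subsumes_value(Not(""), "to"))          # True
print(subsumes_value(Not(""), ""))            # False
print(subsumes_value(AtLeast(0), 3))          # True
```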
As an example of feature declarations, consider the following extract
from
Generalized Phrase Structure Grammar by Gerald
Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan Sag (Harvard University
Press, 1985). In the appendix to their book (pages 245-247), they
propose a feature system for English of which this is just a sampling:
feature   value range
INV       {+, -}
CONJ      {and, both, but, either, neither, nor, or, NIL}
COMP      {for, that, whether, if, NIL}
AGR       CAT
PFORM     {to, by, for, ...}

Feature specification defaults
FSD 1:    [-INV]
FSD 2:    ~[CONJ]
FSD 9:    [INF, +SUBJ] --> [COMP for]
The INV feature, which encodes whether or not a sentence is inverted,
allows only the values plus (+) and minus (-). If the feature is not
specified, then the default rule (FSD 1 above) says that a value of
minus is always assumed. The feature declaration for this feature would
be encoded as follows:
[SGML example not preserved in this copy; surviving description text:
"inverted sentence".]
The value range is specified as an alternation (more precisely, an
exclusive disjunction) of plus and minus. That is,
the value must be one or the other, but not both or neither.
The CONJ feature indicates the surface form of the conjunction used
in a construction. The ~ in the default rule (see FSD 2 above)
represents negation. This means that by default the feature is not
applicable, in other words, no conjunction is taking place. This
corresponds to the simple value none; see section . Note that this is distinct from the NIL value allowed in
the value range. In their analysis, NIL means that the phenomenon of
conjunction is taking place but there is no explicit conjunction in the
surface form of the sentence. The feature declaration for this feature
would be encoded as follows:
[SGML example not preserved in this copy; surviving description text:
"surface form of the conjunction".]
Note that the vDefault is not strictly necessary in this case,
since none is the value assumed in the absence of a default
specification.
The COMP feature indicates the surface form of the complementizer
used in a construction. In value range, it is analogous to CONJ.
However, its default rule (see FSD 9 above) is conditional. It says
that if the verb form is infinitival (the VFORM feature is not mentioned
in the rule since it is the only feature that can take INF as a value),
and the construction has a subject, then the complementizer
for must be used. For instance, to make John the subject of the
infinitive in It is necessary to go, the
complementizer for must be used; that is,
It is necessary for John to go. The feature
declaration for this feature would be encoded as follows:
[SGML example not preserved in this copy; surviving description text:
"surface form of the complementizer".]
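The way a conditional default such as FSD 9 is resolved can be sketched in Python. The sketch rests on several assumptions that are not taken from the text: feature structures are flat dictionaries, the names resolve_default and NONE are hypothetical, the first matching condition wins when several match, and an unconditional default is modeled as a condition with no features (which subsumes every feature structure).

```python
from typing import Dict, List, Tuple

FeatureStructure = Dict[str, str]

NONE = "none"  # stand-in for the value assumed when no default applies


def subsumes(general: FeatureStructure, specific: FeatureStructure) -> bool:
    """Flat subsumption: every feature of `general` occurs in `specific`
    with an identical value."""
    return all(
        name in specific and specific[name] == value
        for name, value in general.items()
    )


def resolve_default(target: FeatureStructure,
                    conditions: List[Tuple[FeatureStructure, str]]) -> str:
    """Return the default attached to the first condition that subsumes
    `target`, or NONE if no condition is met."""
    for condition, value in conditions:
        if subsumes(condition, target):
            return value
    return NONE


# FSD 9: [INF, +SUBJ] --> [COMP for]
comp_defaults = [({"VFORM": "INF", "SUBJ": "+"}, "for")]
# FSD 1: [-INV], an unconditional default
inv_defaults = [({}, "-")]

print(resolve_default({"VFORM": "INF", "SUBJ": "+"}, comp_defaults))  # for
print(resolve_default({"VFORM": "FIN", "SUBJ": "+"}, comp_defaults))  # none
print(resolve_default({"VFORM": "FIN"}, inv_defaults))                # -
```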
The AGR feature stores the features relevant to subject-verb
agreement. Gazdar et al. specify the range of this feature as CAT.
This means that the value is a category, which
is their term for a feature structure. This is actually too weak a
statement. Not just any feature structure is allowable here; it must be
a feature structure for agreement (which is defined in the complete
example at the end of the chapter to contain the features of person and
number). The following feature declaration encodes this constraint on
the value range:
[SGML example not preserved in this copy; surviving description text:
"agreement for person and number".]
That is, the value must be a feature structure of type Agreement. The complete example at the end of this
chapter includes the fsDecl type=Agreement which includes
fDecl name=PERS and fDecl name=NUM.
The PFORM feature indicates the surface form of the preposition used
in a construction. Since PFORM is specified above as an open set,
str is used in the range specification below rather than
sym.
[SGML example not preserved in this copy; surviving description text:
"word form of a preposition".]
This example makes use of a negation. An empty str element with
rel=ne subsumes any string that is not the empty
string.
The formal definition for feature declarations follows. Note that
the class featureVal includes all possible
single feature values, including a vAlt.
[DTD fragment not preserved in this copy.]
Feature Structure Constraints
Ensuring the validity of feature structures may require much more
than simply specifying the range of allowed values for each feature.
There may be constraints on the co-occurrence of one feature value with
the value of another feature in the same feature structure or in an
embedded feature structure.
Such constraints on valid feature structures are expressed as a
series of conditional and biconditional tests in the
fsConstraints part of an fsDecl. A particular feature
structure is valid only if it meets all the constraints. The
cond element encodes the conventional if-then conditional of
boolean logic which succeeds when both the antecedent and consequent are
true, or whenever the antecedent is false. The bicond element
encodes the biconditional (if and only if) operation of boolean logic.
It succeeds only when both antecedent and consequent are true, or both
are false. In feature structure constraints the antecedent and
consequent are expressed as feature structures; they are considered true
if they subsume
(see section ) the target feature structure. The
following elements make up the fsConstraints part of an FSD:
fsConstraints: specifies constraints on the content of well-formed
feature structures.
cond: defines a conditional feature-structure constraint; the
consequent and the antecedent are specified as feature structures or
feature-structure groups; the constraint is satisfied if both the
antecedent and the consequent subsume a given feature structure, or
if the antecedent does not.
bicond: defines a biconditional feature-structure constraint; both
consequent and antecedent are specified as feature structures or
groups of feature structures; the constraint is satisfied if both
subsume a given feature structure, or if both do not.
then: separates the condition from the default in an if, or the
antecedent and the consequent in a cond element.
iff: separates the condition from the consequence in a bicond
element.
For an example of feature structure constraints, consider the
following feature co-occurrence restrictions
extracted from the feature system for English proposed by Gazdar, Klein,
Pullum, and Sag (1985:246-247):
FCR 1: [+INV] → [+AUX, FIN]
FCR 7: [BAR 0] ≡ [N] & [V] & [SUBCAT]
FCR 8: [BAR 1] → ~[SUBCAT]
The first constraint says that if a construction is inverted, it must
also have an auxiliary and a finite verb form. That is,
[SGML example not preserved in this copy.]
The second constraint says that if a construction has a BAR value of
zero (i.e., it is a word), then it must have a value for the
features N, V, and SUBCAT. By the same token, because it is a
biconditional, if it has values for N, V, and SUBCAT, it must have
BAR=0. That is,
[SGML example not preserved in this copy.]
The final constraint says that if a construction has a BAR value of 1
(i.e., it is a phrase), then the SUBCAT feature is irrelevant (~).
This is not biconditional, since there are other instances under which
the SUBCAT feature is irrelevant. That is,
[SGML example not preserved in this copy.]
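The truth conditions of cond and bicond, applied to the three restrictions above, can be sketched in Python. Everything here beyond the logic of the constraints is an assumption made for illustration: feature structures are flat dictionaries, the markers ANY and NONE are hypothetical (ANY requires the feature to be present with some value, NONE requires it to be absent), and FIN is taken to be a value of VFORM.

```python
from typing import Dict

ANY = object()    # hypothetical marker: the feature must have some value
NONE = object()   # hypothetical marker: the feature must be absent


def subsumes(pattern: Dict[str, object], target: Dict[str, str]) -> bool:
    """Flat subsumption extended with the ANY and NONE markers."""
    for name, wanted in pattern.items():
        if wanted is NONE:
            if name in target:
                return False
        elif name not in target:
            return False
        elif wanted is not ANY and target[name] != wanted:
            return False
    return True


def cond(antecedent, consequent, target) -> bool:
    """If-then: fails only when the antecedent holds and the consequent does not."""
    return (not subsumes(antecedent, target)) or subsumes(consequent, target)


def bicond(antecedent, consequent, target) -> bool:
    """If-and-only-if: both sides hold, or neither does."""
    return subsumes(antecedent, target) == subsumes(consequent, target)


FCR1 = ({"INV": "+"}, {"AUX": "+", "VFORM": "FIN"})          # conditional
FCR7 = ({"BAR": "0"}, {"N": ANY, "V": ANY, "SUBCAT": ANY})   # biconditional
FCR8 = ({"BAR": "1"}, {"SUBCAT": NONE})                      # conditional


def is_valid(target: Dict[str, str]) -> bool:
    """A feature structure is valid only if it meets all the constraints."""
    return cond(*FCR1, target) and bicond(*FCR7, target) and cond(*FCR8, target)


print(is_valid({"BAR": "0", "N": "-", "V": "+", "SUBCAT": "2"}))  # True
print(is_valid({"BAR": "1", "SUBCAT": "2"}))                      # False
print(is_valid({"INV": "+", "AUX": "-", "VFORM": "FIN"}))         # False
```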
The DTD fragment for feature structure constraints is as follows.
Note that cond and bicond use the empty tags
then and iff, respectively, to separate the antecedent
and consequent. These are primarily for the sake of enhancing human
readability.
[DTD fragment not preserved in this copy.]
A Complete Example
To summarize this chapter, the complete FSD for the example that has
run through the chapter is reproduced below:
[The SGML markup of the complete FSD is not preserved in this copy;
only its prose content survives, as follows.]
A sample FSD based on an extract from Gazdar
et al.'s GPSG feature system for English
encoded by Gary F. Simons
This sample was first encoded by Gary F. Simons (Summer
Institute of Linguistics, Dallas, TX) on January 28, 1991.
Revised April 8, 1993 to match the specification of FSDs
in version P2 of the TEI Guidelines.
This sample FSD does not describe a complete feature
system. It is based on extracts from the feature system
for English presented in the appendix (pages 245-247) of
Generalized Phrase Structure Grammar, by Gazdar, Klein,
Pullum, and Sag (Harvard University Press, 1985).
Encodes a feature structure for the GPSG analysis
of English (after Gazdar, Klein, Pullum, and Sag)
  INV: inverted sentence
  CONJ: surface form of the conjunction
  COMP: surface form of the complementizer
  AGR: agreement for person and number
  PFORM: word form of a preposition
This type of feature structure encodes the features
for subject-verb agreement in English
  PERS: person (first, second, or third)
  NUM: number (singular or plural)