========================================================================= Date: Wed, 8 Sep 1993 07:42:51 CDT Reply-To: "C. M. Sperberg-McQueen" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "C. M. Sperberg-McQueen" Subject: Re: Query on registering new character entities for ISO 8879 In-Reply-To: Message of Mon, 9 Aug 1993 12:23:18 CDT from On Mon, 9 Aug 1993 12:23:18 CDT James Do said: >This is a query in regards to registering character entities in the >Latin-based Vietnamese writing system, for use with SGML and TEI. > >Recently, the document TCVN 5712:1993 (VSCII) was approved ... >Please let me know the naming conventions and procedure for registering >these additional character entities for use with ISO 8879 (SGML). >Thanks in advance for your help. The methods for creating formal declarations of character sets and entity sets for use with TEI documents are described in chapter WD of document TEI P2, on the TEI Writing System Declaration. This chapter has, unfortunately, not yet been published, but should be out within the next few weeks. The TEI places no restrictions on the naming conventions for entities beyond those placed by SGML itself; in general, it is probably good practice to use only the letters A-Z and a-z, the ten Arabic digits, hyphen, and full stop (dot) in names. The reference concrete syntax also restricts name length to eight characters; the TEI does not make such a restriction. If you do create a writing system declaration based on TCVN 5712, you would do us a great favor by sending it to the TEI (use my address); we will make it available on our file server, along with the writing system declarations defined by the TEI itself. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago ========================================================================= Date: Tue, 14 Sep 1993 18:57:05 CDT Reply-To: "C. M. 
Sperberg-McQueen" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "C. M. Sperberg-McQueen" Subject: new fascicle: TD (tag set documentation) * * * * * * * * * * * * * * * * * * * TEI P2 * * New fascicle now available * * Chapter TD * * Auxiliary Tag Set for * * Tag Set Documentation * * * * * * * * * * * * * * * * * * * A new chapter of TEI P2, the second draft of the TEI Guidelines for Electronic Text Encoding and Interchange, is now available for public comment. As readers of this list will recall, TEI P2 is being distributed for comment as a series of fascicles or part-issues, each containing a complete chapter of P2, as and when the texts were available. (File TEI ED J8, "Obtaining the Second Version of the TEI Guidelines," has the details, if you have forgotten). Chapter 29 (known internally as 'TD'), defines an auxiliary tag set for the documentation of extensions to the TEI encoding scheme. (Actually, it can be used for the documentation of any SGML -- or even non-SGML -- encoding scheme, and the tag set documented here is the same as that in which the reference material to TEI P2 itself is prepared. (Aside, that is, from some recent revisions, which affect the tag set defined here but not -- yet -- that used in the publication of TEI P2. That is, we are practicing what we preach.) Three major documentation units are defined by the chapter, for documenting SGML tags, SGML entities, and 'element classes' of the sort used by the TEI (for fuller discussion, see chapter ST, Structure of the Guidelines). A related chapter (XT, Sample Tag Set Documentation) is being revised and should be published in a few days. Of course, examples of this tag set can be abundantly found in the 'REF' files of published chapters, but they don't have any commentary. Together with this chapter, two DTD files are being released: teitsd2.dtd defines the auxiliary tag set itself, and teitdgis.dtd defines the names of elements in the tag set. 
We append the usual information on how to retrieve this chapter, for the convenience of subscribers. -C. M. Sperberg-McQueen Lou Burnard 14 September 1993 ----- Texts of P2 are being made available in a number of different electronic formats. These include plain screen-readable text (filetype DOC), LaTeX (filetype TEX), PostScript (filetype PS) and of course SGML (filetypes P2X and REF). The file P2TDDRIV P2X is a driver for the P2X and REF files. In addition, together with this chapter two DTD files have been released. To get electronic copies of this fascicle from the TEI-L fileserver, all you need do is send an ordinary email note to the address LISTSERV@UICVM (or listserv@uicvm.uic.edu) containing whichever of the following lines describes the files you want: GET P2TD DOC GET P2TD PS GET P2TDDRIV P2X GET P2TD P2X GET P2TD REF GET TEITSD2 DTD GET TEITDGIS DTD The DOC and PS files include the complete fascicles; the P2X file contains only this chapter. The documents you request will be returned to you automatically as e-mail messages. Beware! some of the files are quite large, and so may be delayed. You will also receive an automatic notification that the file is on its way to you. (If you receive something illegible in a 'Listserv packed format', please contact one of the editors directly to see about getting you the file in a more useful form.) The same files are available via anonymous FTP from the SGML Project at the University of Exeter. To access these files, your computer system must be on the InterNet. If it is, you should be able to give the command ftp sgml1.ex.ac.uk [ or FTP 144.173.6.61] When you are connected to the Exeter SGMLbox, type the following commands (or at least, the ones which apply to the files you want): cd tei/p2/drafts get p2td.doc get p2td.ps get p2tddriv.p2x get p2td.p2x get p2td.ref cd ../dtds get teitsd2.dtd get teitdgis.dtd NB: file names *must* be given in lower case letters. 
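
The two retrieval routes above differ only in spelling: the LISTSERV takes "GET name type" lines, while the Exeter FTP server takes "get name.type" in lower case. As a minimal sketch (the function names are hypothetical and nothing here contacts either server), the command lines could be generated from one list of files:

```python
def listserv_lines(files):
    """Build LISTSERV request lines from (filename, filetype) pairs."""
    return [f"GET {name.upper()} {ftype.upper()}" for name, ftype in files]


def ftp_lines(directory, files):
    """Build Exeter FTP commands; names are forced to lower case,
    since the announcement says lower case is required there."""
    commands = [f"cd {directory}"]
    commands += [f"get {name.lower()}.{ftype.lower()}" for name, ftype in files]
    return commands


if __name__ == "__main__":
    drafts = [("P2TD", "DOC"), ("P2TD", "PS"), ("P2TDDRIV", "P2X")]
    print("\n".join(listserv_lines(drafts)))
    print("\n".join(ftp_lines("tei/p2/drafts", drafts)))
```

The output is meant to be pasted into a mail note or an FTP session by hand; the directory name is the one given in the announcement.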
The files may also be obtained from the Markup-L Listserv fileserver in Germany, and from Professor Syun Tutiya in Japan. For more details on these and other sources of TEI information, please order copies of files EDJ8 MEMO (describes how to retrieve electronic copies of TEI P2 and the various formats in which they are available) EDJ9 MEMO (describes how to request paper copies of TEI P2, for those without electronic mail access) (on the Exeter file server, get file tei/intro/edj8.doc) ========================================================================= Date: Thu, 16 Sep 1993 18:35:15 CDT Reply-To: Sean MARRETT Sender: "TEI-L: Text Encoding Initiative public discussion list" From: Sean MARRETT Subject: Practical products for transcript analysis In an effort to help a research project along, I am trying to determine the most effective way for a research project to proceed. Briefly, the project involves the analysis of classroom transcriptions with the goal of characterising different teachers according to experience. I am aware of such programs as TACT and IT, but have also spent some time perusing the TEI archives. What is not clear to me is what TEI-based products/software/tools are *available* for (a) Marking up text (transcriptions) using TEI (and presumably the DTD for transcriptions). (b) Analysis tools for exploring and quantifying patterns within the TEI annotated text. Is it premature to base a research project on TEI tools ?. I am interested in software for (in order of priority) PC's, MAC's and Unix workstations. Any advice would be greatly appreciated. Simplicity is important, since the researchers involved are relatively unsophisticated (at least as far as their access/knowledge of computer-based tools is concerned). Thanks. I would be happy to summarize to the list, if requested. Sean -- -------------------------------------------------------- Sean Marrett email: sean@pet.mni.mcgill.ca wb213, PET Unit, Montreal Neurological Institute. 
3801 University St., Montreal, Quebec. H3A 2B4 tel:(514)-398-1537,1996 Fax: 8948 ========================================================================= Date: Thu, 16 Sep 1993 18:36:26 CDT Reply-To: "C. M. Sperberg-McQueen" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "C. M. Sperberg-McQueen" Subject: another chapter! SH, independent header * * * * * * * * * * * * * * * * * * TEI P2 * * new fascicle now available * * Chapter SH * * Auxiliary tag set for * * Independent Headers * * * * * * * * * * * * * * * * * * A new chapter of TEI P2 (chapter SH, Independent Header) is now available for public comment. As readers of this list will recall, TEI P2 is the second draft of the TEI Guidelines for Electronic Text Encoding and Interchange, and is being distributed for comment chapter by chapter, as and when the chapters are ready for comment. (File TEI ED J8, "Obtaining the Second Version of the TEI Guidelines," has the details, if you have forgotten). Chapter 26 (known internally as 'SH'), defines an 'auxiliary' tag set for the interchange of bibliographic and other information relating to electronic texts. The tag set, unsurprisingly, bears a strong resemblance to that defined for the TEI header (in so far as anything can be said to 'bear a resemblance' to itself): in effect, chapter SH describes the interchange of TEI headers as independent documents (hence its title). Because data archives have for years collected such information about social-science data sets, and libraries have for years exchanged bibliographic data in electronic form, there is a reasonably well understood need for the ability to handle this sort of information for electronic texts, without, however, any serious candidates for a mechanism for exchanging the information. This chapter provides the necessary mechanisms. 
Since the requirements of information interchange for bibliographic control and other archival activities are fairly high, chapter SH sets a higher standard for the creation of TEI headers than is set in general. (That is, if you are an individual creating a TEI document for your own use, you may find it convenient to use the loosest form of the TEI header, and omit most of the elements which are not actually required. If, however, you are running a data archive and wish to create TEI headers for your collection, you should use the tighter forms of the header, and you should provide fuller information.) The chapter contains an explicit list of recommended and required elements, with examples, and also discusses briefly the issues raised when the TEI header is used as the principal source of information for library cataloguing. Together with this chapter, the following DTD files are being released: teishd2.dtd, which provides element and attribute list declarations for all elements in the tag set, and teishgis.dtd, which defines the parameter entities used for the generic identifiers in the tag set. We append the usual information on how to retrieve this chapter, for the convenience of subscribers. -C. M. Sperberg-McQueen Lou Burnard 16 September 1993 Viva Hidalgo! ----- Texts of P2 are being made available in a number of different electronic formats. These include plain screen-readable text (filetype DOC), LaTeX (filetype TEX), PostScript (filetype PS) and of course SGML (filetypes P2X and REF). In addition, many chapters define specific DTD files which are released together with the chapter. 
All files can be retrieved from any of several servers, as described below: 1 Getting files from the server in Chicago To get electronic copies of this fascicle from the TEI-L fileserver at the University of Illinois at Chicago, all you need to do is send an ordinary email note to the address LISTSERV@UICVM (or, from the Internet: listserv@uicvm.uic.edu) containing whichever of the following lines describe the file(s) you want: get p2sh doc get p2sh ps get p2sh p2x get p2sh ref get p2shdriv p2x get teishd2 dtd get teishgis dtd The DOC and PS files include the complete fascicles; the P2X file contains only this chapter. The file P2SHDRIV P2X is a 'driver file' which embeds the chapter file P2SH P2X and the accompanying reference material, P2SH REF. Further details on the DTD used may be had by contacting the editors. The documents you request will be returned to you automatically as e-mail messages. Beware! some of the files are quite large, and so may be delayed. You will also receive an automatic notification that the file is on its way to you. (If you receive something illegible in a 'Listserv packed format', please contact one of the editors directly to see about getting you the file in a more useful form.) 2 Getting files from the server in England The same files are available via anonymous FTP from the SGML Project at the University of Exeter. To access these files, your computer system must be on the InterNet. 
If it is, you should be able to give the command

    FTP sgml1.ex.ac.uk [ or FTP 144.173.6.61]

When you are connected to the Exeter SGMLbox, type the following commands (or, whichever actually describe the files you want):

    cd tei/p2/drafts
    get p2sh.p2x
    get p2sh.doc
    get p2sh.ps
    get p2sh.ref
    cd ../dtds
    get teishd2.dtd
    get teishgis.dtd

(note that the filename *must* be given in all lower-case letters)

3 Getting the files from other servers

The files may also be obtained from the Markup-L Listserv fileserver in Germany, and from Professor Syun Tutiya in Japan. For more details on these and other sources of TEI information, please order copies of files

    EDJ8 MEMO (describes how to retrieve electronic copies of TEI P2 and the various formats in which they are available)
    EDJ9 MEMO (describes how to request paper copies of TEI P2, for those without electronic mail access)

(on the Exeter file server, get file tei/intro/edj8.doc)
=========================================================================
Date: Fri, 17 Sep 1993 22:21:37 CDT
Reply-To: Elaine M Brennan
Sender: "TEI-L: Text Encoding Initiative public discussion list"
From: Elaine M Brennan
Subject: Second Call for Papers: ALLCACH '94

Association for Literary and Linguistic Computing
Association for Computers and the Humanities

"CONSENSUS EX MACHINA"

Joint International Conference ALLC-ACH94
April 19-23, 1994
Paris

Second notice

The ALLC-ACH conferences are the major forum for literary, linguistic and humanities computing. A particular focus of the conference "Consensus ex Machina" will be the methodological impact of computer science and mathematics on the humanities. Resorting to computer science and to mathematics is now often the most dramatic attempt to impart more objectivity (and consequently more consensus) to the humanities. What obstacles does such an undertaking meet? What successes can it claim? What failures must it admit to?
Is there a way forward which will increase our knowledge and understanding of the humanities?

LOCATION

The conference will be held at La Sorbonne which stems from a college founded in 1253 by Robert de Sorbon and presently hosts the Universities of Paris IV (Arts and Humanities) as well as the famous /Ecole des Chartes (History). Accommodation for participants will be available in the lively Latin Quarter through the conference travel agency. The Latin Quarter and la Sorbonne can be very easily reached from Paris airports and stations thanks to the metro and the RER (regional express network).

PROGRAMME

The Paris conference will be held in April 1994. Its programme will be as follows:

    Tuesday 19th morning: welcome
    Tuesday 19th afternoon: opening and sessions
    Wednesday 20th: sessions
    Thursday 21st morning: sessions
    Thursday 21st afternoon: excursion (Versailles)
    Friday 22nd morning and afternoon: sessions
    Friday 22nd evening: banquet
    Saturday 23rd morning: sessions

TOPICS

The Association for Literary and Linguistic Computing and the Association for Computers and the Humanities invite submissions on computer-aided topics in literature, linguistics and the language-oriented aspects of the humanities disciplines such as history, archaeology and music: statistical methods for text analysis, text encoding, text corpora, computational lexicography, machine translation, etc.

LANGUAGES

The official languages of the conference will be English and French. However, papers can also be presented in another EEC language provided that they bear on the corresponding linguistic or literary themes. The coding scheme used in this announcement for French words is: /e = e + acute accent, /E = E + acute accent, \e = e + grave accent and \a = a + grave accent.

REQUIREMENTS

Proposals should describe substantial and original work.
Those that concentrate on the development of new computing methodologies should make clear how the methodologies are applied to research and/or teaching in the humanities and should include some critical assessment of the application of those methodologies in the humanities. Those that concentrate on a particular application in the humanities (e.g., a study of the style of an author) should cite traditional as well as computer-based approaches to the problem and should include some critical assessment of the computing methodologies used. All proposals should include conclusions and references to important sources. ABSTRACT LENGTH Abstracts of 1500 words should be submitted for presentations of 25 minutes. Abstracts of 2500 words should be submitted for lectures of 45 minutes (state of the art themes only). FORMAT FOR SUBMISSIONS Electronic submissions are strongly encouraged. Please pay particular attention to the format given below. Submissions which do not conform to this format will be returned to the authors for reformatting, or may not be considered if they arrive very close to the deadline. All submissions should begin with the following information: Title: title of paper Author(s): names of author(s) Affiliation: of author(s) Contact address: full postal address E-mail: electronic mail address of main author (for contact), followed by other authors (if any) Fax number: of main author Phone number: of main author ELECTRONIC SUBMISSIONS These should be plain ASCII text files, not files formatted by a word processor, and should not contain tab character or soft hyphens. Paragraphs should be separated by blank lines. Headings and subheadings should be on separate lines and be numbered. Notes, if needed at all, should take the form of endnotes rather than footnotes. References, up to six, should be given at the end. 
Choose a simple markup scheme for accents and other characters that cannot be transmitted by electronic mail, and include an explanation of the markup scheme after the title information. Electronic submissions should be sent to: ALLCACH@BLIULG11 with the subject line "Submission for ALLC-ACH94."

PAPER SUBMISSIONS

Submissions should be typed or printed on one side of the paper only, with ample margins. Six copies should be sent to the ALLC-ACH94 Programme Chair: Christian Delcourt, BELTEXT-Li\ege, Universit/e de Li\ege, place Cockerill, 3, B-4000 Li\ege, Belgium.

DEADLINES:

    October 15th, 1993 (proposals of papers)
    December 15th, 1993 (notification of acceptance)
    February 15th, 1994 (advance registration)

PUBLICATION OF PAPERS

A selection of papers presented at the conference will be published in the series "Research in Humanities Computing" edited by Susan Hockey and Nancy Ide and published by Oxford University Press. Another selection will be published as a special issue of T.A. Information.

PROGRAM COMMITTEE

Proposals will be evaluated by a panel of reviewers who will make recommendations to the Program Committee comprised of:

    Christian Delcourt, Chair   Universit/e de Li\ege (ALLC)
    Elaine Brennan              Brown University (ACH)
    Gordon Dixon                Manchester Metropolitan University (ALLC)
    Paul A. Fortier             University of Manitoba (ACH)
    Joel D. Goldfield           Plymouth State College (ACH)
    Susan Hockey                Rutgers and Princeton Universities (ALLC)
    Antonio Zampolli            Universit\a degli Studi di Pisa (ALLC)
    Michael Neuman              Georgetown University (ACH)
    Andr/e Salem, Local Organizer   /Ecole normale sup/erieure de Saint-Cloud (ALLC)

INQUIRIES

Please address your inquiries to the ALLC-ACH94 Local Organizers: Andr/e Salem and Maurice Tournier, CNRS-INaLF, Lexicom/etrie et textes politiques, /Ecole Normale Sup/erieure, avenue de la Grille d'Honneur, F-92211 Saint-Cloud, France.
Phone: 00+33+1+47.71.91.11
Fax: 00+33+1+46.02.39.11
=========================================================================
Date: Fri, 17 Sep 1993 22:22:52 CDT
Reply-To: Peter Flynn
Sender: "TEI-L: Text Encoding Initiative public discussion list"
From: Peter Flynn
Subject: Re: Practical products for transcript analysis

> TEI-based products/software/tools are *available* for
>
> (a) Marking up text (transcriptions) using TEI (and presumably the
> DTD for transcriptions).

None. TEI proposals are in SGML, so any SGML-compliant tool will work with the TEI DTDs.

> (b) Analysis tools for exploring and quantifying patterns within the
> TEI annotated text.

Ditto. Any SGML-based analytical tool will do this.

> Is it premature to base a research project on TEI tools ?.
> I am interested in software for (in order of priority) PC's, MAC's and
> Unix workstations.

Nope, we are using the TEI DTD right now, on Suns and PCs running commercial and PD software. There is a list of SGML-compliant software somewhere (in the FAQ? Mike?)

///Peter
=========================================================================
Date: Fri, 17 Sep 1993 22:23:26 CDT
Reply-To: "C. M. Sperberg-McQueen"
Sender: "TEI-L: Text Encoding Initiative public discussion list"
From: "C. M. Sperberg-McQueen"
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: double play! two chapters

    * * * * * * * * * * * * * * * * * *
    *             TEI P2              *
    *   new fascicles now available   *
    *                                 *
    *           Chapter CE            *
    *     Additional tag set for      *
    *  Certainty and Responsibility   *
    *                                 *
    *           Chapter GD            *
    *     Additional tag set for      *
    *   Graphs, Networks, and Trees   *
    * * * * * * * * * * * * * * * * * *

Two new chapters of TEI P2, chapters 19 and 23 (CE and GD), defining additional tag sets for annotations on certainty and responsibility and for graphs, digraphs, networks, and trees, are now available for public comment.
As readers of this list will recall, TEI P2 is the second draft of the TEI Guidelines for Electronic Text Encoding and Interchange, and is being distributed for comment chapter by chapter, as and when the chapters are ready for comment. (File TEI ED J8, "Obtaining the Second Version of the TEI Guidelines," has the details, if you have forgotten).

Chapter 19 (known internally as 'CE'), defines methods of indicating that one is uncertain about some aspect of the markup or transcription of the text, and for indicating precisely who is responsible for various aspects of the markup or transcription. Using the element defined by CE, encoders can record various possibilities for the interpretation of a text -- the running example used in the chapter is how to record one's uncertainty over whether to mark a noun phrase as a personal name or as a place name. The chapter presents a preliminary analysis of the ways in which one can be uncertain regarding markup and provides methods of recording them all. (One discovery made by the way is that some elements cannot be the subject of uncertainty, most importantly the applicability of the element itself).

We are particularly interested in the reactions of individuals and projects who already have in place systems for recording certainty and estimates of probability regarding given analyses of a text; if you have examples of uncertainty which existing systems of markup can handle which cannot be expressed in the scheme presented by this chapter, the work group responsible for this chapter will be particularly keen to hear from you.

Since when points of contention arise it is always handy to know exactly who has done what, chapter CE also defines a <respons> element, for allocating responsibility for aspects of the markup or transcription.

The major concern of the editors is that the name 'respons' seems unduly awkward for the element allowing specific allocation of responsibility. (We thought about 'whodunnit' but decided regretfully against it.
We solicit suggestions from readers of this list for a new and less funny-looking name for this element; those which are printable may be sent direct to the list.) Both elements are, of course, optional.

Chapter 23 (Graphs, Networks, and Trees) defines a tag set for the representation of graphs (in their mathematical sense!), with special handling of digraphs (directed graphs) and trees. The markup described here can be used to encode the structural properties of many diagrams whose basic point would otherwise elude transcription. Examples given in the chapter include networks of airline connections, recognizers or accepters for regular languages (i.e. finite-state automata or transition networks), transducers for the same class of languages, including a transducer for translating between the English sentences

    The man comes
    The men come
    The old man comes
    The old men come
    The old old man comes
    The old old men come
    (etc.)

and their French equivalents

    L'homme vient
    Les hommes vont
    Le vieil homme vient
    Les vieux hommes vont
    (etc.)

Also, a family tree (Bertrand Russell's), a graph of the geographic relations in a Scottish historical document (just where is the Lordship of Knapdale? is it in Argyll?), syntax trees for various parses of "see the vessel with the periscope", and trees for representing the transformation derivation of surface phrase structures.

Together with these chapters, the following DTD files are being released:

    teicert2.ent, which defines special modifications to the TEI class system;
    teicert2.dtd, which provides element and attribute list declarations for all elements in the certainty tag set, and
    teicegis.dtd, which defines the parameter entities used for the generic identifiers in the certainty tag set.
    teinets2.dtd, which provides element and attribute list declarations for all elements in the graphs/networks tag set, and
    teigdgis.dtd, which defines the parameter entities used for the generic identifiers in the graphs/networks tag set.
We append the usual information on how to retrieve this chapter, for the convenience of subscribers.

-C. M. Sperberg-McQueen
Lou Burnard
17 September 1993

-----

Texts of P2 are being made available in a number of different electronic formats. These include plain screen-readable text (filetype DOC), LaTeX (filetype TEX), PostScript (filetype PS) and of course SGML (filetypes P2X and REF). In addition, many chapters define specific DTD files which are released together with the chapter. All files can be retrieved from any of several servers, as described below:

1 Getting files from the server in Chicago

To get electronic copies of this fascicle from the TEI-L fileserver at the University of Illinois at Chicago, all you need to do is send an ordinary email note to the address LISTSERV@UICVM (or, from the Internet: listserv@uicvm.uic.edu) containing whichever of the following lines describe the file(s) you want:

    get p2ce doc
    get p2ce ps
    get p2ce p2x
    get p2ce ref
    get p2cedriv p2x
    get p2gd doc
    get p2gd ps
    get p2gd p2x
    get p2gd ref
    get p2gddriv p2x
    get teicert2 ent
    get teicert2 dtd
    get teicegis dtd
    get teinets2 dtd
    get teigdgis dtd

The DOC and PS files include the complete fascicles; the P2X file contains only this chapter. The files P2CEDRIV P2X and P2GDDRIV P2X are 'driver files' which embed the P2xx P2X and P2xx REF files. Further details on the DTD used may be had by contacting the editors.

The documents you request will be returned to you automatically as e-mail messages. Beware! some of the files are quite large, and so may be delayed. You will also receive an automatic notification that the file is on its way to you. (If you receive something illegible in a 'Listserv packed format', please contact one of the editors directly to see about getting you the file in a more useful form.)

2 Getting files from the server in the UK

[n.b.
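
To make the "recognizer as a graph" idea above concrete, here is a minimal sketch of a transition network that accepts the English sentences listed in the announcement. It is plain Python, not the chapter's markup; the state names and the ARCS table are assumptions made purely for illustration.

```python
# A small directed graph with labelled arcs: (state, word) -> next state.
# State names are hypothetical; only the sentences come from the announcement.
ARCS = {
    ("start", "the"): "noun_phrase",
    ("noun_phrase", "old"): "noun_phrase",   # loop: any number of "old"
    ("noun_phrase", "man"): "singular",
    ("noun_phrase", "men"): "plural",
    ("singular", "comes"): "accept",
    ("plural", "come"): "accept",
}


def accepts(sentence):
    """Return True if the network reaches the accepting state on this sentence."""
    state = "start"
    for word in sentence.lower().split():
        state = ARCS.get((state, word))
        if state is None:          # no arc for this word from this state
            return False
    return state == "accept"


if __name__ == "__main__":
    for s in ["The man comes", "The old old men come", "The men comes"]:
        print(s, "->", accepts(s))
```

The loop on "old" is what makes the language infinite while the graph stays finite, which is the structural property a graph encoding would need to capture.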
as this note was being posted, Friday evening, network problems were preventing us from moving files to the server at Exeter. We expect to be able to update the Exeter server over the weekend, but if you find these files not available from Exeter, be patient. Or try the UIC server.] The same files are available via anonymous FTP from the SGML Project at the University of Exeter. To access these files, your computer system must be on the InterNet. If it is, you should be able to give the command FTP sgml1.ex.ac.uk [ or FTP 144.173.6.61] When you are connected to the Exeter SGMLbox, type the following commands (or, whichever actually describe the files you want): cd tei/p2/drafts get p2ce.p2x get p2ce.doc get p2ce.ps get p2ce.ref get p2gd.doc get p2gd.ps get p2gd.p2x get p2gd.ref get p2gddriv.p2x cd ../dtds get teicert2.ent get teicert2.dtd get teicegis.dtd get teinets2.dtd get teigdgis.dtd (note that the filename *must* be given in all lower-case letters) 3 Getting the files from other servers The files may also be obtained from the Markup-L Listserv fileserver in Germany, and from Professor Syun Tutiya in Japan. For more details on these and other sources of TEI information, please order copies of files EDJ8 MEMO (describes how to retrieve electronic copies of TEI P2 and the various formats in which they are available) EDJ9 MEMO (describes how to request paper copies of TEI P2, for those without electronic mail access) (on the Exeter file server, get file tei/intro/edj8.doc) ========================================================================= Date: Mon, 20 Sep 1993 19:22:13 CDT Reply-To: "C. M. Sperberg-McQueen" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "C. M. 
Sperberg-McQueen"
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: new chapter: writing system declaration

    * * * * * * * * * * * * * * * * * *
    *             TEI P2              *
    *   new fascicle now available    *
    *           Chapter WD            *
    *      Auxiliary tag set for      *
    *   Writing System Declarations   *
    * * * * * * * * * * * * * * * * * *

A new chapter of TEI P2, chapter 27 (WD), Writing System Declarations, is now available for public comment.

As readers of this list will recall, TEI P2 is the second draft of the TEI Guidelines for Electronic Text Encoding and Interchange, and is being distributed for comment chapter by chapter, as and when the chapters are ready for comment. (File TEI ED J8, "Obtaining the Second Version of the TEI Guidelines," has the details, if you have forgotten).

Chapter 27 (known internally as 'WD'), defines an auxiliary tag set for 'writing system declarations' or WSDs. A WSD documents a given method of writing a given language: it identifies what language is being written, what script or 'writing system' is being used, and what coded character set, SGML entity set, transliteration scheme, or other method is being used to represent that script in machine-readable form.

Readers who have not yet detected in themselves an enthusiasm for the minutiae of character-set design and implementation issues may -- or may not -- find, in this chapter, the spark that sets that enthusiasm ablaze. Those who do work with character set problems (enthusiastically or not) will, we hope, find the TEI writing system declaration an indispensable tool for the documentation of character sets, entity sets, and transliteration schemes.

In the current imperfect state of character transmission over the global network, the WSD will provide the critical information required to enable automatic packing of TEI documents in network-safe forms, and to unpack them again at the other end.
Unlike other techniques often used for this purpose (by uuencode, binhex, etc.), the WSD makes it possible to transmit a file safely through dangerous network gateways (the kind which eat square brackets, circumflexes, accented characters, and even sometimes exclamation points) without requiring the character set at the receiving end to be identical to the character set at the sending end. Together with this chapter, the following DTD files are being released: teiwsd2.dtd, which provides element and attribute list declarations for all elements in the tag set, and wdgis2.ent, which defines the parameter entities used for the generic identifiers in the tag set (a duplicate of this file is provided under the name teiwdgis.dtd) In addition, five sample writing system declarations are also being released: iso646ir.wsd, which documents the international reference version of ISO 646: 1991 (which is identical to ANSI X3.4, better known as ASCII) iso646ss.wsd, which documents the non-national subset of ISO 646: 1991 iso88591.wsd, which documents the character set of ISO 8859-1, an eight-bit character set for the Latin alphabet with special characters for Western Europe) iso88592.wsd, which documents the character set of ISO 8859-2, an eight-bit character set for the Latin alphabet with special characters for Eastern Europe) teien.wsd a simple WSD for English based on ISO 8859-1 We append the usual information on how to retrieve this chapter, for the convenience of subscribers. -C. M. Sperberg-McQueen Lou Burnard 20 September 1993 ----- Texts of P2 are being made available in a number of different electronic formats. These include plain screen-readable text (filetype DOC), LaTeX (filetype TEX), PostScript (filetype PS) and of course SGML (filetypes P2X and REF). In addition, many chapters define specific DTD files which are released together with the chapter. 
All files can be retrieved from any of several servers, as described below: 1 Getting files from the server in Chicago To get electronic copies of this fascicle from the TEI-L fileserver at the University of Illinois at Chicago, all you need to do is send an ordinary email note to the address LISTSERV@UICVM (or, from the Internet: listserv@uicvm.uic.edu) containing whichever of the following lines describe the file(s) you want: get p2wd doc get p2wd ps get p2wd p2x get p2wd ref get p2wddriv p2x get teiwsd2 dtd get teiwdgis dtd get iso646ir wsd get iso646ss wsd get iso88591 wsd get iso88592 wsd get teien wsd The DOC and PS files include the complete fascicles; the P2X file contains only this chapter. The file P2WDDRIV P2X is a 'driver file' which embeds the chapter file P2WD P2X and the accompanying reference material, P2WD REF. Further details on the DTD used may be had by contacting the editors. The documents you request will be returned to you automatically as e-mail messages. Beware! some of the files are quite large, and so may be delayed. You will also receive an automatic notification that the file is on its way to you. (If you receive something illegible in a 'Listserv packed format', please contact one of the editors directly to see about getting you the file in a more useful form.) 2 Getting files from the server in the UK The same files are available via anonymous FTP from the SGML Project at the University of Exeter. To access these files, your computer system must be on the InterNet. 
If it is, you should be able to give the command FTP sgml1.ex.ac.uk [ or FTP 144.173.6.61] When you are connected to the Exeter SGMLbox, type the following commands (or, whichever actually describe the files you want): cd tei/p2/drafts get p2wd.p2x get p2wd.doc get p2wd.ps get p2wd.ref get iso646ir.wsd get iso646ss.wsd get iso88591.wsd get iso88592.wsd get teien.wsd cd ../dtds get teiwsd2.dtd get wdgis2.ent get teiwdgis.dtd (note that the filename *must* be given in all lower-case letters) 3 Getting the files from other servers The files may also be obtained from the Markup-L Listserv fileserver in Germany, and from Professor Syun Tutiya in Japan. For more details on these and other sources of TEI information, please order copies of files EDJ8 MEMO (describes how to retrieve electronic copies of TEI P2 and the various formats in which they are available) EDJ9 MEMO (describes how to request paper copies of TEI P2, for those without electronic mail access) (on the Exeter file server, get file tei/intro/edj8.doc)
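
The WD announcement above speaks of packing documents into network-safe forms and unpacking them at the other end. The following is only a sketch of that general idea, not the mechanism the chapter defines: it assumes a hypothetical table of characters at risk in transit and replaces each with an SGML entity reference, using entity names from the ISO 8879 sets (amp, lsqb, rsqb, excl).

```python
import re

# Hypothetical table of characters assumed to be unsafe in transit,
# mapped to ISO 8879 entity names. "&" must be included so that
# literal ampersands survive the round trip.
ENTITY_FOR = {"&": "amp", "[": "lsqb", "]": "rsqb", "!": "excl"}
CHAR_FOR = {name: char for char, name in ENTITY_FOR.items()}


def pack(text):
    """Replace each at-risk character with an entity reference."""
    return "".join(f"&{ENTITY_FOR[c]};" if c in ENTITY_FOR else c for c in text)


def unpack(text):
    """Restore known entity references; unknown ones are left untouched."""
    return re.sub(
        r"&([a-z]+);",
        lambda m: CHAR_FOR.get(m.group(1), m.group(0)),
        text,
    )


if __name__ == "__main__":
    original = "Beware! see note [3] & appendix"
    packed = pack(original)
    print(packed)
    print(unpack(packed) == original)
```

A real declaration would presumably let sender and receiver agree on the table itself, so that the receiving character set need not match the sending one; the fixed dictionary here stands in for that agreement.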