========================================================================= Date: Mon, 7 Sep 1992 18:36:14 CDT Reply-To: U59467@UICVM.BITNET Sender: "TEI-L: Text Encoding Initiative public discussion list" From: U59467@UICVM.BITNET Subject: Forwarded note below ======================================================================= Received: from UICVM.BITNET by UICVM (Mailer R2.07) with BSMTP id 1235; Mon, 07 Sep 92 04:29:54 CDT Received: from IRUCCIBM.BITNET by UICVM (Mailer R2.07) with BSMTP id 1225; Mon, 07 Sep 92 04:27:57 CDT Received: from IRUCCVAX.UCC.IE by IRUCCIBM.BITNET (Mailer R2.08) with BSMTP id 1695; Mon, 07 Sep 92 10:27:47 IST Received: from curia.ucc.ie by IRUCCVAX.UCC.IE (PMDF #12095) id <01GOHZPUQ6O0000WSK@IRUCCVAX.UCC.IE>; Mon, 7 Sep 1992 10:15 GMT Received: by curia.ucc.ie (4.1/SMI-4.1) id AA24774; Mon, 7 Sep 92 10:14:40 GMT Date: Mon, 7 Sep 92 10:14:40 GMT From: pflynn@curia.ucc.ie (Peter Flynn) Subject: SGML and TeX To: sgml-l@dhdurz1.BITNET, texhax@tex.ac.UK Cc: tei-l@uicvm.BITNET, uktex@tex.ac.UK Message-id: <9209071014.AA24774@curia.ucc.ie> X-Envelope-to: sgml-l@dhdurz1.BITNET, texhax@tex.ac.UK, uktex@tex.ac.UK, tei-l@uicvm.BITNET TUGboat 13[2] carries an abstract by Reinhard Wonneberger (pp226--227) called "Approaching SGML from TeX", in which he summarises some of the possible ways to use TeX to print from an SGML instance. The following file is an attempt I cooked up over the weekend to demonstrate the feasibility of this approach. It still fails on a lot of things, but they don't look insuperable. 
The instance referenced at the end of the file can be retrieved by anon ftp from curia.ucc.ie (143.239.1.8) in pub/curia
--------------------------
% SGML.TEX --- a pilot set of macros to provide rudimentary
%              typesetting of SGML-encoded documents with NO
%              pre- or postprocessing (you better believe it)
% (c) 1992 Peter Flynn
%
% Warning: this file uses the EPLAIN macros of Karl Berry, obtainable
% from any of the TeX archives such as tex.ac.uk or ymir.claremont.edu
%
% WARNING: this is a pilot. No guarantees, but it seems to
% work on the tags I mention below. It should form the basis
% for much more work, as with proper persuasion, TeX should be
% able to process an unaltered SGML instance (and DTD) and
% produce a piece of acceptable typesetting (IMHO :-).
%
% If you are going to do some work on this, please ask me first:
% I am unlikely to object, but I would like to know about it.
%
% Version history:
%
% 0.1 (Sep 92) reads and acts on a minimal tagset of HTML
%     used in network-browseable documents by WWW
%     This comprises (work so far):
%
%     <title>...</title>  Document title
%     <h1>...</h1>        Header level 1
%     <h2>...</h2>        Header level 2
%     <h3>...</h3>        Header level 3
%     <dl>...             Simple list
%     <dt>...<dd>...      Item name, text
%     </dl>               End of list
%     <p>                 Paragraph
%     some entities like &aacute; (see below)
%
% I haven't figured out how to handle multi-word
% tags (eg with attributes) yet, because in the parsing, TeX
% turns the space into another category of character. Gimme time!
% Another source of confusion is the presence of a
% slash in a quoted filename within an attribute to
% such tags when TeX is looking for the slash which
% indicates the endtag. However...:-)
%
% All comments to pflynn@curia.ucc.ie (Fax: +353 21 277194)

\input eplain                       % get it from the archives!
\font\stt=cmtt8                     % used for the tags
\font\sbf=cmssbx10 scaled \magstep1 % used for the title
\font\sc=cmcsc10                    % used for some headers

% Make a slash an ordinary letter.
\catcode`\/=11

% Define \pos, the position in a tag of the slash character
% and \slash, a flag, 0=no slash found, 1=slash found.
\newcount\pos\newcount\slash

% The \parse and \getchar are adapted from the \length macro
% at the end of Chapter 20 (p.219) of the TeXbook. A call to
% \parse returns \slash=0 or \slash=1 depending on whether
% the argument was a starttag or endtag.
\def\parse#1{\global\pos=0\global\slash=0\getchar#1/}
\def\getchar#1{\ifx#1/\ifnum\pos=0\global\slash=1\global\advance\pos by1\let\next=\getchar\else\let\next=\relax\fi%
\else\global\advance\pos by1\let\next=\getchar\fi\next}

% Use \raggedcenter from Appendix A 14.34 (p.317) of the TeXbook
\def\raggedcenter{\leftskip=0pt plus12em \rightskip=\leftskip
  \parfillskip=0pt \spaceskip=.3333em \xspaceskip=.5em
  \parindent=0pt \pretolerance=9999 \tolerance=9999
  \hyphenpenalty=9999 \exhyphenpenalty=9999 }

% Define the visual meanings to be attached to the tags
\def\title{\par\begingroup\raggedcenter\sbf}
\def\/title{\bigskip\endgroup}
\def\p{\par}

% Header level tags have to go in a group so that digits can
% be treated as letters for purposes of definition. The digit 1
% must be changed last, because the later assignments read "11".
\begingroup\catcode`\3=11\catcode`\2=11\catcode`\1=11
\global\def\h1{\bigbreak\noindent\begingroup\bf}
\global\def\/h1{\endgroup\medskip\noindent\ignorespaces}
\global\def\h2{\medbreak\noindent\begingroup\sc}
\global\def\/h2{\endgroup\smallskip\noindent\ignorespaces}
\global\def\h3{\smallbreak\noindent\begingroup\sl}
\global\def\/h3{\endgroup\par\noindent\ignorespaces}
\endgroup

\def\dl{\unorderedlist}
\def\/dl{\endunorderedlist}
\def\dt{\li\it}
\def\dd{\item{}\rm}
\def\a #1{\footnote{#1}}
\def\/a{}
\def\entr{\item{$\bullet$}}

% Make the less-than (opentag) character active, and establish
% two controls to let the user turn on tag presence and formatting
% in the output. Default is no tags and no formatting: this will
% output pages of plain typewriter text. Saying \showtagstrue
% will include the tags in the output; saying \formattrue will
% perform the formatting defined above. Either or both can be
% used, but must be inserted where shown below, before the \input.
\catcode`\<=\active
\newif\ifshowtags\newif\ifformat

% Define the main routine to handle a tag
\def<#1>{\parse{#1}\ifnum\slash=1\ifshowtags\endtag{#1}\fi
  \ifformat\csname#1\endcsname\fi
  \else\ifformat\csname#1\endcsname\fi
  \ifshowtags\starttag{#1}\fi\fi}

% Set up some variables to handle the boxing of tags for output
\newbox\tagbox\newdimen\tagwidth\newdimen\boxwidth
\def\hlinefill{\leaders\hrule height.2pt\hfill}

% Define what a starttag looks like
\def\starttag#1{\setbox\tagbox=\hbox{{\stt#1}}%
  \tagwidth=\wd\tagbox\advance\tagwidth by2pt%
  \boxwidth=\tagwidth\advance\boxwidth by4pt%
  \leavevmode\lower2.5pt\hbox{\vrule width.2pt\vbox{\hsize=\boxwidth\parindent=0pt
    \offinterlineskip%
    \line{\hbox to\tagwidth{\hlinefill}\hfil}%
    \line{\hskip2pt\box\tagbox\kern-.5pt$\rangle$\hfil}%
    \line{\hbox to\tagwidth{\hlinefill}\hfil}}}}

% Define what an endtag looks like
\def\endtag#1{\setbox\tagbox=\hbox{{\stt#1}}%
  \tagwidth=\wd\tagbox\advance\tagwidth by2pt%
  \boxwidth=\tagwidth\advance\boxwidth by4pt%
  \leavevmode\lower2.5pt\hbox{\vbox{\hsize=\boxwidth\parindent=0pt\offinterlineskip%
    \line{\hfil\hbox to\tagwidth{\hlinefill}}%
    \line{\hfil$\langle$\kern-1pt\box\tagbox\hskip2pt}%
    \line{\hfil\hbox to\tagwidth{\hlinefill}}}\vrule width.2pt}}

% Define some of the simpler entities
\def\aacute{\'a}
\def\eacute{\'e}
\def\iacute{\'{\i}}
\def\oacute{\'o}
\def\uacute{\'u}
\def\ocus{\&}
\def\amp{\&}
\def\nodoti{\i}
\def\aelig{\ae}
\def\mdash{---}

% Turn on the recognition of the ampersand so entities become active
\catcode`\&=\active
\def&#1;{\csname#1\endcsname}

% Slip in recognition of a few of TeX's special characters
% The % sign itself is done only later, immediately before
% inputting the SGML instance, so that we can continue using
% comments until then.
\catcode`\$=\active\def${\$}
\catcode`\#=\active\def#{\#}

% Uncomment your choice of options here
\showtagstrue
\formattrue

% Make some assumptions about the style of output, based on the above:
\ifshowtags\raggedright\else\fi
\ifformat\else\ttraggedright\fi
\tolerance=7500

% And define the double-quote (") as active so typewriter-style
% quotes come out as open-and-closed in flip-flop manner. Bad style
% to use them in SGML anyway, ... is better :-)
\ifformat\newcount\qcount\catcode`\"=\active
  \def"{\global\advance\qcount by1\ifodd\qcount``\else''\fi}\fi

% Input your SGML instance here, after the comment character
% is redefined (no more comments from here on...
\catcode`\%=\active\def%{\%}
\input /info/curia/Chron_Scot.html
\bye
----------------------------------------------------------
========================================================================= Date: Tue, 8 Sep 1992 22:18:13 CDT Reply-To: U59467@UICVM.BITNET Sender: "TEI-L: Text Encoding Initiative public discussion list" From: U59467@UICVM.BITNET Subject: Forwarded note below ======================================================================= Received: from UICVM.BITNET by UICVM (Mailer R2.07) with BSMTP id 5072; Tue, 08 Sep 92 10:34:28 CDT Received: from UICVM by UICVM (Mailer R2.07) with BSMTP id 4989; Tue, 08 Sep 92 10:33:00 CDT Received: from ccvr1.cc.ncsu.edu by UICVM.UIC.EDU (IBM VM SMTP V2R1) with TCP; Tue, 08 Sep 92 10:32:56 CST Received: from essss1.stat.ncsu.edu by ccvr1.cc.ncsu.edu (5.65b/SAM Thu May 30 15:11:36 EDT 1991) id AA13328; Tue, 8 Sep 92 11:33:38 -0400 Posted-Date: Tue, 8 Sep 92 11:36:55 EDT Received: from esssta.stat.ncsu.edu by essss1.stat.ncsu.edu (4.1/SMI-4.0) id AA16065; Tue, 8 Sep 92 11:36:55 EDT Date: Tue, 8 Sep 92 11:36:55 EDT From: arnold@stat.ncsu.edu (Tim Arnold) Message-Id: <9209081536.AA16065@essss1.stat.ncsu.edu> Received: by esssta.stat.ncsu.edu (4.1/SMI-4.0) id AA00843; Tue, 8 Sep 92 11:36:50 EDT To: tei-l@uicvm.uic.edu Subject: Waldt reference, please I got the following paragraph from the SGML bibliography, and I would very much like to read Waldt's article, but I don't see the name of the publication. Can someone let me know where this article (or a similar one) can be found? ----------------------------- > These SGML-aware editors, transducers, translators and other > facilities are numerous, and in general could not be evaluated here. > For a summary of some SGML-aware editors, see: Dale Waldt, "Overview > of SGML-Smart Text Editors," 17 (December 1990) 12-15; he > reviews IBM TextWrite; Datalogics WriterStation; SoftQuad > Author/Editor; Yard Software Write-It; Software Exoterica CheckMark).
----------------------------- Thanks in advance, --Tim Arnold ---------------------------------------------------------------------- Tim Arnold Instructional Computing Internet: arnold@stat.ncsu.edu North Carolina State Univ. BITNET : ARNOLD@NCSUSTAT Dept. of Statistics, Raleigh NC 27695 Phone : 919 515 2584 FAX: 919 515 7591 ---------------------------------------------------------------------- ========================================================================= Date: Wed, 9 Sep 1992 16:45:39 CDT Reply-To: "Brian E. Travis" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Brian E. Travis" Subject: Re: Forwarded note below This message came directly to the TEI-L List Moderator account, and is being forwarded from there. -Eds. ----------------------------Original message---------------------------- > I got the following paragraph from the SGML bibliography, and I > would very much like to read Waldt's article, but I don't > see the name of publication. Can someone let me know where > this article (or a similar one) can be found? > ----------------------------- > > These SGML-aware editors, transducers, translators and other > > facilities are numerous, and in general could not be evaluated here. > > For a summary of some SGML-aware editors, see: Dale Waldt, "Overview > > of SGML-Smart Text Editors," 17 (December 1990) 12-15; he > > reviews IBM TextWrite; Datalogics WriterStation; SoftQuad > > Author/Editor; Yard Software Write-It; Software Exoterica CheckMark). > ----------------------------- The article was in <TAG>, The SGML Newsletter, Issue 17 (December 1990). Contact GCA at 703-519-8157 for back issue information. Brian. -- Brian E. Travis brian@sgmlinc.com SGML Architect, Managing Editor, Tele: +1 303 680-0875 InfoDesign Corp.
The SGML Newsletter Fax: +1 303 680-4906 ========================================================================= Date: Thu, 10 Sep 1992 12:47:23 CDT Reply-To: "Wendy Plotkin (312) 413-0331" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Wendy Plotkin (312) 413-0331" Subject: SGML Software for Macintosh ----------------------------Original message---------------------------- [The summary below was prepared by Professor Robert Jones of the University of Illinois (Champaign-Urbana) Sociology Department. Professor Jones is involved in converting the works of sociologist Emile Durkheim into machine-readable form, and is also director of the Hypermedia Laboratory of the U. of I.'s College of Liberal Arts and Sciences. Professor Jones prepared this for those who attended the first seminar on textual analysis of the Center for Electronic Texts in the Humanities in August. A summary of the seminar will be posted soon on TEI-L.- Wendy Plotkin, TEI Research Assistant] ========================================================================= From: IN%"bob_jones@howl.las.uiuc.edu" "bob jones" 28-AUG-1992 10:17:09.78 To: IN%"ceth@zodiac.rutgers.edu" Subj: SGML tools on Mac After talking with our local Apple folks, I was able to obtain a list of SGML tools for the Mac. ------------------- The following is a list of companies with SGML products on the Mac. Author/Editor is a context-sensitive text entry system which provides SGML compliance with the FIPS standard. Author/Editor creates machine-independent ASCII text whose tags work in collaboration with other publishing or database software. Author/Editor features include the ability to use SGML to compose an outline of a document as the first step, fill out documents through the outline, produce documents that match the style and format of other documents, and create documents that can be stored, indexed, and retrieved with complete flexibility.
Rules/Builder: An application for creating and storing formatting specifications; it allows users to create a DTD and a compiled Rules file. SoftQuad Publishing v2.9: SoftQuad Publishing Software for Macintosh running A/UX is an automated, batch text and graphics formatter with tools for the creation of complex tabular material, mathematics, graphs, and charts. Its major strength is in the creation of long documents on laser printers and typesetting machines. Running under A/UX, SoftQuad Publishing Software provides all the capabilities required for high-end production publishing, including automatic kerning, hyphenation using an exception dictionary, capacity for multiple columns and English-language names for macros and commands. Simple tools allow the creation of new macro formatting packages including a trace mechanism, complete compatibility with the old troff and DWB, and support for spot color separation. SoftQuad, Inc. 56 Aberfoyle Crescent Suite 800 Toronto, Canada M8X2W4 (416) 239-4801 XGML Engine: A sophisticated validating SGML parser. The engine uses state-of-the-art compiler technology to provide application developers with efficient and accurate SGML parsing capability. The engine is made unique by the Application Developers Interface (API), which supports the development of any SGML application without modifying the engine source code. XGML CHECK MARK A complete SGML author, editor, parser and validation solution for the Macintosh. Specially designed for retrofitting older documents to SGML. CheckMark users can enter tags in context, either from menus or directly from the keyboard. The menu is helpful to users who are new to SGML or are working with a particularly complex DTD. It provides a full listing of permissible tags at any point in a document. XGML Translator A fourth-generation conversion language that prepares SGML documents for use by text formatters and database systems.
Also translates formatted and OCR-scanned documents to SGML or any other markup language. Can be used to generate HyperCard stacks from SGML documents for value-added document delivery and easy navigation through documents. XGML OmniMark A powerful and easy-to-learn scripting language combined with a premium validating SGML parser. OmniMark can be used to convert SGML documents to the input languages of other products, to convert the output languages of other products to SGML-defined languages including AAP and CALS, and to convert between arbitrary languages and data formats. Exoterica Corporation 383 Parkdale Ave., Suite 406 Ottawa, Ontario K1Y 4R4 (613) 722-1700 MARKUP Markup is a toolkit of products that meet the CALS SGML requirements, including Author/Editor and Check Mark. IGES and CGM conversion utilities are also available. Teleprint works with customers to select and customize the products that meet the customers' individual SGML needs. Markup also provides string and character replacement tools with sophisticated search and replace for formats that can convert files such as Ventura and PageMaker into rough SGML files. MCA Associates Teleprint 102 Inverness Terrace East Englewood, CO 80112 (303) 947-2751 Context-Wise A sophisticated SGML application built with powerful document interpretation tools especially designed to handle the hierarchical, nested text structures found in SGML documents. Context-Wise provides a broad array of user-definable wild-card sets for advanced document tagging.
USLynx 853 Broadway New York, New York 10003 (212) 533-7331 ========================================================================= Date: Sat, 12 Sep 1992 08:21:28 CDT Reply-To: "Wendy Plotkin (312) 413-0331" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Wendy Plotkin (312) 413-0331" Subject: CETH Seminar in Textual Analysis ----------------------------Original message---------------------------- A Report on the CETH Seminar on Textual Analysis Princeton University August 9-21, 1992 TEI was prominently featured at the first seminar on textual analysis sponsored by the Center for Electronic Texts in the Humanities (CETH). CETH was established in late 1991 by Rutgers and Princeton Universities to act as a central organization to assist in the creation, dissemination and use of electronic texts in the humanities. In addition to creating an inventory of machine-readable texts and making them available through the Internet, the Center is committed to offering educational seminars on various aspects of electronic texts. The two instructors were Susan Hockey, CETH Director, and Willard McCarty, Assistant Director of the University of Toronto's Centre for Computing in the Humanities. Susan chairs the TEI Steering Committee, and was formerly the Director of the United Kingdom's Computers in Teaching Initiative (CTI) Centre for Textual Studies, located at Oxford University. Willard is a member of the TEI Verse work group and the founding editor of the _Humanist_, and is currently working in the area of classical studies, in particular on Ovid's _Metamorphoses_. An international group of librarians, literary, linguistic and social science scholars, and computer and information scientists made up the class. Librarians and library graduate students from the Association of Research Libraries and from universities at Arizona State, Columbia, Indiana, Iowa, Manitoba, Maryland, NYU, Princeton, Rutgers, Texas, and Wesleyan attended.
Literary scholars and students from Spain, Virginia, New York State and Wooster, Ohio, ranged in their specialties from Old English to _Piers Plowman_ and modern English and Russian fiction. Linguistic scholars from England and Canada were working in computational linguistics and discourse analysis. Social scientists from Israel, Missouri, and Illinois brought backgrounds in the history of Judaism and Zionism, Sri Lanka, modern Western social theory, and U.S. urban development; a Princeton art historian, in the Princeton Cyprus expedition. Computer scientists and a mathematician from Rutgers and Wisconsin brought a familiarity with higher-level programming techniques and interests in analyzing literature. The seminar provided historical information on electronic texts, including their development in the U.S., Europe, and elsewhere. Existing resources such as ARTFL, the Dante Database, the Thesaurus Linguae Graecae, and the Oxford English Dictionary Version II were described. Robert Hollander, Professor of Comparative Literature at Princeton and Dante Database creator, demonstrated the Dante Database. Toby Paff and Hannah Kaufmann of Princeton's Humanities Computer Center demonstrated ARTFL and the OED2. The need for additional, effectively structured online dictionaries was expressed. Other electronic texts were made available for individual perusal (Intelex's _Pastmasters_, Georgetown's _The Phenomenology of the Mind_). Each of these pioneer projects includes textual analysis software with which to analyze the text; they are not aimed at the casual browser, in part due to copyright restrictions.
Susan and Willard reviewed two textual analysis programs, one public domain and the other proprietary -- TACT and MICRO-OCP. Their common features include the creation of alphabetical frequency lists of all words, concordances (all the occurrences of a word or phrase, in context), and collocations (co-occurrences of words and phrases). Susan described several studies using stylistic analysis -- Mosteller and Wallace's study of _The Federalist Papers_, Morton's study of Greek texts and their disputed authorship by St. Paul, Kenny's work on _The Aristotelian Ethics_, and Burrows on Jane Austen. We also explored the statistical tests used to summarize the findings in these studies. Beyond stylistic analysis, we looked at linguistic and lexical analysis. Means of using TACT to undertake simple analyses of this type were described. Linguistic and lexical analysis are important for studying language and developing printed and electronic dictionaries. Of even greater significance is their potential for improving information retrieval. As the rules of language are systematized in a manner that computers can understand, computers can apply these rules in interpreting new textual material. The complexity of the task was revealed in the demonstration of a program to automatically parse several sentences. It was successful with one sentence, but completely fell apart when faced with a particularly ambiguous phrase. (By the end of the workshop, we all were freely talking about the difficulty of "disambiguating" words.) Much additional development was called for in the automated recognition and analysis of "fuzzy" matches, names, concept relations and figures of speech such as metaphors. Computer assistance in creating critical editions was explored. Those interested in this topic had the opportunity to try out the Collate program prepared by the chair of the TEI Text Criticism work group, Peter Robinson.
Susan presented the TEI to the participants, many of whom were familiar with its general principles. TEI's advantages were described as its transportability across different platforms, ease of sharing texts and their analyses, and superior analytical tools. Some of those present expressed reservations about the labor-intensiveness of marking up texts, and a preference for analyzing a "clean" text free of the interpretation implicit in any mark-up system. A major constraint is the lack of existing software with which to ease the mark-up process and to exploit the mark-up for analysis. Such software is being developed, or is already in use for selected applications. PAT takes advantage of the OED's SGML mark-up, while DynaText, which was demonstrated to the group, uses SGML to create the links in its hypertext electronic books. These applications are presently too limited or too expensive for general use, and much additional effort is needed in this area. In spite of the reservations expressed, the need for a standard means of encoding and sharing texts seemed to be accepted. About half of us had brought texts to analyze using these tools. Afternoons and late evenings in the dormitory basement were devoted to this task. Texts treated included the poetry of Canadian Margaret Avison, "Piers Plowman" (B), Shakespeare's tragedies, "My Dinner With Andre," classified ads from modern British newspapers, English translations of French and Egyptian fiction, 15th-century Russian chronicles, Durkheim's works, the diary of Robert Knox (a 17th-century British sea captain's son imprisoned on Ceylon), and an early issue of _The Catholic Worker_, a progressive activist Catholic newspaper. One student created a program for Latin morphological analysis (and, taking a cue from Julius Caesar, proposed that Latin be adopted as Europe's common language). The projects aptly demonstrated the challenges involved in analyzing electronic texts.
In some cases, the difficulty lay in creating or obtaining access to an electronic text. Several attempts at scanning were unsuccessful, especially on older books such as the _History of the British Royal Society_. Where technology was not a problem, obtaining publishers' approval to convert copyrighted texts such as _The Book of Mormon_ and _Lolita_ was. Stylistic analysis required choosing characteristic features of style. For example, Nabokov's language in _Pale Fire_ was compared to the poetry of Alexander Pope and Robert Frost which it parodied, raising questions about the appropriateness of semantic and lexical analysis as the basis of comparison. An interesting study of professional and amateur English translations of French authors Theophile Gautier and Eugene Sue for "stylistic fingerprints" included too small a sample from which to draw conclusions. A successful outcome occurred when a student studying Egyptian short stories and their translations found that TACT and Micro-OCP speeded up the analysis he had begun years before without these tools. Conceptual and thematic analysis called for hard decisions about the relationship between complicated concepts and words and brief phrases (the basic units of TACT and Micro-OCP). A study of the use of the words "sin", "redemption," and "atonement" in _The Book of Mormon_ revealed interesting information about the Mormons' connection of these concepts. Willard's description of his work with Ovid's _Metamorphoses_ augmented by his explanatory article in the _Tact Exemplar_ demonstrated how TACT could be used to unveil important themes. Although the results of these analyses were quite interesting, the need for additional development of analytical and auxiliary tools was widely agreed upon. 
Don Walker, a member of the TEI Steering Committee and chair of the Association for Computational Linguistics (ACL), described the extensive work being done worldwide with electronic texts, especially in linguistics and lexical analysis. The ACL Data Collection Initiative is amassing electronic transcriptions of written and spoken English, a portion of which is available on CD-ROM. The Network of European Corpora is developing standards to guide the individual European nations in the creation of language corpora. The Consortium for Lexical Research and the Linguistic Data Consortium have been formed to enhance cooperation among the many projects in progress. Hypertext, the newest frontier in electronic texts, was discussed and debated. Its advantages in integrating different sources of information were acknowledged. Its effects on the behavior of students and scholars are not yet understood, however. Will it stimulate the student or scholar to investigate sources other than those included in the hypertext package, or create the perception that the most important sources are included in the package? Elli Mylonas, chair of the TEI Performance Texts work group, gave a presentation on Pandora, a new text retrieval program she and others have developed to search the Thesaurus Linguae Graecae, and on Perseus. Perseus, developed by a consortium of universities and located at Harvard, is a multi-media educational Macintosh product that includes Greek/English texts from the classical period, a Greek/English lexicon, a classical encyclopedia, and a wealth of photographs of artifacts and sites. Elli also demonstrated two types of electronic hypertext fiction. The first type is represented by Voyager Company's books, which tend to treat text in a traditional manner, although they include analytical and note-taking tools for those who want to analyze Sara Paretsky and the like.
The second type, Storyspace fiction, was created explicitly for the electronic medium and uses the interweaving allowed by hypertext as part of its literary strategy. Ann Okerson of the Association of Research Libraries (ARL) described ARL's efforts in exploring the extent and advantages of electronic journals, newsletters, and bulletin boards. Scholarly communication has speeded up with the advent of the computer, and collaboration has become a greater possibility with the ease of the electronic medium. The ARL has produced the "Directory of Electronic Journals, Newsletters and Academic Discussion Lists," and Ann described her greater appreciation of publishers' efforts after completing this project. Finally, Andreas Bjorklind of Sweden offered a presentation on Wide Area Information Servers (WAIS), the new communication system which allows individuals to search and retrieve information from electronic databases across the world. The seminar offered a great variety of information about electronic texts and textual analysis, as well as a relaxed setting in which to study. It was enlightening to learn how many universities and libraries are already involved in offering and analyzing electronic texts. Many of the scholars attending were involved in establishing humanities computing centers or services within their institutions, libraries or departments. The professional and personal relationships established, the understanding gained of textual analysis techniques, and the appreciation of the need for additional hardware and software for more sophisticated analysis were the highlights of the session.
========================================================================= Date: Sat, 12 Sep 1992 16:14:47 CDT Reply-To: "Wendy Plotkin (312) 413-0331" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Wendy Plotkin (312) 413-0331" Subject: CETH Seminar Summary: Correction ----------------------------Original message---------------------------- Apologies to James Campbell, Chair of the Electronic Information Services at the University of Virginia's Alderman Library*, for omitting him from the list of libraries represented. Jim contributed a great deal to the workshop through his familiarity with available electronic textual resources, and showed a strong interest in the TEI. *Jim is also North Europe Bibliographer. ========================================================================= Date: Mon, 14 Sep 1992 15:05:05 CDT Reply-To: Elli Mylonas Sender: "TEI-L: Text Encoding Initiative public discussion list" Comments: W: Field "From:" duplicated. Last occurrence was retained. From: Elli Mylonas Subject: SGML tools ----------------------------Original message---------------------------- I thought it might be good to make some clarifications to the list of SGML tools. I have used several of these, and can speak highly of all the ones I know. However, it is only fair to point out that some of the tools listed are obsolete and no longer supported. The two Exoterica programs, CheckMark and Translator (fondly known to its users here at Perseus as The Mangler), are no longer supported products. Nothing has taken the place of CheckMark, and Exoterica may still distribute it. The Translator has been replaced by OmniMark. A few words about these programs: CheckMark still works very well under System 7, and zips along nicely on a Quadra. It does crash intermittently, and has a terrible time dealing with anything but a 13-inch monitor.
If you have two monitors, it opens files off the right-hand edge of the screen, and you have to haul them back using the few visible pixels. I should add that it is our validator of choice, and we are well able to live with its problems, considering how nicely it works and that it is a text editor as well. Translator did not run under Multifinder; it ran only under Unifinder, in the Aztec C shell. That meant that, to do batch jobs, one had to write little pseudo-Unix shell scripts, or use the command line. It was also quite pokey. OmniMark is faster and runs under Multifinder and System 7 (although not in the background), but it does not have a significantly improved interface. Actually, it has no interface at all. It runs in the generic LightSpeed C dialog box command line. This means that you click on the program, type in the command with arguments, parameters and flags, and press return. Then the console window comes up. If all is well, the program runs, and LightSpeed C tells you to press return to exit the console window. This quits the program. I was stymied for a while as to how to run batch jobs, and how to avoid typing the huge command lines the programs require. Then the perfect solution appeared: we have been using UserLand Frontier for batch jobs with other programs. We now use Frontier to make up the parameter strings and open the OmniMark programs; it then uses IAC to send messages to QuicKeys, which types the parameter string into the program and presses return when it is done. Rube Goldbergesque, but it works. I can convert a folder full of ancient orators (several MB at a time) in a little more than an hour, and I am reading and writing over the network (using the Quadra). Not bad. As for Author/Editor from SoftQuad, and its Rules Builder sidekick, apparently SoftQuad is working on a new version. I would wait for that.
Author/Editor is a wonderful SGML validator and editor, but it was quirky over the network (specifically, don't use it over TOPS!!), and the new version promises to be far superior. NB: I am not sure the other programs are for the Mac. Also, the XGML Engine is a developer tool. Sorry to go on so long. --Elli Mylonas ========================================================================= Date: Mon, 14 Sep 1992 15:06:35 CDT Reply-To: "Wendy Plotkin (312) 413-0331" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Wendy Plotkin (312) 413-0331" Subject: Word Perfect/SGML ----------------------------Original message---------------------------- [Roy Flannagan of the Electronic Milton Project has sent us the following message to distribute to TEI-L Readers:] Subject: "WP MarkUp," an SGML translation program, now only for Sun SPARCstations and in beta versions {WordPerfect Report} for Fall 1992 announces WordPerfect MarkUp, an aid for inserting markup codes into documents "so that they are compatible with the SGML format." If WP MarkUp doesn't recognize the file format in your document, it will use its conversion program to convert the document to WP 5.1, then retrieve the document into WP MarkUp. "WP MarkUp will then perform the translation to SGML-- and for substantially less cost than other SGML programs currently on the market." WP MarkUp will also (theoretically) catch pre-tagging mistakes and point them out with something called an Interactive Validator, "so that you can correct them." "WP MarkUp is currently in beta testing and will be available on Sun SPARCstations by the end of the year. The DOS version of this product is currently under development."
Roy Flannagan Ohio University ========================================================================= Date: Wed, 16 Sep 1992 17:01:13 CDT Reply-To: "Wendy Plotkin (312) 413-0331" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Wendy Plotkin (312) 413-0331" Subject: Welcome to Subscribers Letter ----------------------------Original message---------------------------- We have recently updated the letter welcoming new TEI-L subscribers, including information about how to obtain documents from the TEI-L and the Markup-L (Germany) fileservers, by anonymous FTP from the United Kingdom, and from Japan. The letter also includes the names and affiliations of the TEI Steering Committee members and Editors. Most of this information is included in TEI EDJ8, TEI J16, and the cover notes accompanying release of TEI P2 fascicles, but we thought you might be interested in knowing about the letter. If you are interested in obtaining a copy of the letter, send a note to Listserv@UICVM or Listserv@uicvm.uic.edu with the message: Get Welcome Doc We have also added more complete descriptions of the available files on the TEI-L Filelist, obtained by sending a note to Listserv@UICVM or Listserv@uicvm.uic.edu with the message: Get TEI-L Filelist Please let me know if you experience any problems in obtaining any files. ========================================================================= Date: Sat, 19 Sep 1992 19:33:06 CDT Reply-To: southerl@acs.ucalgary.ca Sender: "TEI-L: Text Encoding Initiative public discussion list" From: southerl@acs.ucalgary.ca Subject: Mystery Reference ----------------------------Original message---------------------------- I came across the following in the References in Karin Aijmer and Bengt Altenberg, (Eds.), _English Corpus Linguistics_: (1) Du Bois, J. W., Schuetze-Coburn, S., Paolino, D., and Cumming, S. (forthcoming a), _Discourse Transcription_; (2) Du Bois, J.
W., Schuetze-Coburn, S., Paolino, D., and Cumming, S., (forthcoming b), 'Outline of discourse transcription', in Edwards, J. A., and Lampert, M. D., (Eds.), _Transcription and Coding Methods for Language Research_, Lawrence Erlbaum, Hillsdale, NJ. I can find no mention of either (if indeed there are two of them) in Books in Print and wonder if anyone has any info as to the existence or availability of them/it. Thanks. -- ========================================================================= Date: Mon, 21 Sep 1992 11:25:30 CDT Reply-To: Stig Johansson Sender: "TEI-L: Text Encoding Initiative public discussion list" From: Stig Johansson Subject: ref to Du Bois et al ----------------------------Original message---------------------------- The two items mentioned in a recent note to TEI-L ("Mystery reference") do indeed exist, although I have not seen them in published form yet. The first appeared in mimeographed form a couple of years ago. The second is an article to appear in a forthcoming book. For more details on publication, contact John Du Bois, Dept of Linguistics, Univ of California at Santa Barbara and Jane Edwards, Institute of Cognitive Studies, University of California at Berkeley. Stig Johansson Oslo ========================================================================= Date: Tue, 22 Sep 1992 09:47:40 CDT Reply-To: "Patrick Stickler" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Patrick Stickler" ----------------------------Original message---------------------------- Does anyone know of a set of (perhaps publicly declared) entity names for IPA phonetic characters? I would assume that this would be within the scope of interest for TEI, but have not seen anything in either versions 1 or 2 of the guidelines. We're currently using our own in-house set of entity names, but would much prefer to adopt a more standard set (if there exists one). Any information or suggestions will be greatly appreciated. 
////////////////////////////////////////////////////////////////////////// Patrick M. Stickler (psti@wsoy.fi) The comments contained herein WSOY, Information Systems Division do not necessarily reflect the Bulevardi 12, 00121 Helsinki Finland official views of WSOY. ////////////////////////////////////////////////////////////////////////// ========================================================================= Date: Wed, 23 Sep 1992 07:30:19 CDT Reply-To: marchand@ux1.cso.uiuc.edu Sender: "TEI-L: Text Encoding Initiative public discussion list" From: James Marchand Subject: Names of IPA symbols ----------------------------Original message---------------------------- I don't really think there is an official guide, Patrick, but you might like to look at Geoffrey K. Pullum and William A. Ladusaw, Phonetic Symbol Guide (Chicago: University of Chicago Press, 1986; ISBN 0-226-68532-2). Being cheap, I only use paperbacks. Most of the terms they use are the "standard" ones; at least they offer a guide. Cf. also the names given in the WordPerfect character set. For example, the Gothic letter for the hw sound, which is an O with a dot in it, is represented by them as a theta, and they call it "H-V Ligature," whereas it is more often called the "Collitz letter," having been invented by the linguist Collitz. Their representation of it is poor, since it would not work (poor side-bindings) in typesetting. Nemo sine crimine.
========================================================================= Date: Wed, 23 Sep 1992 14:29:58 CDT Reply-To: "Wendy Plotkin (312) 413-0331" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Wendy Plotkin (312) 413-0331" Subject: TEI and IPA Entities ----------------------------Original message---------------------------- In response to Patrick Stickler's question about publicly declared entity names for IPA phonetic characters, I would call your attention to the working paper authored by TEI Character Set chairman, Harry Gaylord, "Character Entities and Public Entity Sets" (TEI TR1 W4). This paper includes a discussion of IPA phonetic entities and includes two related appendices: 2.4 TEI IPA symbols for interchange 3.2 IPA Writing System Declaration Obviously, any proposed TEI schemes are drafts at this point, and we welcome comments on their recommendations. To obtain the entire working paper, including all appendices (with entity sets for Greek, Latin, and other languages), send a note to Listserv@UICVM or Listserv@uicvm.uic.edu with the message: Get TR1W4 Package To obtain just the working paper and the two IPA-related appendices, send a note to Listserv@UICVM or Listserv@uicvm.uic.edu with the three messages: Get TR1W4 TEI1 Get TEIIPA ENTITIES Get TEIPHON WSD The text of the working paper, TR1W4 TEI1, is only available with TEI mark-up in its electronic form. Paper copies without mark-up are available upon request from the TEI. ========================================================================= Date: Fri, 25 Sep 1992 10:02:08 CDT Reply-To: anderson@sapir.cog.jhu.edu Sender: "TEI-L: Text Encoding Initiative public discussion list" From: anderson@sapir.cog.jhu.edu Subject: Re: Names of IPA symbols In-Reply-To: Your message of "Wed, 23 Sep 92 07:30:19 CDT."
<9209231229.AA18517@sapir.cog.jhu.edu> ----------------------------Original message---------------------------- For the past several weeks, there has been a discussion on the UseNet newsgroup sci.lang of a proposal for ASCII encoding of the IPA (actually IPA with a variety of more or less ad hoc "enhancements", as usual in such proposals, biased heavily toward the languages with which the discussants are most familiar). For further information, subscribe to sci.lang, or else contact Evan Kirshenbaum HP Laboratories 3500 Deer Creek Road, Building 26U Palo Alto, CA 94304 kirshenbaum@hpl.hp.com (415)857-7572 --Steve Anderson ========================================================================= Date: Fri, 25 Sep 1992 14:02:35 CDT Reply-To: Glenn Adams Sender: "TEI-L: Text Encoding Initiative public discussion list" From: Glenn Adams Subject: Names of IPA symbols In-Reply-To: anderson@sapir.cog.jhu.edu's message of Fri, 25 Sep 1992 10:02:08 CDT <9209251544.AA12467@sapir.metis.com> ----------------------------Original message---------------------------- Date: Fri, 25 Sep 1992 10:02:08 CDT From: anderson@sapir.cog.jhu.edu For the past several weeks, there has been a discussion on the UseNet newsgroup sci.lang of a proposal for ASCII encoding of the IPA (actually IPA with a variety of more or less ad hoc "enhancements", as usual in such proposals, biased heavily toward the languages with which the discussants are most familiar). For further information, subscribe to sci.lang, or else contact Why would anyone want to create a new character set for IPA? ISO 10646 already provides a full encoding of IPA. Glenn Adams ========================================================================= Date: Wed, 30 Sep 1992 13:08:20 CDT Reply-To: "Wendy Plotkin (312) 413-0331" Sender: "TEI-L: Text Encoding Initiative public discussion list" From: "Wendy Plotkin (312) 413-0331" Subject: New U.S.
postal address ----------------------------Original message---------------------------- The university has informed us that our postal address will change, is changing, or has already been changed. (No effective date was named.) The new address is: Computer Center (M/C 135) 1940 W. Taylor St. Room 124 Chicago, IL 60612-7352 The change applies, obviously, to mail sent to C. M. Sperberg-McQueen, David Stanfield, Leon Kubacki and to me. Thanks for changing your records.