The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list, meetings and a conference series, and maintains a technical standard, a wiki and a toolset.
The TEI Guidelines
The TEI Guidelines, which collectively define an XML format, are the defining output of the community of practice. The format differs from other well-known open formats for text (such as HTML and OpenDocument) in that it is primarily semantic rather than presentational: the semantics and interpretation of every tag and attribute are specified. The Guidelines define some 500 different textual components and concepts (word,[1] sentence,[2] character,[3] glyph,[4] person,[5] etc.); each is grounded in one or more academic disciplines, and examples are given.
Technical Details
The standard is split into two parts: a discursive textual description with extended examples and discussion, and a set of tag-by-tag definitions. Schemas in most of the modern formats (DTD, RELAX NG and W3C Schema) are generated automatically from the tag-by-tag definitions. A number of tools support the production of the guidelines and the application of the guidelines to specific projects.
A number of special tags are used to circumvent restrictions imposed by the underlying Unicode: glyph allows the representation of characters that don't qualify for Unicode inclusion,[6] and choice overcomes the required strict linearity.[7]
Many users of the format don't use the complete range of tags but produce a customisation with a subset of the tags. The format supports this by grouping tags into sets, each corresponding to a chapter in the TEI guidelines and a group of related academic disciplines. Some users of the format go further and write a Schematron schema embodying their local house style to make publishing the content easier.
Examples
Prose tag
TODO pick an example from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/examples-p.html
TODO write text to compare to HTML
Verse
TODO pick an example from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/examples-l.html
Choice tag
The choice tag is used to represent sections of text for which there is more than one possible option. In the following example, based on one in the standard, choice is used twice: once to indicate an original and a corrected year, and once to indicate an original and a regularised spelling.
<p xml:id="p23">Lastly, That, upon his solemn oath to observe all the above articles, the said man-mountain shall have a daily allowance of meat and drink sufficient for the support of <choice> <sic>1724</sic> <corr>1728</corr> </choice> of our subjects, with free access to our royal person, and other marks of our <choice> <orig>favour</orig> <reg>favor</reg> </choice>.</p>
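As a rough illustration of how software can act on this markup, the following Python sketch extracts two readings from a shortened form of the example above: one as written (sic and orig) and one as edited (corr and reg). The function name reading and its prefer parameter are hypothetical, and the sketch assumes the fragment declares no TEI namespace, as in the excerpt; a real TEI file would normally need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Shortened form of the example above; no TEI namespace is declared, as in the excerpt.
SAMPLE = (
    '<p xml:id="p23">sufficient for the support of '
    "<choice> <sic>1724</sic> <corr>1728</corr> </choice> of our subjects, "
    "and other marks of our "
    "<choice> <orig>favour</orig> <reg>favor</reg> </choice>.</p>"
)


def reading(elem, prefer):
    """Return elem's text, keeping one alternative from each choice element."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            # Use the first alternative whose tag is in `prefer`;
            # whitespace between the alternatives is dropped.
            for alt in child:
                if alt.tag in prefer:
                    parts.append(reading(alt, prefer))
                    break
        else:
            parts.append(reading(child, prefer))
        parts.append(child.tail or "")
    return "".join(parts)


p = ET.fromstring(SAMPLE)
as_written = reading(p, {"sic", "orig"})   # the text as it stands in the source
as_edited = reading(p, {"corr", "reg"})    # the text with the editor's changes
print(as_written)
print(as_edited)
```

Because both alternatives are kept in the encoding, the choice between a diplomatic and a normalised text is deferred to processing time rather than fixed when the text is transcribed.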
ODD
The guidelines are currently maintained as ODD (One Document Does it all) files, from which documentation in PDF or HTML and schemas in Document Type Definition and XML schema format can be generated. ODD is modular, allowing groups of features to be included or excluded together; this is how the widely used TEI Lite customisation of TEI is built.
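To make the modular selection more concrete, the Python sketch below embeds a small ODD-style fragment and lists the modules it selects, together with any elements excluded from them. The fragment is illustrative only: the identifier tei_minimal_example and the particular exclusions are invented, the TEI namespace is omitted for brevity, and the element and attribute names (schemaSpec, moduleRef, key, except) reflect one reading of the ODD vocabulary that should be checked against the Guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical customisation: four modules, with two elements dropped from "core".
ODD = """<schemaSpec ident="tei_minimal_example" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core" except="gap unclear"/>
  <moduleRef key="textstructure"/>
</schemaSpec>"""


def selected_modules(odd_xml):
    """Map each referenced module to the list of element names it excludes."""
    spec = ET.fromstring(odd_xml)
    return {
        ref.get("key"): ref.get("except", "").split()
        for ref in spec.iter("moduleRef")
    }


modules = selected_modules(ODD)
for name, excluded in modules.items():
    note = f" (without: {', '.join(excluded)})" if excluded else ""
    print(f"{name}{note}")
```

This only inspects the customisation; generating an actual schema or documentation from an ODD file is the job of the TEI toolset mentioned earlier.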
TEI customisations
TEI customisations are specialisations of the TEI XML specification for use in particular fields or by specific communities.
- EpiDoc (Epigraphic Documents)
- Music Encoding Initiative
- Charters Encoding Initiative
- Medieval Nordic Text Archive (Menota)
Projects
The format is used by many projects worldwide. Practically all projects are associated with one or more universities. Some well-known projects that encode texts using TEI include:
Project | URL | Strengths
---|---|---
British National Corpus | http://www.natcorp.ox.ac.uk | 100 million word snapshot of current English |
Oxford Text Archive | http://ota.ahds.ac.uk/ | Linguistic data |
Perseus Project | http://www.perseus.tufts.edu/ | Greek and Latin texts |
Women Writers Project | http://www.wwp.brown.edu/ | Early modern women writers (Margaret Cavendish, Eliza Haywood, etc.) |
New Zealand Electronic Text Centre | http://www.nzetc.org/ | New Zealand and Pacific Islands texts |
The SWORD Project | http://www.crosswire.org/sword/ | Bible software, dictionaries, Christian literature |
FreeDict | http://freedict.org | Bilingual dictionaries |
Text Creation Partnership | http://www.lib.umich.edu/tcp/ | Early English and American books |
Duke Databank of Documentary Papyri | http://papyri.info/ | Ancient Greek papyrus texts from Egypt |
Henrik Ibsen's Writings | http://www.ibsen.uio.no/ | Complete works and writings by playwright Henrik Ibsen |
History
TODO this entire section needs a clean up
Sponsors and organisation
The scholarly societies that originally sponsored the TEI are the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing. These three groups first organized the TEI in 1987 as a research effort funded by grants from several agencies.[8] The Guidelines for Electronic Text Encoding and Interchange[9] were released in 1994, co-edited by Lou Burnard (at Oxford University) and Michael Sperberg-McQueen (then at the University of Illinois at Chicago, later at W3C and now an independent consultant).
Today, the TEI Consortium is a member-funded non-profit corporation hosted by:
- The Research Technologies Service at the University of Oxford,
- the Scholarly Technology Group at Brown University,
- a francophone group comprising ATILF, INIST, and LORIA, co-ordinated at Nancy
- the Institute for Advanced Technology in the Humanities at the University of Virginia.
The TEI, which started in the 1980s, is a consortium of institutions and research projects that maintains and develops a standard for the representation of texts in digital form. Originally sponsored by three scholarly societies based on the manifesto issued after the Vassar Conference,[10][11] the TEI is now an independent membership consortium, hosted by academic institutions in the US and in Europe. Its major deliverable is a set of Guidelines, which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics. Since 1994, these guidelines have been widely used as a standard for text materials for online research and teaching, and TEI is now the de facto standard for the encoding of electronic texts in the humanities academic community.[12]
The guidelines define some 500 different textual components and concepts (word,[13] sentence,[14] character,[15] glyph,[16] person,[17] etc.), which can be expressed using a markup language and defined by a DTD or XML schema. Early versions of the Guidelines used SGML as a means of expression; more recently XML has been adopted. The basic concepts have been stable for over a decade, with TEI P3 (public release version 3) published in 1994 and updated in 1999. P4 (2002) is a slight update to accommodate XML; TEI P5 was released in November 2007. P5 includes integration with the xml:lang and xml:id attributes from the W3C[18] (these had previously been attributes in the TEI namespace), regularisation of local pointing attributes to use the hash (as used in HTML) and unification of the ptr and xptr tags. Together these changes make P5 more regular and bring it closer to current XML practice as promoted by the W3C and as used by other XML variants.
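The combination of xml:id and hash-style pointers can be illustrated with a short Python sketch that resolves a local pointer to the element it names. The sample document, the function name resolve and the use of a target attribute on ptr are assumptions made for illustration, and the TEI namespace is again omitted; only same-document pointers are handled.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: one identified paragraph and one pointer to it.
DOC = """<text>
  <p xml:id="p23">Lastly, That, upon his solemn oath to observe all the above articles.</p>
  <p>Compare the article cited here <ptr target="#p23"/>.</p>
</text>"""


def resolve(root, target):
    """Return the element whose xml:id matches a '#id' pointer, or None."""
    if not target.startswith("#"):
        raise ValueError("only same-document pointers are handled in this sketch")
    wanted = target[1:]
    for elem in root.iter():
        if elem.get(XML_ID) == wanted:
            return elem
    return None


root = ET.fromstring(DOC)
ptr = root.find(".//ptr")
pointed_at = resolve(root, ptr.get("target"))
print(pointed_at.tag, pointed_at.get(XML_ID))
```

The pointer has the same form as a fragment link in HTML, which is the regularity the P5 changes were aiming for.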
Initially, supporting the character sets required by European and Asian languages was a major issue. This has now been resolved by the use of Unicode, which XML parsers are required to support.[19]
Work on TEI P5 is ongoing. Although P5 breaks backward compatibility in a number of ways, it significantly updates the inner workings, including a reorganization of the underlying structures of elements into classes, which allows greater and easier customization. Maintenance and development continue under the sponsorship of the TEI Consortium. The TEI component for marking up feature structures (a model of data sometimes used in linguistics) has been adopted as the basis of the ongoing development of an ISO standard for feature structures.[citation needed]
As of 2011, there is an active proposal to add genetic editing support.[20]
References
- ^ "Element w (word) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-w.html.
- ^ "Element s (s-unit) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-s.html.
- ^ "Element c (character) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-c.html.
- ^ "Element g (character or glyph) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-g.html.
- ^ "Element person (person) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-person.html.
- ^ "Element w (word) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-w.html.
- ^ "Element w (word) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-w.html.
- ^ "Historical background", section iv.2 of TEI P5: Guidelines for Electronic Text Encoding and Interchange.
- ^ "TEI Guidelines". http://www.tei-c.org/Guidelines/. Retrieved 2010-06-18.
- ^ "Closing Statement of the Vassar Planning Conference". http://www.tei-c.org/Vault/SC/teipcp1.txt.
- ^ "Design Principles for Text Encoding Guidelines". http://www.tei-c.org/Vault/ED/edp01.htm.
- ^ See e.g. NEH 2007 Scholarly Editions Grants Guidelines ([1]): "Applicants are encouraged to use open standards and markup conforming to the Text Encoding Initiative (TEI), and to employ current best practices in creation of electronic editions."; JISC catalogue ([2]): "The TEI is the norm for deep text encoding in digital libraries and collections worldwide."; NEH Institutes for Advanced Topics in the Digital Humanities suggested topics ([3]): "Text Encoding Initiative, electronic editing, and publishing".
- ^ "Element w (word) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-w.html.
- ^ "Element s (s-unit) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-s.html.
- ^ "Element c (character) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-c.html.
- ^ "Element g (character or glyph) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-g.html.
- ^ "Element person (person) - TEI P5". http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-person.html.
- ^ http://www.w3.org/TR/REC-xml/
- ^ "2", XML Basics, http://www.xmlnews.org/docs/xml-basics.html, retrieved 2011-07-09
- ^ http://www.tei-c.org/Activities/Council/Working/tcw19.html
External links
- TEI Consortium Web site (hosted at University of Virginia) with a list of TEI projects, a form for adding your project and wiki
- TEI @ Oxford (hosted at Oxford University) with development and backup versions of much of the core content.
- TEI development (hosted at SourceForge.net) with bugtracker, version control, etc.
- Larger list of TEI Projects