tei-publisher-lib 4: Refactoring TEI output mode

Version 4.0.0 refactors the TEI output mode, making it more widely usable for transformation scenarios that produce TEI.

tei-publisher-lib contains the code libraries used for ODD processing and the different supported output modes. It implements the TEI Processing Model as defined by the TEI guidelines.

We use semantic versioning for all TEI Publisher projects: a change in the first digit of the version number indicates a breaking change. You should be aware that such releases are not fully backwards compatible and may require adjustments to existing apps.

Warning: custom applications generated by TEI Publisher may update `tei-publisher-lib` automatically when (re-)installed into eXist. See how to prevent this.

The TEI Processing Model is useful not only for transforming TEI into some other XML output format; it can also produce TEI from something else. For example, the MS Word import in TEI Publisher converts docx files to TEI with the help of an ODD and the tei output mode.

Recently we encountered another case that could be solved elegantly with an ODD: the task was to clean up TEI produced by a form-based editor by eliminating superfluous attributes and element hierarchies. Both the input and the output of the transformation were therefore TEI. Unfortunately, the implementation of the tei output mode included some solutions specific to the processing of docx documents, and these were applied even though they were not needed. We therefore had to refactor the library, cleanly separating the docx-specific processing from the general tei output mode.

We also added a new behaviour, copy, to the mode. It copies the current element's start and end tags together with its attributes, then recursively processes its contents. This is useful because the default behaviour of the TEI PM is to skip any tags it does not know how to handle.
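As a rough illustration of how the copy behaviour might be used in an ODD targeting the tei output mode, the following sketch keeps paragraphs and highlighted text in the output while still processing their children. The choice of elements (p, hi) and the mode="change" setting are assumptions made for the example and are not taken from the release itself.

```xml
<!-- Sketch only: the element names and mode="change" are illustrative assumptions -->
<elementSpec ident="p" mode="change">
    <!-- Copy the <p> tag and its attributes, then process its contents -->
    <model behaviour="copy"/>
</elementSpec>
<elementSpec ident="hi" mode="change">
    <!-- Same for <hi>: without a rule like this the tag itself would be skipped -->
    <model behaviour="copy"/>
</elementSpec>
```

Elements without such a rule would still fall back to the default handling described above, so only the tags you want to preserve need a copy model.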

Other changes

Another new feature was already released in 3.1.0 but not documented: you can now access variables and functions defined in the global configuration XQuery module (config.xqm) within your ODD. Configuration variables and functions are exposed to XPath expressions in the ODD under the prefix global. So to access the current data root collection, you can simply write $global:data-root, which refers to the corresponding variable defined in config.xqm.
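A minimal sketch of what this could look like inside an ODD: the rule below simply outputs the value of $global:data-root as inline text. The element (note), the inline behaviour, and the content parameter are assumptions chosen for illustration; only the $global:data-root reference comes from the description above.

```xml
<!-- Sketch only: element, behaviour and parameter name are illustrative assumptions -->
<elementSpec ident="note" mode="change">
    <model behaviour="inline">
        <!-- $global:data-root resolves to the data-root variable declared in config.xqm -->
        <param name="content" value="$global:data-root"/>
    </model>
</elementSpec>
```

In practice the variable would more likely appear inside a larger XPath expression, for example to build a path to a resource stored below the data root.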

Upgrading

As indicated by the incremented major version number (4.0.0), this release is not backwards compatible. In existing applications, importing Word docx documents will stop working until the configuration is adjusted as described below. Other functionality should not be affected.

To fix this issue after upgrading a custom application you built, you need to add an extra import to resources/odd/configuration.xml:

    <modules>
        ...
        <output mode="tei" odd="docx">
            <module uri="http://www.tei-c.org/tei-simple/xquery/functions/docx" prefix="ext-docx"/>
        </output>
    </modules>

The upcoming version 8.1 of TEI Publisher will already include the required changes.