Releases: jjmccollum/teiphy
Support through Python 3.12, 62-state support for NEXUS outputs, and support for PHYLIP distance/similarity matrices
This release incorporates contributions from @catsmith to ensure compatibility with Python versions 3.9 through 3.12. (As a result of these changes, Python 3.8 is no longer supported.) To accommodate software like PAUP* and @edmondac's fork of MrBayes (https://github.com/edmondac/MrBayes), the symbol set for NEXUS outputs has been extended to 62 symbols (0-9, a-z, and A-Z). This release also adds support for the use of the --table distance and --table similarity options (along with the --proportion and --show-ext flags) with outputs in PHYLIP (.phy and .ph) format to produce PHYLIP-formatted distance or similarity matrices.
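As a rough illustration of the extended symbol set, the sketch below maps a zero-based reading index to one of the 62 symbols. The function name reading_symbol and the ordering (digits, then lowercase, then uppercase) are assumptions made for this example; this is not teiphy's internal code.

```python
import string

# The 62-symbol alphabet named above: 0-9, a-z, then A-Z (ordering assumed).
SYMBOLS = string.digits + string.ascii_lowercase + string.ascii_uppercase


def reading_symbol(index: int) -> str:
    """Return the single-character state symbol for a zero-based reading index."""
    if not 0 <= index < len(SYMBOLS):
        raise ValueError(
            f"reading index {index} does not fit in a {len(SYMBOLS)}-symbol alphabet"
        )
    return SYMBOLS[index]
```

Under this assumed ordering, a variation unit can carry at most 62 distinct readings before the single-character symbols run out.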
Support for similarity matrices and common variation unit counts in distance/similarity matrices
This release introduces the --table similarity option, which produces a tabular output with counts of pairwise agreements between witnesses (or, if the --proportion flag is specified, proportions of agreements among variation units where both witnesses have non-ambiguous readings), as well as the --show-ext flag, which adds the number of variation units where both witnesses have non-ambiguous readings to each cell's value (e.g., 47/50 or 0.94/50). The --show-ext flag can also be used with distance matrices specified with --table distance.
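To make the cell values concrete, here is a minimal sketch of how one similarity cell could be computed for a pair of witnesses. The function similarity_cell, the use of None for a missing or ambiguous reading, the two-decimal formatting, and the value shown when no units are shared are all assumptions of this example, not teiphy's actual implementation.

```python
from typing import Optional, Sequence


def similarity_cell(
    a: Sequence[Optional[str]],
    b: Sequence[Optional[str]],
    proportion: bool = False,
    show_ext: bool = False,
) -> str:
    """Format one similarity-matrix cell for two witnesses.

    Each sequence holds one reading per variation unit; None marks a unit
    where the witness is missing or ambiguous (a simplification).
    """
    # Units where both witnesses have a non-ambiguous reading.
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    agreements = sum(1 for x, y in shared if x == y)
    if proportion:
        # Arbitrary choice for this sketch: report 0.00 when nothing is shared.
        value = f"{agreements / len(shared):.2f}" if shared else "0.00"
    else:
        value = str(agreements)
    # With show_ext, append the number of mutually extant units.
    return f"{value}/{len(shared)}" if show_ext else value
```

For two witnesses that agree at 47 of the 50 units where both are extant, this yields 47/50 with show_ext alone and 0.94/50 with proportion as well, matching the examples above.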
Support for exclusion of fragmentary witnesses
With this release, you can exclude fragmentary witnesses from your collation by specifying the --fragmentary-threshold command-line option, followed by a number between 0 and 1 indicating the proportion of variation units at which a witness must be extant (i.e., have a non-missing reading according to the reading type(s) specified with the -m option) to be included in the output. Thus, --fragmentary-threshold 0.7 will exclude all witnesses with more than 30 percent of their readings missing, while --fragmentary-threshold 1.0 will exclude witnesses with any missing readings. (Note that this check is performed after correctors' hands have been filled in, if you have supplied the --fill-correctors option.)
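The threshold rule can be sketched as follows. This is only an illustration: filter_fragmentary and the dictionary-of-lists data layout are hypothetical, and None stands in for a missing reading.

```python
from typing import Dict, List, Optional


def filter_fragmentary(
    readings: Dict[str, List[Optional[str]]], threshold: float
) -> Dict[str, List[Optional[str]]]:
    """Keep witnesses that are extant in at least `threshold` of the variation units."""
    kept = {}
    for witness, units in readings.items():
        # Count units where this witness has a non-missing reading.
        extant = sum(1 for reading in units if reading is not None)
        if units and extant / len(units) >= threshold:
            kept[witness] = units
    return kept
```

With a threshold of 0.7, a witness extant in 7 of 10 units is kept and one extant in 6 of 10 is dropped; with 1.0, only witnesses with no missing readings survive.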
Extended number of states for BEAST 2 outputs
In principle, any number of states should be permissible in BEAST 2.7 XML inputs, since the states are specified as sequences of probabilities rather than with one-character symbols. But even with sequences encoded in this way, BEAST 2 still requires code maps (for some reason), so we are limited by the space of allowable single-character symbols. Previously, teiphy restricted the set of BEAST state symbols to 0-9 and a-z. This release adds A-Z to the symbol set.
Support for variation unit identification through combination of "n", "from", and "to" attributes
Previously, teiphy assumed that each variation unit (i.e., an app element) would be uniquely identified by its xml:id attribute or its n attribute alone. While this assumption holds in the case of xml:id attributes (which, by definition, must be unique), it does not hold for n attributes. In practice, TEI XML collations assign app elements in the same larger passage of text (e.g., a verse) the same n value as that larger passage and then assign the app elements additional from and to attributes specifying word indices, so as to specify the unique location of the variation unit within that larger passage. To this end, the VariationUnit class of teiphy now checks for from and to attributes in addition to an n attribute and combines them to form a unique ID for the variation unit.
Support for supplying/updating witness date ranges through external CSV file
This release provides a new feature for the convenience of users who have derived their collation data and witness date ranges from different sources: a CSV file containing witness IDs and (potentially empty) minimum and maximum dates can be specified with the --dates-file command-line option. For witnesses in the CSV file, the specified date range will overwrite any existing date range in the TEI XML collation.
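The overwrite behavior can be illustrated with a short sketch. The CSV layout assumed here (no header row; columns for witness ID, minimum date, maximum date; integer years) and the helper apply_dates are guesses made for the example, as is the choice to ignore IDs that do not appear in the collation.

```python
import csv
import io
from typing import Dict, Optional, Tuple

DateRange = Tuple[Optional[int], Optional[int]]


def apply_dates(csv_text: str, date_ranges: Dict[str, DateRange]) -> Dict[str, DateRange]:
    """Overwrite date ranges for witnesses listed in the CSV; leave the rest alone."""
    updated = dict(date_ranges)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        wit_id, min_date, max_date = (cell.strip() for cell in row[:3])
        if wit_id not in updated:
            continue  # assumption: IDs absent from the collation are ignored
        # Empty cells become open-ended bounds.
        updated[wit_id] = (
            int(min_date) if min_date else None,
            int(max_date) if max_date else None,
        )
    return updated
```

A row such as "01,,400" would leave the minimum date open and set only the maximum, which is how the sketch treats the "potentially empty" dates mentioned above.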
Update to mirror current version of STEMMA; dependency updates
This release increases the number of states for STEMMA outputs from 22 to 62 in accordance with the latest updates to STEMMA. It also updates several dependencies to address vulnerabilities noted by Dependabot.
Minor fix for STEMMA outputs and updates to dependencies
This release corrects the previous release's fix for STEMMA outputs (so that they support 22 states rather than 24) and updates several dependencies to address vulnerabilities noted by Dependabot.
Fixes for BEAST loggers and STEMMA state encodings
This minor release adds some missing attributes to state/ancestral logger elements in BEAST outputs to ensure that root frequencies (corresponding to intrinsic probability judgments) are incorporated into probability calculations for state sampling. It also fixes a previous bug in mapping variant reading indices to state codes in STEMMA outputs, so that reading indices (up to a maximum of 24 per variation unit) are now mapped to single-character state codes.
Support for time-dependent transcriptional relations in BEAST 2.7 outputs
The main change introduced in this release is support for tagging of potential transcriptional explanations with notBefore and notAfter attributes. If these attributes are present in a variation unit's transcriptional relations list, teiphy will now map the transcriptional relations to an EpochSubstitutionModel with a different substitution model for different slices of time. This feature is only supported for BEAST 2.7 XML outputs. This means that BEAST users can now model time-dependent transcriptional changes (like assimilation to later popular texts, paleographic confusions possible only for earlier or later scripts, etc.) more accurately.
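To show what "different slices of time" can mean in practice, the sketch below splits the timeline at every notBefore/notAfter bound and lists which relations are active in each slice. It is a conceptual illustration only: epoch_slices and the tuple layout are hypothetical, and nothing here reflects how teiphy actually builds the EpochSubstitutionModel element.

```python
from typing import List, Optional, Tuple

# (relation name, notBefore, notAfter); None means the bound is open.
Relation = Tuple[str, Optional[int], Optional[int]]
Slice = Tuple[Optional[int], Optional[int], List[str]]


def epoch_slices(relations: List[Relation]) -> List[Slice]:
    """Split time at every notBefore/notAfter bound and list the relations active in each slice."""
    bounds = sorted({d for _, nb, na in relations for d in (nb, na) if d is not None})
    edges: List[Optional[int]] = [None] + bounds + [None]  # None = open-ended
    slices = []
    for start, end in zip(edges, edges[1:]):
        active = []
        for name, nb, na in relations:
            starts_in_time = nb is None or end is None or nb < end
            still_valid = na is None or start is None or na > start
            if starts_in_time and still_valid:
                active.append(name)
        slices.append((start, end, active))
    return slices
```

For instance, a relation with notBefore 400, another with notAfter 900, and a third with no bounds would produce three slices (before 400, 400 to 900, after 900), each with its own set of active relations and hence, conceptually, its own substitution model.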
A related change is the addition of more comprehensive rules for updating witness date ranges based on the date range of the work's composition (and vice versa). This change affects age/date calibrations for NEXUS and BEAST 2.7 XML formats (including the MrBayes NEXUS input format).
This release also fixes an error that prevented the --verbose flag from working correctly.