Identification of the core components of textual comparison at an abstract level
Prerequisite: an electronic text version of each witness
Would you care for a sherbet lemon?
--> Would | you | care | for | a | sherbet | lemon | ?
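The split shown above can be reproduced with a few lines of Python. This is only a minimal sketch of one possible tokenizer, not the tool's own implementation: the function name `tokenize` and the regular expression are assumptions chosen for illustration. It treats every run of word characters as one token and every punctuation mark as a token of its own.

```python
import re

def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens.

    Hypothetical helper for illustration: whitespace is discarded,
    each punctuation mark becomes its own token.
    """
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Would you care for a sherbet lemon?")
print(" | ".join(tokens))
```

A different rule (for example keeping an apostrophe inside a word) only requires changing the regular expression; the rest of the collation workflow does not depend on how the tokens were produced.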
Alignment of three witnesses
W1 | Introduction à | | la collation automatique |
---|---|---|---|
W2 | Cours | sur | la collation automatique |
W3 | En savoir plus | sur | la collation automatique |
Peter | ' | s | cat | . |
By default, CollateX removes trailing white space from the end of each token.
JSON file as input: each token may carry a normalized form (property `n`) in addition to its original text (property `t`)
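As an illustration of such pre-tokenized input, the sketch below builds a JSON document in which every token has an original form `t` and a normalized form `n`, assuming the CollateX JSON layout with a top-level `witnesses` list whose entries have an `id` and a `tokens` list. The helpers `normalize` and `witness_tokens`, and the normalization rule itself (strip white space, lowercase), are hypothetical and stand in for whatever rule a project actually needs.

```python
import json
import re

def normalize(token_text):
    """Hypothetical normalization rule: strip white space and lowercase."""
    return token_text.strip().lower()

def witness_tokens(text):
    """Split on white space, keeping trailing white space in the original form 't'."""
    return [{"t": raw, "n": normalize(raw)} for raw in re.findall(r"\S+\s*", text)]

# Assumed input layout: a list of witnesses, each with a siglum and its tokens.
collation_input = {
    "witnesses": [
        {"id": "A", "tokens": witness_tokens("And Ron pulled out a fat grey rat")},
        {"id": "B", "tokens": witness_tokens("And Ronald pulled out a gray rat")},
    ]
}

print(json.dumps(collation_input, indent=2))
```

Keeping `t` and `n` side by side means the alignment can compare normalized forms while the output still displays the original readings.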
Exact vs. near (fuzzy) matching
A | And | Ron | pulled | out | a | fat | grey | rat |
B | And | Ronald | pulled | out | a | gray | - | rat |
A | And | Ron | pulled | out | a | fat | grey | rat |
B | And | Ronald | pulled | out | a | - | gray | rat |
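The difference between the two tables can be made concrete with a similarity measure. The sketch below uses Levenshtein edit distance, which is one common choice for fuzzy matching and is an assumption here, not a statement about the measure any particular tool uses. The function names `levenshtein` and `nearest` are hypothetical. With exact matching, "gray" matches neither "fat" nor "grey", so its column is arbitrary; with near matching, it is placed next to the closest candidate.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if characters are equal
            ))
        prev = curr
    return prev[-1]

def nearest(token, candidates):
    """Return the candidate with the smallest edit distance to token."""
    return min(candidates, key=lambda c: levenshtein(token, c))

# "gray" from witness B could sit under "fat" or under "grey" in witness A.
print(nearest("gray", ["fat", "grey"]))
```

Here "gray" is one substitution away from "grey" but three edits away from "fat", which is why the second table aligns "gray" with "grey" and leaves the gap under "fat".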
W1 | Introduction à | | la collation automatique |
---|---|---|---|
W2 | Cours | sur | la collation automatique |
W3 | En savoir plus | sur | la collation automatique |
<cx:apparatus xmlns:cx="http://interedition.eu/collatex/ns/1.0"
xmlns="http://www.tei-c.org/ns/1.0">
<app>
<rdg wit="W1">Introduction à</rdg>
<rdg wit="W2">Cours</rdg>
<rdg wit="W3">En savoir plus</rdg></app>
<app><rdg wit="W1"/><rdg wit="W2 W3">sur</rdg></app>
la collation automatique</cx:apparatus>
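To check that the apparatus above carries the same information as the alignment table, it can be read back with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the function name `readings_by_witness` is hypothetical, and the XML string is the sample above written on fewer lines. Each `app` element becomes a dictionary from witness siglum to reading, with an empty string for an empty `rdg`.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # default namespace of <app> and <rdg>

SAMPLE = (
    '<cx:apparatus xmlns:cx="http://interedition.eu/collatex/ns/1.0" '
    'xmlns="http://www.tei-c.org/ns/1.0">'
    '<app><rdg wit="W1">Introduction à</rdg><rdg wit="W2">Cours</rdg>'
    '<rdg wit="W3">En savoir plus</rdg></app>'
    '<app><rdg wit="W1"/><rdg wit="W2 W3">sur</rdg></app>'
    'la collation automatique</cx:apparatus>'
)

def readings_by_witness(xml_text):
    """Return one {siglum: reading} dict per <app> element, in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for app in root.iter(TEI + "app"):
        readings = {}
        for rdg in app.findall(TEI + "rdg"):
            # The wit attribute may list several sigla separated by spaces.
            for siglum in rdg.get("wit", "").split():
                readings[siglum] = rdg.text or ""
        result.append(readings)
    return result

for readings in readings_by_witness(SAMPLE):
    print(readings)
```

Text outside any `app` element, such as "la collation automatique", is shared by all witnesses and is not returned by this helper.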