Text Recognition Model for Essoins (England) in Latin

Developed as part of the Flow-Project by Jonas Widmer, Christopher Kuhlmann, and Melvin Wilde.

Base model: magistermilitum/tridis_HTR v1
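A minimal usage sketch follows, assuming that the model can be loaded with the TrOCR classes of the Hugging Face `transformers` library (as the repository name suggests) and that the repository ships a matching processor configuration. The helper name `transcribe_line` and the image path are hypothetical, and the input is assumed to be an image of a single, already segmented text line.

```python
from pathlib import Path

# Repository name as shown on the model page.
MODEL_ID = "dh-unibe/trocr-essoins-middle-latin"


def transcribe_line(image_path, model_id=MODEL_ID):
    """Transcribe one cropped text-line image (hypothetical helper)."""
    # Heavy dependencies are imported lazily so that importing this
    # module does not require them to be installed.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Assumption: the repository contains both processor and model files.
    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # TrOCR expects RGB input; the processor resizes and normalises it.
    image = Image.open(Path(image_path)).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Generate token ids and decode them into a text string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A call such as `transcribe_line("line_0001.png")` (hypothetical file name) would return the predicted transcription of that line. For many lines, loading the processor and model once outside the function would be more efficient.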

Train Lines: 15356

Eval Lines: 394

Test Lines: 2995

Epochs: 14.1667 / 20

Eval CER: 0.0544

Test CER: 0.0622

Test results with CERberus

Metric	Value
Character Error Rate (%)	6.22
Number of Correct Characters	186998
Number of Substitutions	5425
Number of Insertions	2933
Number of Deletions	3849
Total Character Count	196272
Original Lines Count	2288
Discarded Lines Count	0
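The reported character error rate can be reproduced from the counts in the table above. The sketch below assumes the common definition CER = (substitutions + insertions + deletions) / total reference characters; the variable names are illustrative and are not taken from CERberus.

```python
# Counts copied from the CERberus results table.
correct = 186998
substitutions = 5425
insertions = 2933
deletions = 3849
total_chars = 196272

# Every reference character is either correct, substituted, or deleted.
reference_chars = correct + substitutions + deletions

# Assumed definition: all edit operations divided by the reference length.
cer = (substitutions + insertions + deletions) / total_chars
cer_percent = round(cer * 100, 2)

print(f"Reference characters: {reference_chars}")
print(f"CER: {cer:.4f} ({cer_percent} %)")
```

Under this definition the counts yield the Test CER of 0.0622 stated above, and the correct, substituted, and deleted characters add up to the total character count.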

Block	Count	Correct	Incorrect	Correct Ratio (%)	Incorrect Ratio (%)
Digits	0	0	0	nan	nan
Lowercase Latin alphabet	154731	147241	7490	95.16	4.84
MUFI Glyphs	0	0	0	nan	nan
Punctuation	9	4	5	44.44	55.56
Uppercase Latin alphabet	6883	6450	433	93.71	6.29
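The ratios in the block table follow directly from the counts. The sketch below recomputes them, assuming that each ratio is the share of correct or incorrect characters within its block, in percent, and that blocks without any characters are reported as nan. The function name `block_ratios` is hypothetical.

```python
import math


def block_ratios(count, correct):
    """Return (correct %, incorrect %) for one character block."""
    if count == 0:
        # No characters of this block occur, so the ratios are undefined.
        return math.nan, math.nan
    correct_pct = 100 * correct / count
    return correct_pct, 100 - correct_pct


# (count, correct) pairs copied from the block table.
blocks = {
    "Digits": (0, 0),
    "Lowercase Latin alphabet": (154731, 147241),
    "MUFI Glyphs": (0, 0),
    "Punctuation": (9, 4),
    "Uppercase Latin alphabet": (6883, 6450),
}

for name, (count, correct) in blocks.items():
    ok, bad = block_ratios(count, correct)
    print(f"{name}: {ok:.2f} % correct, {bad:.2f} % incorrect")
```

Note that the punctuation figures rest on only 9 characters, so they are much less stable than those for the alphabetic blocks.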

The training material consists of handwritten Latin texts (with some Middle English and Anglo-Norman wording) from 13th- and 14th-century England, written in 'Court Hand', also known as 'Anglicana'. The texts come from the 'Court of Common Pleas', the second-highest court of the time, which dealt primarily with civil disputes such as inheritances or dowries, and from the Justices, who also heard civil pleas but covered crown pleas as well.

The model has not been extensively tested.

Errors often occur in punctuation: only 44.44% of punctuation characters are recognised correctly (an error rate of 55.56%, based on just 9 characters in the test set). These errors mostly consist of missed ‧ dots.

Potential biases are still to be identified.

Model: dh-unibe/trocr-essoins-middle-latin
