Base model: magistermilitum/tridis_HTR v1

Train Lines: 15356

Eval Lines: 394

Test Lines: 2995

Epochs: 14.1667 / 20

Eval CER: 0.0544

Test CER: 0.0622

Test results with CERberus

| Metric | Value |
| --- | --- |
| Character Error Rate (%) | 6.22 |
| Number of Correct Characters | 186998 |
| Number of Substitutions | 5425 |
| Number of Insertions | 2933 |
| Number of Deletions | 3849 |
| Total Character Count | 196272 |
| Original Lines Count | 2288 |
| Discarded Lines Count | 0 |

| Block | Count | Correct | Incorrect | Correct Ratio (%) | Incorrect Ratio (%) |
| --- | --- | --- | --- | --- | --- |
| Digits | 0 | 0 | 0 | nan | nan |
| Lowercase Latin alphabet | 154731 | 147241 | 7490 | 95.16 | 4.84 |
| MUFI Glyphs | 0 | 0 | 0 | nan | nan |
| Punctuation | 9 | 4 | 5 | 44.44 | 55.56 |
| Uppercase Latin alphabet | 6883 | 6450 | 433 | 93.71 | 6.29 |
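
The CERberus figures above are consistent with the usual definition of the character error rate: substitutions, insertions and deletions divided by the total character count. The following sketch recomputes the reported values from the counts in the table. The function and variable names are illustrative and do not come from CERberus itself.

```python
# Counts copied from the CERberus results above.
substitutions = 5425
insertions = 2933
deletions = 3849
total_chars = 196272  # total character count reported by CERberus


def character_error_rate(subs: int, ins: int, dels: int, total: int) -> float:
    """Return the CER as a fraction: edit operations divided by total characters."""
    return (subs + ins + dels) / total


cer = character_error_rate(substitutions, insertions, deletions, total_chars)
print(f"CER = {cer:.4f} ({cer * 100:.2f} %)")

# Per-block figures (count, correct, incorrect), also copied from the results.
# Blocks with a count of 0 are left out because their ratios are undefined (nan).
blocks = {
    "Lowercase Latin alphabet": (154731, 147241, 7490),
    "Punctuation": (9, 4, 5),
    "Uppercase Latin alphabet": (6883, 6450, 433),
}

for name, (count, correct, incorrect) in blocks.items():
    correct_ratio = 100 * correct / count
    incorrect_ratio = 100 * incorrect / count
    print(f"{name}: {correct_ratio:.2f} % correct, {incorrect_ratio:.2f} % incorrect")
```

The recomputed fraction rounds to 0.0622, which matches the Test CER given at the top.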

The training texts are handwritten in Latin (with some Middle English and Anglo-Norman wording) and date from the 13th and 14th centuries. They come from England and were written in 'Court Hand', also known as 'Anglicana'. Some are records of the 'Court of Common Pleas', the second highest court of the time, which dealt primarily with civil disputes such as inheritances or dowries. The others are records of the Justices, who also heard civil pleas but covered crown pleas as well.

The model has not been extensively tested.
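
Because testing so far has been limited, it is worth trying the model on a few line images of your own before relying on it. The following is a minimal sketch, assuming that the checkpoint loads with the standard TrOCR classes from the `transformers` library, that the repository ships a processor configuration, and that the input is a single cropped text line. The file name in the usage comment is hypothetical.

```python
MODEL_ID = "dh-unibe/trocr-essoins-middle-latin"  # repository name as shown on the model page


def transcribe_line(image_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one cropped line image with a TrOCR-style checkpoint."""
    # Imported lazily so the heavy dependencies are only needed at call time.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Assumption: the repository contains both processor and model files.
    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # TrOCR expects RGB input; scans are often grayscale.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Default generation settings; beam search or length limits may need tuning.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


# Example call (hypothetical file name):
# print(transcribe_line("line_0001.png"))
```

For more than a handful of lines, loading the processor and model once and reusing them would be preferable to reloading them on every call as this sketch does.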

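Errors often occur in punctuation: only 44.44% of punctuation characters are recognized correctly (an error rate of 55.56%, measured on just 9 punctuation characters in the test set). Most of these errors are missed ‧ dots.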
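
To check how often the ‧ dot is dropped on your own material, a simple count comparison between ground truth and prediction can serve as a first indicator. The sketch below is a rough heuristic, not an alignment-based measure like the one CERberus reports, and the sample strings are made up for illustration rather than taken from the dataset.

```python
DOT = "‧"  # the separator dot discussed above


def missed_dots(reference: str, prediction: str) -> int:
    """Return how many more dots the reference has than the prediction (never negative)."""
    return max(reference.count(DOT) - prediction.count(DOT), 0)


# Made-up (reference, prediction) pairs for illustration only.
pairs = [
    ("et ‧ predictus ‧ Johannes", "et predictus ‧ Johannes"),
    ("versus Ricardum", "versus Ricardum"),
]

total_missed = sum(missed_dots(ref, hyp) for ref, hyp in pairs)
total_reference_dots = sum(ref.count(DOT) for ref, _ in pairs)
print(f"missed {total_missed} of {total_reference_dots} dots")
```

Because it only compares counts per line, this check cannot tell a missed dot from a dot that was placed in the wrong position.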

Potential biases are still to be identified.

Model: dh-unibe/trocr-essoins-middle-latin