TrOCR Kurrent-Model 16th to 18th century

Base model: dh-unibe/trocr-kurrent

Epochs: 19.85 / 20
Eval CER: 0.05673
Test CER: 0.05416
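The card does not say how the CER (character error rate) figures above were computed. As a rough illustration only, the sketch below uses the common definition: the character-level Levenshtein distance between reference and prediction, summed over all lines and divided by the total number of reference characters. The function names `levenshtein` and `cer` are hypothetical and not taken from the training code.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    Assumption: this is the usual definition, not necessarily the exact
    metric behind the numbers reported in this card.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(cer(["abcd"], ["abxd"]))  # 0.25
```

Under this definition, a CER of about 0.054 corresponds to roughly five to six character errors per 100 reference characters.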

This model was trained on an extensive training set of roughly 1,579,200 words of German Kurrent script written in the 16th to 18th centuries. It was evaluated on an evaluation set and a test set that were split off automatically and contain the same hands as the training set.

The ground truth stems from different projects and partners and is biased toward Swiss documents. It draws on a variety of archives, among them the State Archives of Zürich (Stillstandsprotokolle, Ratsmanuale, Findmittel) and the scholarly edition project Königsfelden (Universities of Zürich and Bern: www.koenigsfelden.uzh.ch), as well as transcriptions from Einsiedeln. Further contributions come from the university archives of Greifswald: https://rechtsprechung-im-ostseeraum.archiv.uni-greifswald.de/.

The public Transkribus model (based on PyLaia) can be found here: https://readcoop.eu/model/german-kurrent-16th-18th/

The model has not yet been tested extensively. It is only a first attempt, but it might be useful as a starting point for fine-tuning.
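For anyone who wants to try the model before fine-tuning, the following is a minimal sketch of line-level inference with the Hugging Face `transformers` TrOCR classes. It rests on several assumptions: the repository ships a processor configuration alongside the weights, the input is an image of a single text line (TrOCR does not segment pages), and `transformers`, `torch` and `Pillow` are installed. This model's own repository ID is not stated here, so the default below is the base model named above; replace it as needed. The function name `transcribe_line` is hypothetical.

```python
def transcribe_line(image_path: str, model_id: str = "dh-unibe/trocr-kurrent") -> str:
    """Transcribe one cropped text-line image with a TrOCR checkpoint.

    Assumption: `model_id` points to a repository containing both the
    model weights and a TrOCR processor configuration.
    """
    # Imported lazily so this module loads without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # TrOCR expects RGB input; scans are often grayscale or bitonal.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A call such as `transcribe_line("line_0001.png")` (a hypothetical file name) would return the predicted transcription as a string. For more than a handful of lines, it would be better to load the processor and model once and reuse them.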

Model size: 334M params
Tensor type: F32 (Safetensors)