---
language:
- de
library_name: transformers
datasets:
- fhswf/german_handwriting
license: afl-3.0
pipeline_tag: image-to-text
---

# Model Card for TrOCR_german_handwritten

## Model Details

TrOCR model fine-tuned on the [german_handwriting](https://huggingface.co/datasets/fhswf/german_handwriting) dataset. The underlying TrOCR model was introduced in the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Li et al. and first released in [this repository](https://github.com/microsoft/unilm/tree/master/trocr).

- **Developed by:** [More Information Needed]
- **Model type:** Transformer-based OCR (vision encoder-decoder)
- **Language(s) (NLP):** German
- **License:** afl-3.0
- **Finetuned from model:** [TrOCR_large_handwritten](https://huggingface.co/microsoft/trocr-large-handwritten)

## Uses

Here is how to use this model in PyTorch:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# load an example text-line image (this sample is from the English IAM
# database; replace the URL with an image of German handwriting)
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# load the processor (image preprocessing + tokenizer) and the model
processor = TrOCRProcessor.from_pretrained('fhswf/TrOCR_german_handwritten')
model = VisionEncoderDecoderModel.from_pretrained('fhswf/TrOCR_german_handwritten')

# preprocess the image, generate token IDs, and decode them to text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```

## Bias, Risks, and Limitations

You can use the raw model for optical character recognition (OCR) on single text-line images of German handwriting. Because the model operates on single text lines, multi-line documents need to be segmented into lines before recognition.

## Training Details

### Training Data

This model was fine-tuned on the [german_handwriting](https://huggingface.co/datasets/fhswf/german_handwriting) dataset.
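The training procedure itself is not documented in this card. For orientation, here is a minimal fine-tuning sketch using `Seq2SeqTrainer`; the dataset column names (`image`, `text`), the `train` split name, and all hyperparameters are illustrative assumptions rather than the settings actually used to produce this model.

```python
from datasets import load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TrOCRProcessor,
    VisionEncoderDecoderModel,
    default_data_collator,
)

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-large-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-large-handwritten")

# wire up the special tokens the encoder-decoder model needs for generation
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.eos_token_id = processor.tokenizer.sep_token_id
model.config.vocab_size = model.config.decoder.vocab_size

# assumed schema: an "image" column with line images and a "text" column
# with transcriptions; adjust to the actual dataset features
dataset = load_dataset("fhswf/german_handwriting")

def transform(batch):
    inputs = processor(images=[im.convert("RGB") for im in batch["image"]],
                       return_tensors="pt")
    labels = processor.tokenizer(batch["text"], padding="max_length",
                                 max_length=64, truncation=True,
                                 return_tensors="pt").input_ids
    labels[labels == processor.tokenizer.pad_token_id] = -100  # mask padding in the loss
    return {"pixel_values": inputs.pixel_values, "labels": labels}

dataset.set_transform(transform)  # preprocess on the fly

training_args = Seq2SeqTrainingArguments(
    output_dir="trocr-german-handwriting",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-6,
    fp16=True,  # assumes a CUDA GPU
    logging_steps=100,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],  # split name is an assumption
    data_collator=default_data_collator,
)
trainer.train()
```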
## Evaluation

- Levenshtein distance: 1.85
- WER (Word Error Rate): 17.5%
- CER (Character Error Rate): 4.1%
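How these numbers were computed is not stated in the card. As a reference point, character and word error rates of this kind can be obtained with the `jiwer` package, and plain edit distance with `python-Levenshtein` (common choices, not necessarily the tooling used here):

```python
import jiwer
import Levenshtein  # pip install python-Levenshtein

# made-up example pair; in practice, compare model output to ground truth
reference = "ein Beispielsatz in deutscher Handschrift"
hypothesis = "ein Beispielsatz in deutsher Handschrift"

print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")
print(f"CER: {jiwer.cer(reference, hypothesis):.3f}")
print(f"Levenshtein distance: {Levenshtein.distance(reference, hypothesis)}")
```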
## Citation

**BibTeX:**

```bibtex
@misc{li2021trocr,
      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
      year={2021},
      eprint={2109.10282},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```