---
library_name: transformers
pipeline_tag: image-to-text
license: afl-3.0
---

# Model Card for TrOCR_Math_handwritten

## Model Details

A TrOCR model fine-tuned on part of the [mathwriting](https://github.com/google-research/google-research/tree/master/mathwriting) dataset, converted from InkML files into images. TrOCR was introduced in the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Li et al. and first released in [this repository](https://github.com/microsoft/unilm/tree/master/trocr).

- **Developed by:** [More Information Needed]
- **Model type:** Transformer OCR
- **License:** afl-3.0
- **Finetuned from model [optional]:** [TrOCR_large_stage1](https://huggingface.co/microsoft/trocr-large-stage1)

## Uses

Here is how to use this model in PyTorch:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Placeholder: replace with the HTTP(S) URL of an image containing one formula.
# For a local file, use Image.open("path/to/image").convert("RGB") instead.
url = "path/to/image"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load the processor and the fine-tuned model
processor = TrOCRProcessor.from_pretrained('fhswf/TrOCR_Math_handwritten')
model = VisionEncoderDecoderModel.from_pretrained('fhswf/TrOCR_Math_handwritten')

# Preprocess the image, generate token IDs, and decode them to text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Bias, Risks, and Limitations

This model is intended for optical character recognition (OCR) on images containing a single mathematical formula.

## Training Details

### Training Data

This model was fine-tuned on part of the [mathwriting](https://github.com/google-research/google-research/tree/master/mathwriting) dataset, converted from InkML files into images.

## Evaluation

Percentage of correct recognition: 77.8%
Percentage of correct recognition with one error: 85.7%
Percentage of correct recognition with two errors: 89.9%

**BibTeX:**

```bibtex
@misc{li2021trocr,
      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
      year={2021},
      eprint={2109.10282},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
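
The evaluation figures reported in this card count a prediction as correct when it contains zero, at most one, or at most two errors. The card does not state how an error is measured. The sketch below assumes that one error is one character-level edit (insertion, deletion, or substitution) between the predicted and the reference string. The helper names `edit_distance` and `accuracy_within` are hypothetical and are not part of this model or of `transformers`; the sample strings are toy data, not taken from the evaluation set.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (the assumed error measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def accuracy_within(predictions, references, max_errors=0):
    """Percentage of predictions within `max_errors` edits of their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        edit_distance(pred, ref) <= max_errors
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


# Toy data for illustration only.
predictions = ["x^{2}+1", "\\frac{a}{b}", "a+b"]
references = ["x^{2}+1", "\\frac{a}{c}", "a-c"]

for k in (0, 1, 2):
    print(f"within {k} error(s): {accuracy_within(predictions, references, k):.1f}%")
```

If errors were instead counted per LaTeX token, the same functions could be applied to token lists after replacing the `str` inputs with sequences of tokens, since the distance computation only compares elements for equality.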