---
language:
- en
- zh
tags:
- Image-to-Text
- OCR
- Image-Captioning
datasets:
- priyank-m/text_recognition_en_zh_clean
metrics:
- cer
---
Multilingual OCR (mOCR) is a VisionEncoderDecoder model, built on the TrOCR concept, for recognizing text in English and Chinese documents. It pairs a pre-trained vision encoder with a pre-trained language model that serves as the decoder.
Encoder model used: facebook/vit-mae-large
Decoder model used: xlm-roberta-base
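
As a rough illustration of how such an encoder-decoder pair can be assembled, the sketch below uses the `VisionEncoderDecoderModel.from_encoder_decoder_pretrained` method from the `transformers` library with the two checkpoints named above. It is not the author's training code: the helper name `build_mocr_skeleton` is hypothetical, and the choice of start and padding tokens is an assumption based on common practice for this model class. The cross-attention weights that connect the decoder to the encoder are newly initialized, so the result would still need fine-tuning on OCR data before it could recognize text.

```python
ENCODER_ID = "facebook/vit-mae-large"  # encoder checkpoint named in this card
DECODER_ID = "xlm-roberta-base"        # decoder checkpoint named in this card


def build_mocr_skeleton(encoder_id=ENCODER_ID, decoder_id=DECODER_ID):
    """Pair a pre-trained vision encoder with a pre-trained language-model decoder.

    Hypothetical helper for illustration; returns (model, image_processor, tokenizer).
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import (
        AutoImageProcessor,
        AutoTokenizer,
        VisionEncoderDecoderModel,
    )

    # Loads both checkpoints and wires the decoder to attend over encoder outputs.
    # The decoder's cross-attention layers are randomly initialized at this point.
    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        encoder_id, decoder_id
    )
    image_processor = AutoImageProcessor.from_pretrained(encoder_id)
    tokenizer = AutoTokenizer.from_pretrained(decoder_id)

    # Assumption: start generation from the tokenizer's CLS token (<s>) and
    # pad with its pad token; the original training setup may differ.
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    return model, image_processor, tokenizer
```

Calling `build_mocr_skeleton()` downloads both checkpoints and returns an untrained skeleton of the architecture. To run inference with the released weights, load this model's own repository with `VisionEncoderDecoderModel.from_pretrained` instead.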