tags:
  - generated_from_trainer
model-index:
  - name: trocr-base-printed-synthetic_dataset_ocr
    results:
      - task:
          type: image-to-text
          name: Text Generation
        dataset:
          name: synthetic_dataset_ocr
          type: synthetic_dataset_ocr
          split: test
        metrics:
          - type: cer
            value: 0.002896524170994806
            name: CER
language:
  - en
metrics:
  - cer
pipeline_tag: image-to-text


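This model is a fine-tuned version of microsoft/trocr-base-printed on a 20,000-sample synthetic OCR dataset (linked under "Training and evaluation data"). It achieves a character error rate (CER) of 0.0029 on the test split.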

Model description

Here is the link to my code for this model: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Optical%20Character%20Recognition%20(OCR)/20%2C000%20Synthetic%20Samples%20Dataset

Intended uses & limitations

This model could be used to read labels with printed text.
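
As an illustration of that use, here is a minimal inference sketch built on the Transformers TrOCR classes. The repository ID is an assumption inferred from the author and model names in this card, and the processor is loaded from the base checkpoint on the assumption that the fine-tuned repository may not ship its own preprocessing files. Adjust both if they do not match your setup.

```python
# Assumption: Hub repository ID inferred from the author and model names in this card.
MODEL_ID = "DunnBC22/trocr-base-printed-synthetic_dataset_ocr"
# The base checkpoint named in this card, used here for the image processor and tokenizer.
PROCESSOR_ID = "microsoft/trocr-base-printed"


def read_label(image_path: str, model_id: str = MODEL_ID) -> str:
    """Return the text recognized in one image containing a line of printed text."""
    # Imported lazily so that defining the function does not require the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(PROCESSOR_ID)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # TrOCR expects RGB input; the processor resizes and normalizes the image.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressively generate token IDs, then decode them to a string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `read_label("label.png")` (a hypothetical file name) downloads the weights on first use and returns the predicted string.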

Training and evaluation data

Here is the link to the dataset that I used for this model: https://www.kaggle.com/datasets/ravi02516/20k-synthetic-ocr-dataset

Character Length for Training Dataset:

[Figure: Input Character Length for Training Dataset]

Character Length for Evaluation Dataset:

[Figure: Input Character Length for Evaluation Dataset]
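
Character-length distributions like the ones referenced above can be tabulated with a few lines of Python. This is only a sketch: the function name and the example labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from typing import Dict, Iterable


def label_length_histogram(labels: Iterable[str]) -> Dict[int, int]:
    """Map each label length (in characters) to the number of labels of that length."""
    counts = Counter(len(label) for label in labels)
    # Sort by length so the result reads like a histogram from short to long.
    return dict(sorted(counts.items()))


# Hypothetical labels, for illustration only.
example_labels = ["INVOICE", "Total", "42.50", "Date"]
histogram = label_length_histogram(example_labels)
```

Running the same function over the training and evaluation labels separately gives the two distributions to compare.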

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
  • mixed_precision_training: Native AMP
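
For readers who want to reproduce a similar run, the values above might map onto `Seq2SeqTrainingArguments` roughly as follows. This is a sketch, not the author's actual configuration: it assumes a single device (so the reported batch sizes are per-device), assumes `Seq2SeqTrainingArguments` was the arguments class, and uses a hypothetical helper name.

```python
# Hyperparameters from this card, keyed by Transformers training-argument names.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,  # assumption: training ran on a single device
    "per_device_eval_batch_size": 8,   # assumption: evaluation ran on a single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
    "fp16": True,  # Native AMP mixed precision; requires a CUDA-capable device
}


def build_training_args(output_dir: str):
    """Hypothetical helper: build training arguments from the values above."""
    # Imported lazily so the dictionary can be inspected without Transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```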

Training results

CER = 0.003 on the test split (unrounded: 0.002896524170994806)
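
CER is the total number of character-level edits (substitutions, deletions, insertions) needed to turn the predictions into the references, divided by the total number of reference characters. A value of about 0.0029 therefore corresponds to roughly 3 character errors per 1,000 reference characters. The following is a from-scratch sketch of that definition, not the metric implementation used for the reported score; the function names are illustrative.

```python
from typing import Sequence


def edit_distance(hypothesis: str, reference: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(reference) + 1))
    for i, h_char in enumerate(hypothesis, start=1):
        current = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if h_char == r_char else 1
            current.append(min(
                previous[j] + 1,         # remove h_char from the hypothesis
                current[j - 1] + 1,      # add r_char to the hypothesis
                previous[j - 1] + cost,  # keep a match or substitute a mismatch
            ))
        previous = current
    return previous[-1]


def character_error_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_chars = sum(len(reference) for reference in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars
```

For example, `character_error_rate(["sitting"], ["kitten"])` yields 0.5: three edits against a six-character reference.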

Framework versions

  • Transformers 4.26.1
  • PyTorch 1.13.1+cu116
  • Datasets 2.10.1
  • Tokenizers 0.13.2

Note: Please give proper credit to the owner(s) of the data and to the developers of the base model (microsoft/trocr-base-printed).

Model Checkpoint

@misc{li2021trocr,
  title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
  author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
  year={2021},
  eprint={2109.10282},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Metric (Character Error Rate [CER])

@inproceedings{morris2004,
  author = {Morris, Andrew and Maier, Viktoria and Green, Phil},
  year = {2004},
  month = {01},
  pages = {},
  title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.}
}