---
tags:
- generated_from_trainer
model-index:
- name: trocr-base-printed-synthetic_dataset_ocr
  results:
  - task:
      type: image-to-text
      name: Text Generation
    dataset:
      name: synthetic_dataset_ocr
      type: synthetic_dataset_ocr
      split: test
    metrics:
    - type: cer
      value: 0.002896524170994806
      name: CER
language:
- en
metrics:
- cer
pipeline_tag: image-to-text
---

# trocr-base-printed-synthetic_dataset_ocr

This model is a fine-tuned version of [microsoft/trocr-base-printed](https://huggingface.co/microsoft/trocr-base-printed) on a Kaggle dataset of 20,000 synthetic OCR samples.

## Model description

Here is the link to my code for this model: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Optical%20Character%20Recognition%20(OCR)/20%2C000%20Synthetic%20Samples%20Dataset

## Intended uses & limitations

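This model can be used to read printed text, for example on labels. It was fine-tuned and evaluated only on synthetic samples, so its accuracy on real photographs or scans is not reported here.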
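A minimal inference sketch using the standard TrOCR classes from Hugging Face Transformers. Two things here are assumptions not stated in this card: the Hub repo id `DunnBC22/trocr-base-printed-synthetic_dataset_ocr` is inferred from the model name and the author's GitHub handle, and the processor is loaded from the base model on the assumption that fine-tuning did not change the tokenizer or image preprocessing. TrOCR models are designed to read one line of text per image, so multi-line labels would likely need to be cropped into lines first.

```python
import sys

# Assumption: repo id inferred from the model name and GitHub handle; not confirmed by this card.
DEFAULT_MODEL_ID = "DunnBC22/trocr-base-printed-synthetic_dataset_ocr"
# Assumption: the base model's processor (tokenizer + image preprocessing) is unchanged.
DEFAULT_PROCESSOR_ID = "microsoft/trocr-base-printed"


def read_printed_text(image_path: str,
                      model_id: str = DEFAULT_MODEL_ID,
                      processor_id: str = DEFAULT_PROCESSOR_ID) -> str:
    """Transcribe the printed text in a single text-line image (sketch)."""
    # Imported inside the function so this module can be loaded without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(processor_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # The TrOCR image processor expects 3-channel RGB input.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressively decode token ids, then map them back to a string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_label.py path/to/label_line.png
    print(read_printed_text(sys.argv[1]))
```

If the fine-tuned repository ships its own processor files, passing `processor_id=DEFAULT_MODEL_ID` would load those instead.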

## Training and evaluation data

Here is the link to the dataset that I used for this model: https://www.kaggle.com/datasets/ravi02516/20k-synthetic-ocr-dataset

_Character-length distribution for the training dataset:_

![Input Character Length for Training Dataset](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/raw/main/Optical%20Character%20Recognition%20(OCR)/20%2C000%20Synthetic%20Samples%20Dataset/Images/Input%20Characgter%20Length%20Distribution%20for%20Training%20Dataset.png)

_Character-length distribution for the evaluation dataset:_

![Input Character Length for Evaluation Dataset](https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/raw/main/Optical%20Character%20Recognition%20(OCR)/20%2C000%20Synthetic%20Samples%20Dataset/Images/Input%20Characgter%20Length%20Distribution%20for%20Evaluation%20Dataset.png)

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
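To make `lr_scheduler_type: linear` concrete, here is a small sketch of how the learning rate decays over the single epoch. It assumes no warmup (none is listed above) and no gradient accumulation; the training-set size used in the usage note is hypothetical, since the card does not state the split sizes.

```python
import math


def steps_for_one_epoch(num_train_samples: int, batch_size: int = 8) -> int:
    # One optimizer step per batch (assumes no gradient accumulation);
    # a final partial batch still counts as a step.
    return math.ceil(num_train_samples / batch_size)


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr (never taken when warmup_steps == 0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

For a hypothetical 16,000 training samples, `steps_for_one_epoch(16000)` gives 2,000 steps, so the learning rate starts at 5e-05, is about 2.5e-05 at step 1,000, and reaches 0 at step 2,000.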

### Training results
Character error rate (CER) on the test split: 0.0029 (exact value: 0.002896524170994806).
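CER is the number of character edits (substitutions, deletions, and insertions) needed to turn the predictions into the references, divided by the number of reference characters. A CER of about 0.0029 therefore corresponds to roughly 3 character errors per 1,000 reference characters. The sketch below computes a corpus-level CER with a plain Levenshtein distance; the card does not say which implementation produced the reported number, and libraries may additionally normalize text (for example, stripping whitespace), so results can differ slightly.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # delete r
                          curr[j - 1] + 1,         # insert h
                          prev[j - 1] + (r != h))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def corpus_cer(references: list[str], predictions: list[str]) -> float:
    """Total edits over total reference characters (not a mean of per-sample CERs)."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    edits = sum(levenshtein(r, p) for r, p in zip(references, predictions))
    return edits / total_chars
```

For example, `corpus_cer(["hello world", "OCR"], ["hello w0rld", "OCR"])` has one substitution over 14 reference characters, giving 1/14.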

### Framework versions

- Transformers 4.26.1
- Pytorch 1.13.1+cu116
- Datasets 2.10.1
- Tokenizers 0.13.2

*Note: Please make sure to give proper credit to the owner(s) of the data and the developers of the base model (microsoft/trocr-base-printed).*

### Model Checkpoint
@misc{li2021trocr, title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models}, author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei}, year={2021}, eprint={2109.10282}, archivePrefix={arXiv}, primaryClass={cs.CL}}

### Metric (Character Error Rate [CER])
@inproceedings{morris2004, author = {Morris, Andrew and Maier, Viktoria and Green, Phil}, year = {2004}, month = {01}, pages = {}, title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.} }