Update README.md
README.md
CHANGED
@@ -15,29 +15,33 @@ model-index:
|
|
15 |
- type: cer
|
16 |
value: 0.002896524170994806
|
17 |
name: CER
|
18 |
---
|
19 |
|
20 |
-
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
21 |
-
should probably proofread and complete it, then remove this comment. -->
|
22 |
-
|
23 |
# trocr-base-printed-synthetic_dataset_ocr
|
24 |
|
25 |
This model is a fine-tuned version of [microsoft/trocr-base-printed](https://huggingface.co/microsoft/trocr-base-printed) on an unknown dataset.
|
26 |
|
27 |
## Model description
|
28 |
|
29 |
-
|
30 |
|
31 |
## Intended uses & limitations
|
32 |
|
33 |
-
|
34 |
|
35 |
## Training and evaluation data
|
36 |
|
37 |
-
|
38 |
|
39 |
## Training procedure
|
40 |
|
41 |
### Training hyperparameters
|
42 |
|
43 |
The following hyperparameters were used during training:
|
@@ -51,8 +55,7 @@ The following hyperparameters were used during training:
|
|
51 |
- mixed_precision_training: Native AMP
|
52 |
|
53 |
### Training results
|
54 |
-
|
55 |
-
|
56 |
|
57 |
### Framework versions
|
58 |
|
@@ -60,3 +63,13 @@ The following hyperparameters were used during training:
|
|
60 |
- Pytorch 1.13.1+cu116
|
61 |
- Datasets 2.10.1
|
62 |
- Tokenizers 0.13.2
|
15 |
- type: cer
|
16 |
value: 0.002896524170994806
|
17 |
name: CER
|
18 |
+
language:
|
19 |
+
- en
|
20 |
+
metrics:
|
21 |
+
- cer
|
22 |
+
pipeline_tag: image-to-text
|
23 |
---
|
24 |
|
25 |
# trocr-base-printed-synthetic_dataset_ocr
|
26 |
|
27 |
This model is a fine-tuned version of [microsoft/trocr-base-printed](https://huggingface.co/microsoft/trocr-base-printed) on an unknown dataset.
|
28 |
|
29 |
## Model description
|
30 |
|
31 |
+
View my code using the link displayed under the 'Training procedure' heading.
|
32 |
|
33 |
## Intended uses & limitations
|
34 |
|
35 |
+
This model could be used to read printed text on labels.
|
36 |
|
37 |
## Training and evaluation data
|
38 |
|
39 |
+
Here is the link to the dataset that I used for this model: https://www.kaggle.com/datasets/ravi02516/20k-synthetic-ocr-dataset
|
40 |
|
41 |
## Training procedure
|
42 |
|
43 |
+
Here is the link to my code for this model: https://github.com/DunnBC22/Computer_Vision_Projects/tree/main/Optical%20Character%20Recognition%20(OCR)/20%2C000%20Synthetic%20Samples%20Dataset
|
44 |
+
|
45 |
### Training hyperparameters
|
46 |
|
47 |
The following hyperparameters were used during training:
|
|
|
55 |
- mixed_precision_training: Native AMP
|
56 |
|
57 |
### Training results
|
58 |
+
CER = 0.003 (unrounded: 0.002896524170994806)
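
For readers unfamiliar with the metric, here is a minimal sketch of how a character error rate can be computed. It assumes the common definition (total character-level edit distance divided by the total number of reference characters); this card does not say which implementation produced the reported value, so treat it as an illustration only. The function names and the sample strings are made up for the example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(references, predictions) -> float:
    """Corpus-level CER: summed edit distance over summed reference length."""
    total_edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars


# Hypothetical label texts: the first prediction confuses the digit 0 with the letter O.
refs = ["invoice 2041", "net weight 500 g"]
hyps = ["invoice 2O41", "net weight 500 g"]
score = cer(refs, hyps)
print(f"CER = {score:.4f}")
```

Under this definition, a CER of about 0.003 corresponds to roughly three character errors per thousand reference characters.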
|
|
|
59 |
|
60 |
### Framework versions
|
61 |
|
|
|
63 |
- Pytorch 1.13.1+cu116
|
64 |
- Datasets 2.10.1
|
65 |
- Tokenizers 0.13.2
|
66 |
+
|
67 |
+
*Note: Please give proper credit to the owner(s) of the data and to the developers of the base model (microsoft/trocr-base-printed).*
|
68 |
+
|
69 |
+
|
70 |
+
|
71 |
+
### Model Checkpoint
|
72 |
+
@misc{li2021trocr, title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models}, author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei}, year={2021}, eprint={2109.10282}, archivePrefix={arXiv}, primaryClass={cs.CL}}
|
73 |
+
|
74 |
+
### Metric (Character Error Rate [CER])
|
75 |
+
@inproceedings{morris2004, author = {Morris, Andrew and Maier, Viktoria and Green, Phil}, year = {2004}, month = {01}, pages = {}, title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.} }
|