ardagast committed · verified
Commit 67ec2cb · 1 Parent(s): 73be8cd

End of training
Files changed (3)
  1. README.md +52 -62
  2. generation_config.json +9 -9
  3. model.safetensors +1 -1
README.md CHANGED
@@ -1,62 +1,52 @@
- ---
- tags:
- - trocr
- - image-to-text
- widget:
- - src: https://layoutlm.blob.core.windows.net/trocr/dataset/SROIE2019Task2Crop/train/X00016469612_1.jpg
-   example_title: Printed 1
- - src: https://layoutlm.blob.core.windows.net/trocr/dataset/SROIE2019Task2Crop/train/X51005255805_7.jpg
-   example_title: Printed 2
- - src: https://layoutlm.blob.core.windows.net/trocr/dataset/SROIE2019Task2Crop/train/X51005745214_6.jpg
-   example_title: Printed 3
- ---
-
- # TrOCR (base-sized model, fine-tuned on SROIE)
-
- TrOCR model fine-tuned on the [SROIE dataset](https://rrc.cvc.uab.es/?ch=13). It was introduced in the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Li et al. and first released in [this repository](https://github.com/microsoft/unilm/tree/master/trocr).
-
- Disclaimer: The team releasing TrOCR did not write a model card for this model, so this model card has been written by the Hugging Face team.
-
- ## Model description
-
- The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder and a text Transformer as decoder. The image encoder was initialized from the weights of BEiT, while the text decoder was initialized from the weights of RoBERTa.
-
- Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Absolute position embeddings are added before the sequence is fed to the layers of the Transformer encoder. The Transformer text decoder then autoregressively generates tokens.
-
- ## Intended uses & limitations
-
- You can use the raw model for optical character recognition (OCR) on single text-line images. See the [model hub](https://huggingface.co/models?search=microsoft/trocr) for fine-tuned versions on a task that interests you.
-
- ### How to use
-
- Here is how to use this model in PyTorch:
-
- ```python
- from transformers import TrOCRProcessor, VisionEncoderDecoderModel
- from PIL import Image
- import requests
-
- # load an image from the IAM database (note: this model is meant for printed text)
- url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
- image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
-
- processor = TrOCRProcessor.from_pretrained('microsoft/trocr-base-printed')
- model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-printed')
- pixel_values = processor(images=image, return_tensors="pt").pixel_values
-
- generated_ids = model.generate(pixel_values)
- generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @misc{li2021trocr,
-   title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
-   author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
-   year={2021},
-   eprint={2109.10282},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL}
- }
- ```
 
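The removed description above pins down the encoder's input geometry, which also fixes the encoder's sequence length. A quick sanity check of that arithmetic, where the 384x384 input resolution is an assumption based on TrOCR-base defaults rather than something the card states:

```python
# Hypothetical check of the patch arithmetic described in the removed card.
image_size = 384  # assumed TrOCR-base input resolution; not stated in the card
patch_size = 16   # patch resolution stated in the card

# Each 16x16 patch becomes one linearly embedded token for the encoder.
num_patches = (image_size // patch_size) ** 2
print(num_patches)  # 576
```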
+ ---
+ library_name: transformers
+ base_model: ardagast/trocr-cyrillic
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: trocr-cyrillic
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # trocr-cyrillic
+
+ This model is a fine-tuned version of [ardagast/trocr-cyrillic](https://huggingface.co/ardagast/trocr-cyrillic) on an unspecified dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 16
+ - eval_batch_size: 16
+ - seed: 42
+ - optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
+ - lr_scheduler_type: linear
+ - num_epochs: 5
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - Transformers 4.51.1
+ - Pytorch 2.6.0+cu124
+ - Datasets 3.5.0
+ - Tokenizers 0.21.1
 
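The hyperparameter list in the new card maps directly onto the `transformers` Trainer API. A minimal sketch with `Seq2SeqTrainingArguments`, where the `output_dir` value is an assumption the card does not confirm:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="trocr-cyrillic",      # assumed; not stated in the card
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",              # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=5,
)
```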
generation_config.json CHANGED
@@ -1,9 +1,9 @@
- {
-   "_from_model_config": true,
-   "bos_token_id": 0,
-   "decoder_start_token_id": 2,
-   "eos_token_id": 2,
-   "pad_token_id": 1,
-   "transformers_version": "4.27.0.dev0",
-   "use_cache": false
- }
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "decoder_start_token_id": 2,
+   "eos_token_id": 2,
+   "pad_token_id": 1,
+   "transformers_version": "4.51.1",
+   "use_cache": false
+ }
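Only `transformers_version` changes in this file; the special-token ids are untouched. As a quick check, the committed settings can be loaded with `GenerationConfig` (assuming the repository is accessible):

```python
from transformers import GenerationConfig

# Pull the generation settings committed above and confirm the token ids.
gen_config = GenerationConfig.from_pretrained("ardagast/trocr-cyrillic")
assert gen_config.decoder_start_token_id == 2
assert gen_config.pad_token_id == 1
```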
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:53a39f2dde99b34bd9dd23aa964dd580e1a4def3586f780a1bcd29a1b63921ca
+ oid sha256:316c763d40d8592878d73f66f5654bdcd9a056537b6d9b0376b5a2540d6c5fb2
  size 1335747032
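The LFS pointer swap replaces only the blob's SHA-256; the file size is unchanged. A minimal sketch for verifying a local download against the new oid, where the local filename is an assumption:

```python
import hashlib

# Verify a locally downloaded model.safetensors against the LFS pointer above.
expected_oid = "316c763d40d8592878d73f66f5654bdcd9a056537b6d9b0376b5a2540d6c5fb2"

sha256 = hashlib.sha256()
with open("model.safetensors", "rb") as f:        # assumed local path
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        sha256.update(chunk)

assert sha256.hexdigest() == expected_oid, "checksum mismatch"
```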