magistermilitum committed
Commit 5d4bc62 · verified · 1 Parent(s): b13c6ec

Update README.md

Files changed (1): README.md (+52 -3)
README.md CHANGED
@@ -68,7 +68,58 @@ A CRNN+CTC version of this model trained on Kraken 4.0 (https://github.com/mitta
 Torres Aguilar, S. (2024). TRIDIS v2: HTR model for Multilingual Medieval and Early Modern Documentary Manuscripts (11th-16th) (Version 2). Zenodo. https://doi.org/10.5281/zenodo.13862096


- The following snippet can be used to get model inferences on manuscript lines. Ideally the test dataset must be passed to the model on the form of a json file redirecting to the images:
+ The following snippets can be used to get model inferences on manuscript lines.
+
+ 1. Clone the model using: git lfs clone https://huggingface.co/magistermilitum/tridis_v2_HTR_historical_manuscripts
+
+ 2. Here is how to test the model on a single image:
+
+ ```python
+ from transformers import TrOCRProcessor, VisionEncoderDecoderModel
+ from safetensors.torch import load_file
+ import torch.nn as nn
+ from PIL import Image
+
+ # load a manuscript line image
+ path = "/path/to/image/file.png"
+ image = Image.open(path).convert("RGB")
+
+ # load the TRIDIS processor (tokenizer + image processor) from the cloned repository
+ processor = TrOCRProcessor.from_pretrained("./tridis_v2_HTR_historical_manuscripts")
+
+ # load the base TrOCR architecture
+ model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-large-handwritten")
+
+ # replace the output projection and decoder embeddings to match the TRIDIS tokenizer vocabulary
+ model.config.decoder.vocab_size = processor.tokenizer.vocab_size
+ model.config.vocab_size = model.config.decoder.vocab_size
+ model.decoder.output_projection = nn.Linear(1024, processor.tokenizer.vocab_size)
+ model.decoder.embed_tokens = nn.Embedding(processor.tokenizer.vocab_size, 1024, padding_idx=1)
+
+ # set beam search parameters
+ model.config.eos_token_id = processor.tokenizer.sep_token_id
+ model.config.max_length = 160
+ model.config.early_stopping = True
+ model.config.no_repeat_ngram_size = 3
+ model.config.length_penalty = 2.0
+ model.config.num_beams = 3
+
+ # load the fine-tuned TRIDIS weights from the downloaded checkpoint
+ safetensors_path = "./tridis_v2_HTR_historical_manuscripts/model.safetensors"
+ state_dict = load_file(safetensors_path)
+ model.load_state_dict(state_dict)
+
+ pixel_values = processor(images=image, return_tensors="pt").pixel_values
+ generated_ids = model.generate(pixel_values)
+ generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(generated_text)
+ ```
+
+ 3. Here is how to test the model on a dataset. Ideally, the test dataset should be passed to the model in the form of a JSON list pointing to the images (see the sketch after the diff):

 for ex (graphical_line_path, line_text_content):

@@ -78,8 +129,6 @@ for ex (graphical_line_path, line_text_content):
 etc.
 ]

- Clone the model using: git lfs clone https://huggingface.co/magistermilitum/tridis_v2_HTR_historical_manuscripts
-
 ```python
 import glob
 import json, random
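
The dataset snippet above is cut off by the diff context. Below is a minimal sketch of what step 3 expects: a JSON list of (graphical_line_path, line_text_content) pairs, as in the README fragment above. The file name test_dataset.json and the loop itself are illustrative assumptions, not the README's exact code, and `processor` and `model` are the objects prepared in step 2.

```python
import json
from PIL import Image

# Assumed layout of the JSON list (illustrative, not the README's exact file):
# [
#   ["lines/page1_line01.png", "transcription of line 1"],
#   ["lines/page1_line02.png", "transcription of line 2"]
# ]
with open("test_dataset.json") as f:  # hypothetical file name
    dataset = json.load(f)

# `processor` and `model` are assumed to be configured as in step 2
for graphical_line_path, line_text_content in dataset:
    image = Image.open(graphical_line_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    prediction = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("GT:  ", line_text_content)
    print("Pred:", prediction)
```

Keeping the ground-truth transcription next to each image path makes it straightforward to compare predictions against references over the whole test set.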