---
license: apache-2.0
language:
- ko
metrics:
- cer
- wer
pipeline_tag: image-to-text
---

# trOCR-final

A fine-tuned `VisionEncoderDecoderModel(encoder, decoder)` for Korean text recognition, where:

- encoder = 'facebook/deit-base-distilled-patch16-384'
- decoder = 'klue/roberta-base'

## How to Get Started with the Model

```python
from transformers import VisionEncoderDecoderModel, AutoTokenizer, TrOCRProcessor
import torch
from PIL import Image

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

image_path = '(your image path)'
image = Image.open(image_path).convert('RGB')  # image can be .jpg or .png

# Hugging Face download: https://huggingface.co/gg4ever/trOCR-final
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
trocr_model = "gg4ever/trOCR-final"
model = VisionEncoderDecoderModel.from_pretrained(trocr_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(trocr_model)

# Preprocess the image, generate token ids, and decode them to text.
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```

## Training Details

### Training Data

- 1M words generated by TextRecognitionDataGenerator (trdg): https://github.com/Belval/TextRecognitionDataGenerator/blob/master/trdg/run.py
- 1.1M words from the AI-hub OCR words dataset: https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=81

### Training Hyperparameters

|hyperparameters|values|
|-----------------------------|-------|
|predict_with_generate|True|
|evaluation_strategy|"steps"|
|per_device_train_batch_size|32|
|per_device_eval_batch_size|32|
|num_train_epochs|2|
|fp16|True|
|learning_rate|4e-5|
|eval_steps|10000|
|warmup_steps|20000|
|weight_decay|0.01|
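
The metadata lists `cer` and `wer` as metrics, but this card does not show how they were computed. The following is a minimal sketch, assuming the usual definitions: Levenshtein edit distance divided by the reference length, counted over characters for CER and over whitespace-separated words for WER. The function names `edit_distance`, `cer`, and `wer` are hypothetical and not part of this repository; packages such as `jiwer` or Hugging Face `evaluate` provide equivalent metrics and may differ in normalization details.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference words (split on whitespace)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Illustrative strings only, not taken from the training or evaluation data.
reference = "안녕하세요 세계"
hypothesis = "안녕하세오 세계"  # one wrong character in the first word
print(cer(reference, hypothesis))  # 0.125 (1 edit / 8 characters)
print(wer(reference, hypothesis))  # 0.5   (1 edit / 2 words)
```

In practice, `hypothesis` would be the `generated_text` produced by the model and `reference` the ground-truth label for the same image.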