Different PreTraining model and processor

#2
by CristianJD - opened

What happens if I use the base model's processor with the small model? Is that bad practice? I'm still learning how to fine-tune TrOCR models.

Qantev org

What happens if I use the base model's processor with the small model? Is that bad practice? I'm still learning how to fine-tune TrOCR models.

In this case, it wouldn't change anything because we didn't change the processor. But in some cases, if we change the tokenizer for instance, it wouldn't work. The best practice is always to use the processor and the model from the same checkpoint.
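For illustration, a minimal sketch of that best practice with the transformers API (the checkpoint name is only an example):

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load the processor and the model from the same checkpoint
# (checkpoint name chosen only for illustration).
checkpoint = "microsoft/trocr-small-printed"
processor = TrOCRProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
```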

Thanks

Qantev org

Just to be sure: the processor bundles the decoder's tokenizer and the encoder's image pre-processing, so it is specific to a given encoder-decoder pair. Since the small TrOCR uses DeiT SMALL + MiniLM as its encoder-decoder and the base version uses BEiT BASE + RoBERTa LARGE, the two processors are different. So don't use the processor of the base version with the small model.
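A quick way to see this is to inspect what the processor wraps; a small sketch (in older transformers releases the image-processing attribute is called feature_extractor instead of image_processor):

```python
from transformers import TrOCRProcessor

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

# Encoder-side image pre-processing (feature_extractor in older versions)
print(type(processor.image_processor))
# Decoder-side tokenizer
print(type(processor.tokenizer))
```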

Thanks. I already fine-tuned the small-spanish model on my own dataset (mostly handwriting, plus some printed samples) made of crops of city names, emails, etc. The trocr-small-stage1 performed better overall than the small-spanish one. I can only afford to run the small models (fewer than 70M parameters). Any idea how to improve the results? Overall the accuracy is about 60%.

Qantev org

During our experiments we noticed that the size of the dataset is by far the biggest source of improvement. I would recommend using more data augmentation methods, especially elastic deformation:
https://pytorch.org/vision/main/generated/torchvision.transforms.ElasticTransform.html
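A rough sketch of such an augmentation pipeline with torchvision (the alpha/sigma values and the extra transforms are only illustrative and should be tuned on your own crops):

```python
import torchvision.transforms as T
from PIL import Image

# Hypothetical augmentation pipeline for handwriting crops.
augment = T.Compose([
    T.ElasticTransform(alpha=50.0, sigma=5.0),    # elastic deformation
    T.RandomRotation(degrees=3),                  # slight rotation jitter
    T.ColorJitter(brightness=0.2, contrast=0.2),  # lighting variation
])

image = Image.open("word_crop.png").convert("RGB")  # hypothetical file path
augmented = augment(image)
```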

I'll try it, thanks so much :D

I fine-tuned YOLO-NAS to detect the text across multiple lines, then I use its output to preprocess the image and feed the result to TrOCR. Under what license can I use the pretrained weights of the qantev/trocr-large-spanish model?
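For the recognition stage, a minimal sketch of how detected line crops could be fed to TrOCR (the file name and boxes are hypothetical placeholders for whatever your YOLO-NAS detector returns):

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

checkpoint = "qantev/trocr-large-spanish"
processor = TrOCRProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

page = Image.open("page.png").convert("RGB")     # hypothetical input image
boxes = [(10, 20, 400, 60), (10, 70, 400, 110)]  # hypothetical detections (left, upper, right, lower)

for box in boxes:
    crop = page.crop(box)
    pixel_values = processor(images=crop, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```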
