thai_trocr_thaigov_v2

Vision Encoder Decoder Models

  • Uses microsoft/trocr-base-handwritten as the encoder.
  • Uses airesearch/wangchanberta-base-att-spm-uncased as the decoder.
  • Fine-tuned on a dataset of 250k synthetic text images built from the ThaiGov V2 Corpus.
  • Uses SynthTIGER to generate the synthetic text images.
  • A useful starting point for fine-tuning on any Thai OCR task.

Usage

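from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load the processor (image preprocessing + tokenizer) and the fine-tuned model.
processor = TrOCRProcessor.from_pretrained("kkatiz/thai-trocr-thaigov-v2")
model = VisionEncoderDecoderModel.from_pretrained("kkatiz/thai-trocr-thaigov-v2")

# Open the image and convert it to RGB before preprocessing.
image = Image.open("... your image path").convert("RGB")
# Turn the image into a tensor of pixel values.
pixel_values = processor(image, return_tensors="pt").pixel_values
# Generate the output token IDs for the image.
generated_ids = model.generate(pixel_values)

# Decode the token IDs into text, dropping special tokens.
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)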
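
The snippet above handles one image at a time. For several images, the same three calls can be wrapped in a small batching helper. This is a minimal sketch, not part of the model's own API: the names `chunked` and `recognize_batch` and the default `batch_size=8` are assumptions made for illustration. Extra keyword arguments are passed straight to `model.generate`.

```python
from typing import Any, Iterator, List, Sequence


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def recognize_batch(
    images: Sequence[Any],
    processor: Any,
    model: Any,
    batch_size: int = 8,  # hypothetical default; tune to available memory
    **generate_kwargs: Any,
) -> List[str]:
    """Run OCR over `images` in batches and return one string per image.

    `processor` and `model` are expected to be the objects loaded in the
    usage snippet (TrOCRProcessor and VisionEncoderDecoderModel). The images
    should already be RGB PIL images.
    """
    texts: List[str] = []
    for batch in chunked(images, batch_size):
        # Preprocess the whole batch into a single tensor of pixel values.
        pixel_values = processor(images=list(batch), return_tensors="pt").pixel_values
        # Generate token IDs for every image in the batch.
        generated_ids = model.generate(pixel_values, **generate_kwargs)
        # Decode each sequence of IDs into text, dropping special tokens.
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

With `processor` and `model` loaded as shown earlier, a call might look like `recognize_batch(images, processor, model, batch_size=4, max_new_tokens=64)`, where `images` is a list of RGB PIL images and the `max_new_tokens` value is only an example.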

Model size: 220M params (Safetensors, tensor type F32)