Why does Chinese image OCR give the wrong output?

by Viking714 - opened

Hello, I recently used this model to OCR a Chinese image, but the output words are wrong. The code I used is below:

from PIL import Image
img_pil = Image.open('/kaggle/input/timuimage/timu.jpg')
image = img_pil.convert("RGB")

from transformers import LayoutXLMProcessor
processor = LayoutXLMProcessor.from_pretrained("Microsoft/layoutlmv3-base-chinese")
feature_extractor = processor.feature_extractor

# preprocess image to text

encoded_inputs = feature_extractor(image)
words = encoded_inputs.words

# just output the words as one string

text = ""
for word in words[0]:
    text = text + word
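
For reference, the concatenation loop above can also be written with `str.join`. This is only a minimal sketch: the helper name `join_ocr_words` and the sample words are made up for illustration, and it assumes `words` is a list with one list of strings per image, as in the code above.

```python
def join_ocr_words(words_batch, sep=""):
    """Join the OCR words of the first image in a batch into one string.

    An empty separator suits Chinese text; pass sep=" " for English.
    """
    if not words_batch:
        return ""
    return sep.join(words_batch[0])


# Made-up sample data standing in for encoded_inputs.words
sample_words = [["你", "好", "世界"]]
text = join_ocr_words(sample_words)
```

With the real output, this would be called as `join_ocr_words(words)`.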

The output is as below:

The image I used is from https://www.kaggle.com/datasets/viking714/timuimage; it is public, so anyone can view it.
Using the same method, I can OCR English images with the LayoutXLM and LayoutLMv2 models, and both work fine.
Thank you very much.


image_processor = LayoutLMv3ImageProcessor.from_pretrained(model_name,ocr_lang='chi_sim+eng')
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
processor = LayoutLMv3Processor(image_processor=image_processor,tokenizer=tokenizer,apply_ocr=True)

Hello, I was trying to use it in the same way, but I got this error:

ValueError Traceback (most recent call last)
in <cell line: 4>()
2 image_processor = LayoutLMv3ImageProcessor.from_pretrained(model_name,ocr_lang='chi_sim+eng')
3 tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
----> 4 processor = LayoutLMv3Processor(image_processor=image_processor,tokenizer=tokenizer,apply_ocr=True)

ValueError: Received XLMRobertaTokenizer for argument tokenizer, but a ('LayoutLMv3Tokenizer', 'LayoutLMv3TokenizerFast') was expected.

What could be wrong? Thanks.

tokenizer_class = ("LayoutLMv3Tokenizer", "LayoutLMv3TokenizerFast")
tokenizer_class = ("LayoutLMv3Tokenizer", "LayoutLMv3TokenizerFast",'XLMRobertaTokenizer','XLMRobertaTokenizerFast','LayoutXLMTokenizer')
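
The two `tokenizer_class` lines above look like the processor's list of accepted tokenizer classes before and after an edit. Here is a rough sketch of why widening that list would make the ValueError go away. It is a simplified stand-in, not the transformers source: `check_tokenizer` and the dummy `XLMRobertaTokenizer` class are hypothetical, and the check here compares class names only.

```python
class XLMRobertaTokenizer:
    """Dummy class that only borrows the real tokenizer's name."""


def check_tokenizer(tokenizer, accepted_names):
    # Hypothetical check: reject any tokenizer whose class name is not listed.
    name = type(tokenizer).__name__
    if name not in accepted_names:
        raise ValueError(
            f"Received {name} for argument tokenizer, "
            f"but a {accepted_names} was expected."
        )
    return tokenizer


strict = ("LayoutLMv3Tokenizer", "LayoutLMv3TokenizerFast")
widened = strict + ("XLMRobertaTokenizer", "XLMRobertaTokenizerFast", "LayoutXLMTokenizer")

tok = XLMRobertaTokenizer()
try:
    check_tokenizer(tok, strict)
    rejected = False
except ValueError:
    rejected = True  # the strict tuple does not list XLMRobertaTokenizer

accepted = check_tokenizer(tok, widened) is tok  # the widened tuple lets it through
```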



from transformers import XLMRobertaTokenizer, AutoModel, AutoProcessor, LayoutLMv3ImageProcessor, LayoutLMv3Processor
model_name = "Microsoft/layoutlmv3-base-chinese"
image_processor = LayoutLMv3ImageProcessor.from_pretrained(model_name, ocr_lang='chi_sim+eng')
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
processor = LayoutLMv3Processor(image_processor=image_processor,tokenizer=tokenizer,apply_ocr=True)
feature_extractor = processor.feature_extractor
inputs = feature_extractor(image)
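
If only the OCR words are needed, the tokenizer may not be required at all, since the OCR step runs in the image processor. Below is a minimal sketch under these assumptions: Tesseract and pytesseract are installed together with the `chi_sim` language data, and the image processor is built with its default settings instead of being loaded from the checkpoint. The helper name `extract_words` is made up for illustration.

```python
OCR_LANG = "chi_sim+eng"  # Tesseract language codes joined with "+"


def extract_words(image_path, ocr_lang=OCR_LANG):
    """Run Tesseract OCR through the LayoutLMv3 image processor and return the words."""
    # Imported lazily so this module still loads where the packages are missing.
    from PIL import Image
    from transformers import LayoutLMv3ImageProcessor

    image = Image.open(image_path).convert("RGB")
    # Assumption: default preprocessing settings, not the checkpoint's config.
    image_processor = LayoutLMv3ImageProcessor(apply_ocr=True, ocr_lang=ocr_lang)
    encoding = image_processor(image)
    return encoding["words"][0]  # words of the first (only) image
```

It could then be called as `extract_words('/kaggle/input/timuimage/timu.jpg')`. If Tesseract lacks the Chinese language data, the call is expected to fail or return poor results, so that is worth checking first.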
