How to use nielsr/lilt-xlm-roberta-base for inference with a document image?
#2 opened by pierreguillou
Hi @nielsr .
Many thanks for this model, which you fine-tuned in your notebook Fine_tune_LiLT_on_a_custom_dataset,_in_any_language.ipynb.
However, you did not provide inference code for a raw document image (i.e., one without bounding box coordinates).
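For context on what is missing: models in the LayoutLM family, LiLT included, take one bounding box per word, scaled to a 0-1000 range relative to the page size. Below is a minimal sketch of that scaling, assuming an OCR step has already produced pixel boxes as (x0, y0, x1, y1). The name `normalize_box` is a hypothetical helper, not part of transformers.

```python
def normalize_box(box, width, height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range.

    Hypothetical helper: `width` and `height` are the page size in pixels.
    """
    x0, y0, x1, y1 = box

    def scale(value, size):
        # Scale to 0-1000 and clamp, in case OCR returns a box past the page edge.
        return max(0, min(1000, int(1000 * value / size)))

    return [scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height)]


# A word box on a 500 x 1000 pixel page.
print(normalize_box((50, 100, 150, 200), width=500, height=1000))
```

If the OCR is run by LayoutLMv3FeatureExtractor with apply_ocr=True, the returned boxes should already be on this scale, so a helper like this would only matter with a separate OCR engine.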
I tried to adapt the code from @philschmid's blog post "Document AI: LiLT a better language agnostic LayoutLM model", but it does not work.
Here is this code:
from transformers import LayoutLMv3FeatureExtractor, AutoTokenizer, LayoutLMv3Processor
model_id="nielsr/lilt-xlm-roberta-base"
# use the LayoutLMv3 feature extractor with OCR enabled, since the input is a raw document image
feature_extractor = LayoutLMv3FeatureExtractor(apply_ocr=True)  # run OCR to get words and boxes
tokenizer = AutoTokenizer.from_pretrained(model_id)
# cannot use from_pretrained since the processor is not saved in the base model
processor = LayoutLMv3Processor(feature_extractor, tokenizer)
And here is the error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-caabe3c64173> in <module>
7 tokenizer = AutoTokenizer.from_pretrained(model_id)
8 # cannot use from_pretrained since the processor is not saved in the base model
----> 9 processor = LayoutLMv3Processor(feature_extractor, tokenizer)
/usr/local/lib/python3.8/dist-packages/transformers/processing_utils.py in __init__(self, *args, **kwargs)
82
83 if not isinstance(arg, proper_class):
---> 84 raise ValueError(
85 f"Received a {type(arg).__name__} for argument {attribute_name}, but a {class_name} was expected."
86 )
ValueError: Received a XLMRobertaTokenizerFast for argument tokenizer, but a ('LayoutLMv3Tokenizer', 'LayoutLMv3TokenizerFast') was expected.
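To make the failure easier to read: the traceback shows that the processor's constructor checks each component with isinstance against the classes it expects, and the tokenizer loaded by AutoTokenizer is not one of them. The following is a minimal sketch of that kind of check, not the actual transformers code. `check_component` and the two stub classes are hypothetical stand-ins.

```python
class LayoutTokenizerStub:
    """Hypothetical stand-in for a tokenizer class the processor accepts."""


class OtherTokenizerStub:
    """Hypothetical stand-in for any other tokenizer class."""


def check_component(arg, attribute_name, proper_classes):
    """Raise ValueError unless `arg` is an instance of one of `proper_classes`."""
    if not isinstance(arg, proper_classes):
        expected = tuple(cls.__name__ for cls in proper_classes)
        raise ValueError(
            f"Received a {type(arg).__name__} for argument {attribute_name}, "
            f"but a {expected} was expected."
        )


# An accepted class passes silently.
check_component(LayoutTokenizerStub(), "tokenizer", (LayoutTokenizerStub,))

# Any other class is rejected, whatever its vocabulary or behaviour.
try:
    check_component(OtherTokenizerStub(), "tokenizer", (LayoutTokenizerStub,))
except ValueError as err:
    print(err)
```

The check is on the class only, so a tokenizer with a compatible vocabulary would still be refused if it is not an instance of an expected class.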
Any help is welcome :-) Thank you.
nielsr changed discussion status to closed