SaulLu committed on
Commit
43d8d4e
1 Parent(s): cd757fc

Add `tokenizer_class` to `config.4.13.0.json`

Hi 😀!

I recently noticed that:
```python
from transformers import LayoutXLMProcessor

processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base")
```
was logging the following message:
```
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LayoutLMv2Tokenizer'.
The class this function is called from is 'LayoutXLMTokenizerFast'.
```
and
```python
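from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutxlm-base")
# Print the class of the loaded tokenizer, not the built-in `type`
print(type(tokenizer))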
```
is printing
```
transformers.models.layoutlmv2.tokenization_layoutlmv2_fast.LayoutLMv2TokenizerFast
```

I think this happens because the configuration file does not specify a tokenizer class, so the class falls back to the default one associated with the model, i.e. LayoutLMv2.
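To make the suspected fallback concrete, here is a minimal sketch of the lookup order described above. It is not the actual `transformers` implementation: `resolve_tokenizer_class`, `DEFAULT_TOKENIZER_BY_MODEL_TYPE`, and the `"model_type": "layoutlmv2"` entry are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical, simplified mapping from model type to its default tokenizer class.
DEFAULT_TOKENIZER_BY_MODEL_TYPE = {
    "layoutlmv2": "LayoutLMv2Tokenizer",
}


def resolve_tokenizer_class(config: dict) -> Optional[str]:
    """Return the explicit `tokenizer_class` if set, else the model-type default."""
    explicit = config.get("tokenizer_class")
    if explicit is not None:
        return explicit
    # No explicit entry: fall back to the default for the model type (assumed behavior).
    return DEFAULT_TOKENIZER_BY_MODEL_TYPE.get(config.get("model_type"))


# Assumed shape of the config before and after this change.
config_before = {"model_type": "layoutlmv2", "vocab_size": 250002}
config_after = {**config_before, "tokenizer_class": "LayoutXLMTokenizer"}

print(resolve_tokenizer_class(config_before))  # LayoutLMv2Tokenizer
print(resolve_tokenizer_class(config_after))   # LayoutXLMTokenizer
```

Under these assumptions, adding the `tokenizer_class` key is enough to override the model-type default.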

What do you think?

Files changed (1)
  1. config.4.13.0.json +2 -1
config.4.13.0.json CHANGED
@@ -33,5 +33,6 @@
   "rel_pos_bins": 32,
   "transformers_version": "4.5.1",
   "type_vocab_size": 1,
-  "vocab_size": 250002
+  "vocab_size": 250002,
+  "tokenizer_class": "LayoutXLMTokenizer"
 }