---
language:
- es
- fr
- ru
- en
- it
tags:
- token-classification
- fill-mask
license: mit
datasets:
- iit-cdip
---


This model is the pretrained InfoXLM checkpoint from the paper "LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding".

Original repository: https://github.com/jpWang/LiLT

To use it, copy the modeling and configuration files from the original repository and load the pretrained weights through the corresponding classes (LiLTRobertaLikeConfig, LiLTRobertaLikeForRelationExtraction, LiLTRobertaLikeForTokenClassification, LiLTRobertaLikeModel).
These classes can also be registered with the AutoConfig/AutoModel factories, as follows:

```python
from transformers import AutoConfig, AutoModel, AutoModelForTokenClassification

# path_to_custom_classes is the module containing the files copied from the original repository
from path_to_custom_classes import (
    LiLTRobertaLikeConfig,
    LiLTRobertaLikeForRelationExtraction,
    LiLTRobertaLikeForTokenClassification,
    LiLTRobertaLikeModel,
)


def patch_transformers():
    # Register the custom LiLT classes with the Auto* factories
    AutoConfig.register("liltrobertalike", LiLTRobertaLikeConfig)
    AutoModel.register(LiLTRobertaLikeConfig, LiLTRobertaLikeModel)
    AutoModelForTokenClassification.register(LiLTRobertaLikeConfig, LiLTRobertaLikeForTokenClassification)
    # etc...
```
 
The model can then be loaded with:
```python
from transformers import AutoModel, AutoModelForTokenClassification, AutoTokenizer

# patch_transformers() must have been executed beforehand

tokenizer = AutoTokenizer.from_pretrained("microsoft/infoxlm-base")
model = AutoModel.from_pretrained("manu/lilt-infoxlm-base")
model = AutoModelForTokenClassification.from_pretrained("manu/lilt-infoxlm-base")  # to be fine-tuned on a token classification task
```
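
For reference, below is a minimal inference sketch that is not taken from the original repository. It assumes the custom model follows a LayoutLM-style interface, i.e. it accepts a `bbox` tensor of `[x0, y0, x1, y1]` word boxes normalized to the 0-1000 range alongside the token ids; the words and boxes here are dummy placeholders, and the exact forward signature and output structure should be checked against the copied modeling files.
```python
import torch
from transformers import AutoModel, AutoTokenizer

patch_transformers()  # register the custom LiLT classes first

tokenizer = AutoTokenizer.from_pretrained("microsoft/infoxlm-base")
model = AutoModel.from_pretrained("manu/lilt-infoxlm-base")

# Words and dummy layout boxes, e.g. as produced by an OCR engine (0-1000 normalized).
words = ["Invoice", "Date:", "2022-01-01"]
boxes = [[50, 60, 150, 80], [160, 60, 230, 80], [240, 60, 380, 80]]

encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Expand word-level boxes to token level; special tokens get a [0, 0, 0, 0] box.
token_boxes = [boxes[i] if i is not None else [0, 0, 0, 0] for i in encoding.word_ids(0)]
encoding["bbox"] = torch.tensor([token_boxes])

with torch.no_grad():
    outputs = model(**encoding)

# Token-level hidden states; the exact output structure depends on the custom class.
print(outputs[0].shape)
```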