manu committed on
Commit dce4e5f
1 Parent(s): ccd3e2d

Update README.md

Files changed (1)
  1. README.md +47 -2
README.md CHANGED
---
language:
- es
- fr
- ru
- en
- it
tags:
- translation
license: mit
datasets:
- iit-cdip
---
 
This model is the pretrained InfoXLM checkpoint from the paper "LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding".

Original repository: https://github.com/jpWang/LiLT

To use it, you need to fork the modeling and configuration files from the original repository and load the pretrained model with the corresponding classes (LiLTRobertaLikeConfig, LiLTRobertaLikeForRelationExtraction, LiLTRobertaLikeForTokenClassification, LiLTRobertaLikeModel).
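
For example, a minimal sketch of loading the checkpoint directly through these classes (`path_to_custom_classes` is a placeholder for wherever the forked files live, as in the snippets below):

```python
# Sketch only: assumes the modeling/configuration files from
# https://github.com/jpWang/LiLT have been copied into a local module,
# referred to here by the placeholder name `path_to_custom_classes`.
from path_to_custom_classes import LiLTRobertaLikeConfig, LiLTRobertaLikeModel

config = LiLTRobertaLikeConfig.from_pretrained("manu/lilt-infoxlm-base")
model = LiLTRobertaLikeModel.from_pretrained("manu/lilt-infoxlm-base", config=config)
```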

They can also be registered with the AutoConfig/AutoModel factories, so that the usual `from_pretrained` helpers resolve them:

```python
from transformers import AutoConfig, AutoModel, AutoModelForTokenClassification

from path_to_custom_classes import (
    LiLTRobertaLikeConfig,
    LiLTRobertaLikeForRelationExtraction,
    LiLTRobertaLikeForTokenClassification,
    LiLTRobertaLikeModel,
)


def patch_transformers():
    # Map the "liltrobertalike" model type to the custom classes so that
    # the Auto* factories can resolve them.
    AutoConfig.register("liltrobertalike", LiLTRobertaLikeConfig)
    AutoModel.register(LiLTRobertaLikeConfig, LiLTRobertaLikeModel)
    AutoModelForTokenClassification.register(LiLTRobertaLikeConfig, LiLTRobertaLikeForTokenClassification)
    # etc...
```

The model can then be loaded with:
```python
from transformers import AutoModel, AutoModelForTokenClassification, AutoTokenizer

# patch_transformers() must have been executed beforehand

# `tokenizer_name` and `use_auth_token` are placeholders for your own setup
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, use_auth_token=use_auth_token)
model = AutoModel.from_pretrained("manu/lilt-infoxlm-base")
model = AutoModelForTokenClassification.from_pretrained("manu/lilt-infoxlm-base")  # to be fine-tuned on a token classification task
```
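
Note that transformers has no AutoModelForRelationExtraction factory; a minimal sketch for the relation-extraction head, assuming the same forked classes as above, is to load it directly:

```python
# Sketch only: the relation-extraction head is loaded straight from the
# forked class, since no Auto* factory covers relation extraction.
from path_to_custom_classes import LiLTRobertaLikeForRelationExtraction

re_model = LiLTRobertaLikeForRelationExtraction.from_pretrained("manu/lilt-infoxlm-base")  # to be fine-tuned on relation extraction
```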