manu committed on
Commit dce4e5f
1 Parent(s): ccd3e2d

Update README.md

Files changed (1)
  1. README.md +47 -2
README.md CHANGED
---
language:
- es
- fr
- ru
- en
- it
tags:
- translation
license: mit
datasets:
- iit-cdip
---
 
This model is the pretrained InfoXLM checkpoint from the paper "LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding".

Original repository: https://github.com/jpWang/LiLT

To use it, you need to fork the modeling and configuration files from the original repository and load the pretrained model with the corresponding classes (LiLTRobertaLikeConfig, LiLTRobertaLikeForRelationExtraction, LiLTRobertaLikeForTokenClassification, LiLTRobertaLikeModel).
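
For example, a minimal sketch of loading the checkpoint directly through these classes (`path_to_custom_classes` is a placeholder for wherever the forked files live, as in the snippets below):

```python
# Sketch only: assumes the modeling/configuration files from
# https://github.com/jpWang/LiLT have been copied into a local module,
# referred to here by the placeholder name `path_to_custom_classes`.
from path_to_custom_classes import LiLTRobertaLikeConfig, LiLTRobertaLikeModel

config = LiLTRobertaLikeConfig.from_pretrained("manu/lilt-infoxlm-base")
model = LiLTRobertaLikeModel.from_pretrained("manu/lilt-infoxlm-base", config=config)
```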

They can also be registered with the AutoConfig/AutoModel factories, so that the usual `from_pretrained` helpers resolve them:

```python
from transformers import AutoConfig, AutoModel, AutoModelForTokenClassification

from path_to_custom_classes import (
    LiLTRobertaLikeConfig,
    LiLTRobertaLikeForRelationExtraction,
    LiLTRobertaLikeForTokenClassification,
    LiLTRobertaLikeModel,
)


def patch_transformers():
    # Map the "liltrobertalike" model type to the custom classes so that
    # the Auto* factories can resolve them.
    AutoConfig.register("liltrobertalike", LiLTRobertaLikeConfig)
    AutoModel.register(LiLTRobertaLikeConfig, LiLTRobertaLikeModel)
    AutoModelForTokenClassification.register(LiLTRobertaLikeConfig, LiLTRobertaLikeForTokenClassification)
    # etc...
```

The model can then be loaded with:
```python
from transformers import AutoModel, AutoModelForTokenClassification, AutoTokenizer

# patch_transformers() must have been executed beforehand

# `tokenizer_name` and `use_auth_token` are placeholders for your own setup
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, use_auth_token=use_auth_token)
model = AutoModel.from_pretrained("manu/lilt-infoxlm-base")
model = AutoModelForTokenClassification.from_pretrained("manu/lilt-infoxlm-base")  # to be fine-tuned on a token classification task
```
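
Note that transformers has no AutoModelForRelationExtraction factory; a minimal sketch for the relation-extraction head, assuming the same forked classes as above, is to load it directly:

```python
# Sketch only: the relation-extraction head is loaded straight from the
# forked class, since no Auto* factory covers relation extraction.
from path_to_custom_classes import LiLTRobertaLikeForRelationExtraction

re_model = LiLTRobertaLikeForRelationExtraction.from_pretrained("manu/lilt-infoxlm-base")  # to be fine-tuned on relation extraction
```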