ccdv committed
Commit 69c632d
1 Parent(s): a68bf66
Files changed (1): README.md (+3 -2)
README.md CHANGED
@@ -11,7 +11,8 @@ pipeline_tag: fill-mask
 **This model relies on a custom modeling file, you need to add trust_remote_code=True**\
 **See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
 
-Conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
+LSG ArXiv [paper](https://arxiv.org/abs/2210.15497). \
+Github/conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
 
 * [Usage](#usage)
 * [Parameters](#parameters)
@@ -25,7 +26,7 @@ This model is a small version of the [LEGAL-BERT](https://huggingface.co/nlpaueb
 This model can handle long sequences but faster and more efficiently than Longformer or BigBird (from Transformers) and relies on Local + Sparse + Global attention (LSG).
 
 
-The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...). \
+The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...).
 
 
 Support encoder-decoder but I didnt test it extensively.\
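The second hunk's padding requirement (sequence length must be a multiple of the block size) can be sketched in plain Python. `pad_to_block_multiple` is a hypothetical helper written for illustration, not part of the LSG codebase; it mimics what the tokenizer's `pad_to_multiple_of=...` option achieves.

```python
def pad_to_block_multiple(input_ids, block_size, pad_token_id=0):
    """Right-pad a list of token ids so that len(result) % block_size == 0.

    Hypothetical helper illustrating the model's block-size constraint;
    in practice the tokenizer handles this via pad_to_multiple_of.
    """
    remainder = len(input_ids) % block_size
    if remainder == 0:
        return list(input_ids)
    return list(input_ids) + [pad_token_id] * (block_size - remainder)

# A 10-token sequence with block size 4 is padded to length 12.
padded = pad_to_block_multiple(list(range(10)), block_size=4)
```

When `adaptive=True` is set in the config, the model performs this padding itself; doing it at tokenization time simply makes the input shape explicit.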