ccdv committed on
Commit
e81bbe8
1 Parent(s): 62b7c18
Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -11,6 +11,8 @@ pipeline_tag: fill-mask
  **This model relies on a custom modeling file, you need to add trust_remote_code=True**\
  **See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
 
+ Conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
+
  * [Usage](#usage)
  * [Parameters](#parameters)
  * [Sparse selection type](#sparse-selection-type)
@@ -19,10 +21,9 @@ pipeline_tag: fill-mask
 
  This model is adapted from [BERT-base-uncased](https://huggingface.co/bert-base-uncased) without additional pretraining yet. It uses the same number of parameters/layers and the same tokenizer.
 
-
  This model can handle long sequences but faster and more efficiently than Longformer or BigBird (from Transformers) and relies on Local + Sparse + Global attention (LSG).
 
- The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...). \
+ The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...).
 
  Support encoder-decoder but I didnt test it extensively.\
  Implemented in PyTorch.
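The README text in this diff says the model needs sequence lengths that are a multiple of the block size. Below is a minimal sketch of that padding arithmetic in plain Python. It is not the LSG modeling code. The values are assumptions for illustration only: a block size of 128, right-side padding, and pad id 0 (the `[PAD]` id in the bert-base-uncased vocabulary). The helper name `pad_to_block_multiple` is hypothetical.

```python
# Hypothetical helper illustrating "pad to a multiple of the block size".
# Assumptions (not taken from the LSG implementation): block_size=128,
# right-side padding, pad_token_id=0 ([PAD] in bert-base-uncased).
from typing import List, Tuple


def pad_to_block_multiple(
    input_ids: List[int], block_size: int = 128, pad_token_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Right-pad input_ids so that its length is a multiple of block_size.

    Returns the padded ids and an attention mask (1 = real token, 0 = padding).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    remainder = len(input_ids) % block_size
    # No padding needed when the length is already a multiple of block_size.
    n_pad = 0 if remainder == 0 else block_size - remainder
    padded = input_ids + [pad_token_id] * n_pad
    mask = [1] * len(input_ids) + [0] * n_pad
    return padded, mask


if __name__ == "__main__":
    ids = list(range(1, 301))  # 300 dummy token ids
    padded, mask = pad_to_block_multiple(ids, block_size=128)
    print(len(padded), sum(mask))  # 384 300
```

With the Hugging Face tokenizer, the same effect is normally obtained at tokenization time by passing `pad_to_multiple_of` together with padding enabled (and `truncation=True` to cap the length), as the README recommends.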