Commit 2a90b18 by AshtonIsNotHere (parent: 4873553): "Update README"

README.md CHANGED
@@ -9,13 +9,13 @@ datasets:
 - wikitext
 ---
 
 ## XLM-R Longformer Model
 This is an XLM-RoBERTa longformer model that was pre-trained from the XLM-RoBERTa checkpoint using the Longformer [pre-training scheme](https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) on the English WikiText-103 corpus.
 
-This model is identical to [markussagen's xlm-r longformer model,](https://huggingface.co/markussagen/xlm-roberta-longformer-base-4096) the difference being that the weights have been transferred to a Longformer model, in order to enable loading with ```AutoModel.from_pretrained()``` without
+This model is identical to [markussagen's xlm-r longformer model](https://huggingface.co/markussagen/xlm-roberta-longformer-base-4096); the difference is that the weights have been transferred to a Longformer model so that it can be loaded with `AutoModel.from_pretrained()` without external dependencies.
 
 ## Memory Requirements
-Note that this model requires a considerable amount of memory to run. The heatmap below should give a relative idea of the amount of memory needed for a target batch and sequence length. N.B.
+Note that this model requires a considerable amount of memory to run. The heatmap below gives a relative idea of the memory needed at inference for a target batch size and sequence length. N.B. the data for this plot was generated on a single A100 GPU with 40 GB of memory.
 
 <details>
 <summary>View Inference Memory Plot</summary>
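The memory-requirements note above can be made concrete with a back-of-envelope sketch of why attention memory grows with batch size and sequence length. This is an illustration, not the model's measured footprint: the function below only counts attention-score elements per layer, the window size of 512 is the standard Longformer default and is assumed here, and real usage also includes weights, activations, and framework overhead.

```python
def attn_score_elements(batch_size: int, seq_len: int, window: int = 512,
                        sliding_window: bool = True) -> int:
    """Rough count of attention-score elements per layer (illustrative only).

    Full self-attention stores on the order of seq_len**2 scores per
    sequence, while Longformer's sliding-window attention stores roughly
    seq_len * window, which is why 4096-token inputs fit in memory at all.
    """
    per_sequence = seq_len * (window if sliding_window else seq_len)
    return batch_size * per_sequence


# Sliding-window cost grows linearly with sequence length,
# while full attention grows quadratically.
linear = attn_score_elements(1, 4096)                        # ~seq_len * window
quadratic = attn_score_elements(1, 4096, sliding_window=False)  # ~seq_len**2
```

Under this sketch, doubling the batch size or the sequence length roughly doubles the sliding-window attention cost, which matches the batch-by-sequence-length axes of the heatmap.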