AshtonIsNotHere committed
Commit
2a90b18
1 Parent(s): 4873553

Update README

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -9,13 +9,13 @@ datasets:
 - wikitext
 ---
 
-## XLM-R Longformer Model / XLM-Long
+## XLM-R Longformer Model
 This is an XLM-RoBERTa longformer model that was pre-trained from the XLM-RoBERTa checkpoint using the Longformer [pre-training scheme](https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) on the English WikiText-103 corpus.
 
-This model is identical to [markussagen's xlm-r longformer model,](https://huggingface.co/markussagen/xlm-roberta-longformer-base-4096) the difference being that the weights have been transferred to a Longformer model, in order to enable loading with ```AutoModel.from_pretrained()``` without the need for external libraries.
+This model is identical to [markussagen's xlm-r longformer model,](https://huggingface.co/markussagen/xlm-roberta-longformer-base-4096) the difference being that the weights have been transferred to a Longformer model, in order to enable loading with ```AutoModel.from_pretrained()``` without external dependencies.
 
 ## Memory Requirements
-Note that this model requires a considerable amount of memory to run. The heatmap below should give a relative idea of the amount of memory needed for a target batch and sequence length. N.B. Data for this plot was generated by running on a single a100 GPU with 40gb of memory.
+Note that this model requires a considerable amount of memory to run. The heatmap below should give a relative idea of the amount of memory needed at inference for a target batch and sequence length. N.B. data for this plot was generated by running on a single a100 GPU with 40gb of memory.
 
 <details>
 <summary>View Inference Memory Plot</summary>
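Since the point of this commit's README wording is that the checkpoint loads with plain `AutoModel.from_pretrained()` and that memory scales with batch size and sequence length, a minimal sketch of both is shown below. The repository id `AshtonIsNotHere/xlm-roberta-longformer-base-4096` and the helper `peak_memory_mb` are illustrative assumptions, not part of this commit.

```python
# Minimal sketch (assumed repo id and helper name; not taken from the commit).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "AshtonIsNotHere/xlm-roberta-longformer-base-4096"  # assumed repository id

# Because the weights were transferred to a Longformer architecture, the
# generic Auto classes should load the model without extra libraries.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()


def peak_memory_mb(batch_size: int, seq_len: int) -> float:
    """Rough peak-GPU-memory probe for one forward pass at a given shape."""
    model.cuda()
    torch.cuda.reset_peak_memory_stats()
    inputs = tokenizer(
        ["placeholder text"] * batch_size,
        padding="max_length",
        truncation=True,
        max_length=seq_len,
        return_tensors="pt",
    ).to("cuda")
    with torch.no_grad():
        model(**inputs)
    return torch.cuda.max_memory_allocated() / 2**20


# Example: footprint at batch size 1 and the full 4096-token window.
# print(peak_memory_mb(1, 4096))
```

Sweeping such a probe over a grid of batch sizes and sequence lengths is one plausible way to generate the kind of inference-memory heatmap the README refers to.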