JonasGeiping committed
Commit 3e6db37
Parent: 440a8a9

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -15,11 +15,11 @@ tags:
 
 
 
-# crammed BERT
+# crammed BERT (legacy/v1)
 
-This is one of the final models described in "Cramming: Training a Language Model on a Single GPU in One Day". This is an *English*-language model pretrained like BERT, but with less compute. This one was trained for 24 hours on a single A6000 GPU. To use this model, you need the code from the repo at https://github.com/JonasGeiping/cramming.
+This is one of the final models described in the **FIRST VERSION OF** "Cramming: Training a Language Model on a Single GPU in One Day". This is an *English*-language model pretrained like BERT, but with less compute. This one was trained for 24 hours on a single A6000 GPU. To use this model, you need the code from the repo at https://github.com/JonasGeiping/cramming.
 
-You can find the paper here: https://arxiv.org/abs/2212.14034, and the abstract below:
+You can find the paper here (linked to the old version on arxiv): https://arxiv.org/abs/2212.14034/v1, and the abstract below:
 
 > Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question:
 How far can we get with a single GPU in just one day?
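
The card states that using the model requires the code from https://github.com/JonasGeiping/cramming. As a minimal sketch of what loading this checkpoint for masked-LM inference could look like, assuming the card belongs to a hub repo named `JonasGeiping/crammed-bert` and that the repo exposes its custom architecture through `trust_remote_code` (both are assumptions, not confirmed by this diff):

```python
# Minimal sketch: masked-LM inference with a crammed BERT checkpoint.
# Assumptions (not confirmed by this commit): the checkpoint is hosted as
# "JonasGeiping/crammed-bert" and registers its custom architecture via
# trust_remote_code; otherwise, clone the cramming repo and load the
# weights with that codebase's own utilities instead.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "JonasGeiping/crammed-bert"  # hypothetical hub id for this card

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)
model.eval()

# Standard BERT-style fill-mask usage.
inputs = tokenizer(
    "Cramming trains a language model in one [MASK].", return_tensors="pt"
)
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and decode the top prediction.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```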