jacobfulano committed
Commit 97a8dec
1 Parent(s): 9e9fe1e

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -11,16 +11,18 @@ inference: false
 
 MosaicBERT-Base is a new BERT architecture and training recipe optimized for fast pretraining.
 MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy when benchmarked against
-Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased).
+Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased). It incorporates efficiency insights
+from the past half a decade of transformers research, from RoBERTa to T5 and GPT.
 
-__This model was trained with [ALiBi](https://arxiv.org/abs/2108.12409) on a sequence length of 512 tokens.__
+__This particular model was trained with [ALiBi](https://arxiv.org/abs/2108.12409) on a sequence length of 512 tokens.__
 
 ALiBi allows a model trained with a sequence length n to easily extrapolate to sequence lengths >2n during finetuning. For more details, see [Train Short, Test Long: Attention with Linear
 Biases Enables Input Length Extrapolation (Press et al. 2022)](https://arxiv.org/abs/2108.12409)
 
-It is part of the **family of MosaicBERT-Base models**:
+It is part of the **family of MosaicBERT-Base models** trained using ALiBi on different sequence lengths:
 
 * [mosaic-bert-base](https://huggingface.co/mosaicml/mosaic-bert-base) (trained on a sequence length of 128 tokens)
+* [mosaic-bert-base-seqlen-256](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-256)
 * mosaic-bert-base-seqlen-512
 * [mosaic-bert-base-seqlen-1024](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-1024)
 * [mosaic-bert-base-seqlen-2048](https://huggingface.co/mosaicml/mosaic-bert-base-seqlen-2048)
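
The ALiBi claim in the README text above can be made concrete with a short sketch. This is a minimal illustration, not MosaicBERT's actual attention code: the helper names (`alibi_slopes`, `alibi_bias`) are invented here, the slope schedule is the simplified geometric case from the ALiBi paper, and a symmetric |i − j| distance is assumed for a bidirectional encoder. Because the bias depends only on relative distance rather than learned position embeddings, the same slopes apply at any sequence length, which is why a model pretrained at 512 tokens can be finetuned at longer lengths.

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric slope schedule from the ALiBi paper, starting at 2^(-8/num_heads).
    # (Exact for power-of-two head counts; the paper interpolates for other counts.)
    start = 2 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Bias added to raw attention scores: -slope * |i - j| for query i and key j.
    # It depends only on relative distance, not absolute position, so the same
    # slopes can be reused at any sequence length (no position embeddings to resize).
    positions = torch.arange(seq_len)
    distance = (positions[None, :] - positions[:, None]).abs()  # (seq_len, seq_len)
    slopes = alibi_slopes(num_heads)                            # (num_heads,)
    return -slopes[:, None, None] * distance                    # (num_heads, seq_len, seq_len)

# The same code produces biases for longer sequences at finetuning time:
bias_pretrain = alibi_bias(num_heads=12, seq_len=512)
bias_finetune = alibi_bias(num_heads=12, seq_len=1024)  # no new parameters required
```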