sam-mosaic committed
Commit 8e8e031
1 Parent(s): df9df58

Update README.md

Files changed (1)
README.md +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ inference: false
 MPT-7B-StoryWriter-65k+ is a model designed to read and write fictional stories with super long context lengths.
 It was built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the [books3 dataset](https://huggingface.co/datasets/the_pile_books3).
 At inference time, thanks to [ALiBi](https://arxiv.org/abs/2108.12409), MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens.
-We demonstrate generations as long as 84k tokens on a single node of 8 A100-80GB GPUs in our [blogpost](www.mosaicml.com/blog/mpt-7b).
+We demonstrate generations as long as 84k tokens on a single node of 8 A100-80GB GPUs in our [blogpost](https://www.mosaicml.com/blog/mpt-7b).
 * License: _Apache-2.0_ (commercial use permitted)

 This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
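
The changed line concerns ALiBi-based extrapolation past the 65k-token finetuning context. As a rough illustration of how that capability is typically exercised with this checkpoint, here is a minimal sketch assuming the standard `transformers` `trust_remote_code` loading path; the `max_seq_len` attribute and the ~84k value are assumptions drawn from the quoted README text, not part of this commit.

```python
import transformers

# Sketch: raise the inference-time context window before loading the weights.
# ALiBi lets the model attend past its 65k-token finetuning length; the
# max_seq_len attribute and the ~84k value here are assumptions, not part of
# this commit.
name = "mosaicml/mpt-7b-storywriter"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 84000  # prompt + generated tokens may now total ~84k

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)

# The MPT-7B family reuses the GPT-NeoX-20B tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```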