---
license: apache-2.0
datasets:
  - EleutherAI/the_pile_deduplicated
language:
  - en
---

Pythia-2.8B Deduped 4K is a Pythia-2.8B Deduped model fine-tuned with a 4096-token context length. Training resumed from EleutherAI's step-143,000 Pythia-2.8B Deduped checkpoint and continued on The Pile v1 Deduped (threshold=0.87). This particular model comes from a checkpoint captured at step 175,500, i.e. an extra 134,217,728,000 tokens of training.
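
A minimal usage sketch with the `transformers` library is below. The repo id `CarperAI/pythia-2.8b-deduped-4k` is an assumption; substitute the actual model path if it differs.

```python
# Sketch: load the 4K-context fine-tuned checkpoint with Hugging Face
# transformers. The repo id below is assumed, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CarperAI/pythia-2.8b-deduped-4k"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# The fine-tuned context length is 4096 tokens, so prompts up to that
# length fit in a single forward pass.
inputs = tokenizer("EleutherAI's Pythia models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```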

Note: Sequence length warmup was not used when extending the context length from 2048 to 4096; in hindsight, it should have been applied.
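
For context, sequence length warmup ramps the training sequence length up gradually rather than jumping straight to the target. A minimal sketch of a linear schedule follows; the step counts and rounding granularity here are illustrative, not taken from this run.

```python
def seq_len_at_step(step: int, start_len: int = 2048, target_len: int = 4096,
                    warmup_steps: int = 10_000) -> int:
    """Linearly ramp the training sequence length from start_len to target_len
    over warmup_steps, rounding down to a multiple of 64 for efficiency.
    All defaults here are illustrative assumptions."""
    if step >= warmup_steps:
        return target_len
    length = start_len + (target_len - start_len) * step / warmup_steps
    return max(start_len, int(length) // 64 * 64)
```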

## Acknowledgements

This work would not have been possible without the support of Stability AI.