---
license: apache-2.0
datasets:
  - EleutherAI/the_pile_deduplicated
language:
  - en
---

Pythia-2.8B Deduped 4K is a Pythia-2.8B Deduped model fine-tuned with a 4096-token context length. Training resumed from EleutherAI's step-143,000 Pythia-2.8B Deduped checkpoint and continued on The Pile v1 Deduped (threshold=0.87). This particular model comes from a checkpoint captured at step 175,500, i.e. an extra 134,217,728,000 tokens of training.
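
A minimal usage sketch with the `transformers` library is below. The repo id `CarperAI/pythia-2.8b-deduped-4k` is an assumption; substitute the actual model path if it differs.

```python
# Sketch: load the 4K-context fine-tuned checkpoint with Hugging Face
# transformers. The repo id below is assumed, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CarperAI/pythia-2.8b-deduped-4k"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# The fine-tuned context length is 4096 tokens, so prompts up to that
# length fit in a single forward pass.
inputs = tokenizer("EleutherAI's Pythia models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```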

Note: Sequence length warmup was not used when extending the context length from 2048 to 4096; in hindsight, it should have been applied.
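
For context, sequence length warmup ramps the training sequence length up gradually rather than jumping straight to the target. A minimal sketch of a linear schedule follows; the step counts and rounding granularity here are illustrative, not taken from this run.

```python
def seq_len_at_step(step: int, start_len: int = 2048, target_len: int = 4096,
                    warmup_steps: int = 10_000) -> int:
    """Linearly ramp the training sequence length from start_len to target_len
    over warmup_steps, rounding down to a multiple of 64 for efficiency.
    All defaults here are illustrative assumptions."""
    if step >= warmup_steps:
        return target_len
    length = start_len + (target_len - start_len) * step / warmup_steps
    return max(start_len, int(length) // 64 * 64)
```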

## Acknowledgements

This work would not have been possible without the support of Stability AI.