
# NGME-LLama 264M

- Trained on 4× A6000 GPUs for ~4 days
- Trained on ~4.9 billion tokens (4 GPUs × 16 batch size × 768 sequence length × 100,000 steps)
- Trained on the C4 corpus
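
A minimal usage sketch for loading the checkpoint with the Hugging Face `transformers` Auto classes. Since NGME uses a custom architecture, `trust_remote_code=True` is likely required; the exact loading path is an assumption, not confirmed by the card.

```python
# Minimal sketch; assumes the checkpoint is loadable via the
# standard transformers Auto classes with custom remote code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "PatrickHaller/ngme-llama-264M"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

# Generate a short continuation from a sample prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```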