---
datasets:
  - nilq/babylm-100M
language:
  - en
---

This autoregressive model belongs to a series of rather small language models trained on the BabyLM data:

- the baby_llama model has few parameters and was trained on the small dataset (10M tokens)
- the teenie_llama model has the same number of parameters but was trained on more text (100M tokens)
- the weenie_llama model was trained on the small dataset, but has more parameters/weights
- the tweenie_llama model combines both: the larger dataset (more tokens) and more weights (i.e., parameters)
|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
| --------------- | ---------- | ------------ | ------------ | ------------- |
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |
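
Since this is a standard autoregressive causal LM, it can presumably be loaded with the `transformers` Auto classes. A minimal sketch follows; the repo id `bbunzeck/tweenie_llama` is an assumption inferred from this page's path, not confirmed by the card itself.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed repo id (hypothetical); replace with the actual Hub path if it differs.
model_id = "bbunzeck/tweenie_llama"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation; note the context window is only 256 tokens,
# so keep prompt + new tokens within that budget.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```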