---
datasets:
- nilq/babylm-100M
language:
- en
---
This autoregressive model belongs to a series of rather small language models trained on the BabyLM data:
- the baby_llama model has few parameters and was trained on the small dataset (10M tokens)
- the teenie_llama model has the same number of parameters but was trained on the larger dataset (100M tokens)
- the weenie_llama model was trained on the small dataset but has more parameters/weights
- the tweenie_llama model features both: more tokens (the larger dataset) and more weights (i.e. parameters)
| | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|---|---|---|---|---|
| Parameters | 2.97M | 2.97M | 11.44M | 11.44M |
| Hidden layers | 8 | 8 | 16 | 16 |
| Attention heads | 8 | 8 | 16 | 16 |
| Embedding size | 128 | 128 | 256 | 256 |
| Context size | 128 | 128 | 256 | 256 |
| Vocab size | 16k | 16k | 16k | 16k |
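
For orientation, here is a minimal sketch of how the baby_llama column could be expressed as a Hugging Face `LlamaConfig`. The `intermediate_size` and embedding tying are assumptions not stated in this card, so the resulting parameter count will not exactly reproduce 2.97M.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical configuration mirroring the baby_llama column above.
# intermediate_size and tie_word_embeddings are guesses; the card does not list them.
config = LlamaConfig(
    vocab_size=16_000,            # vocab size: 16k
    hidden_size=128,              # embedding size
    num_hidden_layers=8,          # hidden layers
    num_attention_heads=8,        # attention heads
    max_position_embeddings=128,  # context size
    intermediate_size=512,        # assumed 4x hidden size; not listed in the table
    tie_word_embeddings=True,     # assumed; reduces the parameter count
)

model = LlamaForCausalLM(config)
print(f"~{model.num_parameters() / 1e6:.2f}M parameters")  # approximate; depends on the assumptions above
```

The larger models in the series would follow the same pattern with the 256-dimensional embedding, 16 layers, 16 heads, and a 256-token context.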