Fifty shapes of BLiMP: syntactic learning curves in LMs
Collection: Models analyzed in our 2024 MILLing paper · 4 items
This autoregressive model belongs to a series of small language models trained on the BabyLM data:
| | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|---|---|---|---|---|
| Parameters | 2.97M | 2.97M | 11.44M | 11.44M |
| Hidden layers | 8 | 8 | 16 | 16 |
| Attention heads | 8 | 8 | 16 | 16 |
| Embedding size | 128 | 128 | 256 | 256 |
| Context size | 128 | 128 | 256 | 256 |
| Vocab size | 16k | 16k | 16k | 16k |
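The configurations above can be captured as a small Python dict for quick inspection; note that the per-head dimension (embedding size divided by number of attention heads) works out to 16 for all four models. A minimal sketch (the dict keys are just the model names from the table, not repository ids, and nothing here is loaded from a hub):

```python
# Hyperparameters of the four models, copied from the table above.
CONFIGS = {
    "baby_llama":    {"layers": 8,  "heads": 8,  "d_model": 128, "context": 128, "vocab": 16_000},
    "teenie_llama":  {"layers": 8,  "heads": 8,  "d_model": 128, "context": 128, "vocab": 16_000},
    "weenie_llama":  {"layers": 16, "heads": 16, "d_model": 256, "context": 256, "vocab": 16_000},
    "tweenie_llama": {"layers": 16, "heads": 16, "d_model": 256, "context": 256, "vocab": 16_000},
}

def head_dim(cfg: dict) -> int:
    # Per-head dimension: embedding size divided by the number of attention heads.
    return cfg["d_model"] // cfg["heads"]

for name, cfg in CONFIGS.items():
    print(f"{name}: {cfg['layers']} layers, head_dim={head_dim(cfg)}")
```

This makes it easy to see that the larger pair scales depth, heads, embedding size, and context window by 2× each while keeping the vocabulary and per-head dimension fixed.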