---
datasets:
- nilq/babylm-100M
language:
- en
---
This autoregressive model belongs to a series of rather small language models trained on the [BabyLM](https://babylm.github) data:
- the [baby_llama](https://huggingface.co/bbunzeck/baby_llama) model has the smaller parameter count and was trained on the small data set (10M tokens)
- the [**t**eenie_llama](https://huggingface.co/bbunzeck/teenie_llama) model has the same number of parameters but was trained on more **t**okens of text (100M)
- the [**w**eenie_llama](https://huggingface.co/bbunzeck/weenie_llama) model was trained on the small data set, but has more parameters/**w**eights
- the [**tw**eenie_llama](https://huggingface.co/bbunzeck/tweenie_llama) model features both: more **t**okens (the larger data set) and more **w**eights (*viz.* parameters)

|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |

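As a rough sanity check on the table, the parameter totals can be reproduced with a standard Llama parameter count. This is only a sketch under assumptions that this card does not state: an MLP (intermediate) size equal to the embedding size, tied input/output embeddings, no grouped-query attention, no bias terms, RMSNorm weights only, and a vocabulary of exactly 16,000 tokens. The helper `llama_param_count` is hypothetical and not part of any library.

```python
# Hypothetical reconstruction of the "Parameters" row of the table.
# ASSUMPTIONS (not stated in the model card): Llama-style blocks with
# intermediate size == embedding size, tied input/output embeddings,
# key/value heads == attention heads, no biases, vocab of exactly 16,000.

def llama_param_count(vocab_size: int, hidden: int, layers: int, intermediate: int) -> int:
    embeddings = vocab_size * hidden      # token embeddings, shared with the LM head (tied)
    attention = 4 * hidden * hidden       # q, k, v and output projections
    mlp = 3 * hidden * intermediate       # gate, up and down projections
    norms = 2 * hidden                    # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    final_norm = hidden                   # RMSNorm before the LM head
    return embeddings + layers * per_layer + final_norm


small = llama_param_count(vocab_size=16_000, hidden=128, layers=8, intermediate=128)
large = llama_param_count(vocab_size=16_000, hidden=256, layers=16, intermediate=256)
print(f"{small / 1e6:.2f}M")  # 2.97M  (baby_llama / teenie_llama)
print(f"{large / 1e6:.2f}M")  # 11.44M (weenie_llama / tweenie_llama)
```

Note that the context size does not enter this count, since Llama uses rotary position embeddings rather than a learned position table. Under these assumptions, the embedding matrix alone (16,000 × 128 ≈ 2.05M) accounts for most of the smaller models' parameters.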
If you use this model in your research, please cite the following publication:
```
@inproceedings{bunzeck-zarriess-2024-fifty,
title = "Fifty shapes of {BL}i{MP}: syntactic learning curves in language models are not uniform, but sometimes unruly",
author = "Bunzeck, Bastian and
Zarrie{\ss}, Sina",
editor = "Qiu, Amy and
Noble, Bill and
Pagmar, David and
Maraev, Vladislav and
Ilinykh, Nikolai",
booktitle = "Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning",
month = oct,
year = "2024",
address = "Gothenburg, Sweden",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.clasp-1.7",
pages = "39--55",
}
```