This autoregressive model, tweenie_llama, belongs to a series of four small language models trained on the BabyLM data:

  • the baby_llama model has few parameters (2.97M) and was trained on a small data set (10M tokens)
  • the teenie_llama model has the same number of parameters but was trained on more text (100M tokens)
  • the weenie_llama model was trained on the small data set but has more parameters (11.44M)
  • the tweenie_llama model combines both: the larger data set and the larger parameter count
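
|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |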
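
The parameter counts in the table can be roughly cross-checked from the other rows. The sketch below counts the weights of a standard Llama-style decoder. Three things are assumptions, not stated on this card: that the feed-forward (intermediate) size equals the embedding size, that the input and output embeddings are tied, and that "16k" means exactly 16,000 tokens. The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(hidden, layers, vocab, intermediate=None, tied=True):
    """Estimate the number of weights in a Llama-style decoder (no biases)."""
    if intermediate is None:
        intermediate = hidden  # assumption: feed-forward size equals embedding size
    attention = 4 * hidden * hidden        # query, key, value and output projections
    mlp = 3 * hidden * intermediate        # gate, up and down projections
    norms = 2 * hidden                     # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden            # token embedding matrix
    lm_head = 0 if tied else vocab * hidden  # assumption: tied output embeddings
    final_norm = hidden
    return layers * per_layer + embeddings + lm_head + final_norm


# Configurations taken from the table (vocab size assumed to be exactly 16,000).
small = llama_param_count(hidden=128, layers=8, vocab=16_000)
large = llama_param_count(hidden=256, layers=16, vocab=16_000)

print(f"baby/teenie:   {small / 1e6:.2f}M")  # prints "baby/teenie:   2.97M"
print(f"weenie/tweenie: {large / 1e6:.2f}M")  # prints "weenie/tweenie: 11.44M"
```

Under these assumptions the estimate agrees with the rounded figures in the table, but the actual model configuration should be consulted for the exact values.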

If you use this model in your research, please cite the following publication:

@inproceedings{bunzeck-zarriess-2024-fifty,
    title = "Fifty shapes of {BL}i{MP}: syntactic learning curves in language models are not uniform, but sometimes unruly",
    author = "Bunzeck, Bastian  and
      Zarrie{\ss}, Sina",
    editor = "Qiu, Amy  and
      Noble, Bill  and
      Pagmar, David  and
      Maraev, Vladislav  and
      Ilinykh, Nikolai",
    booktitle = "Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning",
    month = oct,
    year = "2024",
    address = "Gothenburg, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.clasp-1.7",
    pages = "39--55",
}
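
For trying the model out, a minimal sketch of text generation with the Hugging Face `transformers` library might look as follows. It assumes the checkpoint is published under the id `bbunzeck/tweenie_llama` and loads with the generic `AutoTokenizer` and `AutoModelForCausalLM` classes. The helper name `generate_text` and the prompt are hypothetical.

```python
MODEL_ID = "bbunzeck/tweenie_llama"  # assumed Hugging Face Hub id of this model
CONTEXT_SIZE = 256                   # context size listed in the table above


def generate_text(prompt, max_new_tokens=30, model_id=MODEL_ID):
    """Greedily continue `prompt` with the model (downloads weights on first use)."""
    # Imported here so the module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Keep prompt plus continuation within the model's context window.
    room = CONTEXT_SIZE - inputs["input_ids"].shape[1]
    output = model.generate(
        **inputs,
        max_new_tokens=max(1, min(max_new_tokens, room)),
        do_sample=False,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_text("The little dog"))  # hypothetical prompt
```

Because the model is very small and was trained on child-directed and similar text, its continuations are best treated as material for linguistic analysis rather than as fluent general-purpose output.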