This autoregressive model belongs to a series of rather small language models trained on the BabyLM data:

  • the baby_llama model has few parameters and was trained on a small data set (10M tokens)
  • the teenie_llama model has the same number of parameters but was trained on more tokens of text (100M)
  • the weenie_llama model was trained on the small data set (10M tokens) but has more parameters
  • the tweenie_llama model combines both: the larger data set (100M tokens) and more parameters
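
| | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|---|---|---|---|---|
| Parameters | 2.97M | 2.97M | 11.44M | 11.44M |
| Hidden layers | 8 | 8 | 16 | 16 |
| Attention heads | 8 | 8 | 16 | 16 |
| Embedding size | 128 | 128 | 256 | 256 |
| Context size (tokens) | 128 | 128 | 256 | 256 |
| Vocab size | 16k | 16k | 16k | 16k |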
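
The parameter totals above can be roughly reproduced from the other rows. The following is a minimal sketch, not the authors' code, and it rests on assumptions that this card does not state: a standard Llama-style decoder block without bias terms, a vocabulary of exactly 16,000 entries, input and output embeddings that share weights, and an MLP intermediate size equal to the embedding size. These values were chosen only because they happen to match the reported totals; the function name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab_size, hidden_size, num_layers,
                      intermediate_size=None, tie_embeddings=True):
    """Count the weights of a Llama-style decoder (no bias terms).

    Assumptions (not stated in the model card): tied embeddings and
    intermediate_size == hidden_size unless given explicitly.
    """
    if intermediate_size is None:
        intermediate_size = hidden_size  # assumption, see lead-in

    embedding = vocab_size * hidden_size
    # Query, key, value and output projections, each hidden x hidden.
    attention = 4 * hidden_size * hidden_size
    # Gate, up and down projections of the gated MLP.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    final_norm = hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embedding + num_layers * per_layer + final_norm + lm_head


# baby_llama / teenie_llama: 8 layers, embedding size 128
small = llama_param_count(16_000, 128, 8)
# weenie_llama / tweenie_llama: 16 layers, embedding size 256
large = llama_param_count(16_000, 256, 16)

print(f"{small:,}")  # 2,967,680, i.e. about 2.97M
print(f"{large:,}")  # 11,444,480, i.e. about 11.44M
```

The number of attention heads does not enter the count under these assumptions, because the heads only partition the same projection matrices. On a loaded PyTorch checkpoint, the figure could be compared against `sum(p.numel() for p in model.parameters())`.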

If you use this model in your research, please cite the following publication:

@inproceedings{bunzeck-zarriess-2024-fifty,
    title = "Fifty shapes of {BL}i{MP}: syntactic learning curves in language models are not uniform, but sometimes unruly",
    author = "Bunzeck, Bastian  and
      Zarrie{\ss}, Sina",
    editor = "Qiu, Amy  and
      Noble, Bill  and
      Pagmar, David  and
      Maraev, Vladislav  and
      Ilinykh, Nikolai",
    booktitle = "Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning",
    month = oct,
    year = "2024",
    address = "Gothenburg, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.clasp-1.7",
    pages = "39--55",
}
