
This autoregressive model belongs to a series of rather small language models trained on the BabyLM data:

  • the baby_llama model has few parameters (2.97M) and was trained on a small data set (10M tokens)
  • the teenie_llama model has the same number of parameters but was trained on ten times as much text (100M tokens)
  • the weenie_llama model was trained on the small data set but has more parameters (11.44M)
  • the tweenie_llama model combines both: more tokens (the larger data set) and more parameters
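The four variants form a 2x2 design: training data (10M vs. 100M tokens) crossed with model size (smaller vs. larger). The Python sketch below lays out that grid for illustration only; the `VARIANTS` dictionary and the `pick_variant` helper are hypothetical and are not part of any released code.

```python
# Hypothetical lookup table illustrating the 2x2 design of the series:
# keys are (training tokens, size class), values are the model names.
VARIANTS = {
    ("10M", "small"): "baby_llama",
    ("100M", "small"): "teenie_llama",
    ("10M", "large"): "weenie_llama",
    ("100M", "large"): "tweenie_llama",
}


def pick_variant(tokens: str, size: str) -> str:
    """Return the model name for a data-set size and a parameter-count class."""
    try:
        return VARIANTS[(tokens, size)]
    except KeyError:
        raise ValueError(f"no variant for tokens={tokens!r}, size={size!r}") from None


print(pick_variant("10M", "large"))  # weenie_llama
```

Comparing models that differ in only one key (e.g. baby_llama vs. teenie_llama) isolates the effect of data size; comparing along the other key isolates the effect of model size.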

                 baby_llama   teenie_llama   weenie_llama   tweenie_llama
Parameters       2.97M        2.97M          11.44M         11.44M
Hidden layers    8            8              16             16
Attention heads  8            8              16             16
Embedding size   128          128            256            256
Context size     128          128            256            256
Vocab size       16k          16k            16k            16k
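Assuming the standard Llama layout, where the embedding size equals the hidden size and is split evenly across the attention heads, both size classes share the same per-head dimension: 128 / 8 = 16 and 256 / 16 = 16. The Python sketch below records the table as hypothetical config objects and checks that arithmetic; the `ArchSpec` class and its field names are chosen for illustration and may not match the keys in the released configuration files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSpec:
    """Architecture hyperparameters as listed in the table (illustrative field names)."""
    params_millions: float
    hidden_layers: int
    attention_heads: int
    embedding_size: int
    context_size: int

    @property
    def head_dim(self) -> int:
        # Assumes the embedding (hidden) size is divided evenly among the heads.
        if self.embedding_size % self.attention_heads:
            raise ValueError("embedding size is not divisible by the number of heads")
        return self.embedding_size // self.attention_heads


SMALL = ArchSpec(2.97, 8, 8, 128, 128)     # baby_llama, teenie_llama
LARGE = ArchSpec(11.44, 16, 16, 256, 256)  # weenie_llama, tweenie_llama

for name, spec in [("small", SMALL), ("large", LARGE)]:
    print(name, spec.head_dim)  # 16 for both
```

Under that assumption, the larger models gain capacity from more layers, more heads, and a wider embedding, not from larger individual heads.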

Dataset used to train bbunzeck/weenie_llama: the BabyLM data (the small, 10M-token set)