Baby Llama
Our submission to the strict-small track of the BabyLM Challenge.

Baby Llama is a 58M-parameter model, distilled from an ensemble consisting of LLaMA-360M and GPT2-705M, both trained on the babylm_10M dataset.
See the associated paper for a detailed discussion of the training procedure and of the model performance. The training code is available at https://github.com/timinar/BabyLlama.
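A minimal usage sketch with the Hugging Face `transformers` library is shown below. The repository ID used here is an assumption for illustration; substitute the model ID shown on this page.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed repository ID; replace with the ID of this model card if it differs.
model_id = "timinar/baby-llama-58m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The child picked up the"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```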
Hyperparameters for the tasks that require fine-tuning
When evaluating the model on the tasks that require fine-tuning, we noticed that the default hyperparameters suggested by the BabyLM organizers lead to severe overfitting on a number of tasks. To avoid this issue, we re-tuned those hyperparameters. The hyperparameters selected for each task are listed in the table below (a sketch of how one row maps onto a standard fine-tuning setup follows the table).
Task | Maximum learning rate | Batch size | Maximum epochs | Patience | Evaluate every (steps) | Random seed |
---|---|---|---|---|---|---|
CoLA | 4e-5 | 64 | 3 | 10 | 20 | 12 |
SST-2 | 5e-5 | 64 | 6 | 10 | 200 | 12 |
MRPC | 3e-5 | 64 | 3 | 10 | 20 | 12 |
QQP | 4e-5 | 64 | 10 | 10 | 1000 | 12 |
MNLI | 5e-5 | 64 | 6 | 10 | 200 | 12 |
MNLI-mm | 5e-5 | 64 | 6 | 10 | 200 | 12 |
QNLI | 5e-5 | 64 | 6 | 10 | 200 | 12 |
RTE | 5e-5 | 64 | 6 | 10 | 200 | 12 |
BoolQ | 3e-4 | 16 | 10 | 10 | 10 | 12 |
MultiRC | 1e-4 | 64 | 7 | 10 | 1000 | 42 |
WSC | 5e-7 | 1 | 10 | 1000 | 2000 | 12 |
CR (Control) | 5e-5 | 64 | 10 | 10 | 100 | 12 |
LC (Control) | 1e-3 | 64 | 1 | 2 | 10 | 12 |
MV (Control) | 5e-5 | 64 | 6 | 10 | 200 | 12 |
RP (Control) | 1e-3 | 64 | 1 | 10 | 10 | 12 |
SC (Control) | 1e-3 | 64 | 2 | 10 | 10 | 12 |
CR_LC | 1e-3 | 64 | 2 | 10 | 10 | 12 |
CR_RTP | 5e-5 | 64 | 6 | 10 | 200 | 12 |
MV_LC | 5e-5 | 64 | 6 | 10 | 200 | 12 |
MV_RTP | 5e-5 | 64 | 6 | 10 | 200 | 12 |
SC_LC | 1e-3 | 64 | 2 | 10 | 10 | 12 |
SC_RP | 1e-3 | 64 | 2 | 10 | 10 | 12 |
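As an illustration of how a row of this table maps onto a standard `transformers` fine-tuning setup, the CoLA row could be expressed as follows. This is only a sketch: the actual evaluation is run through the BabyLM evaluation pipeline, and the output directory name below is hypothetical.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="finetune-cola",         # hypothetical output path
    learning_rate=4e-5,                 # "Maximum learning rate" column
    per_device_train_batch_size=64,     # "Batch size" column
    num_train_epochs=3,                 # "Maximum epochs" column
    evaluation_strategy="steps",
    eval_steps=20,                      # "Evaluate every (steps)" column
    save_strategy="steps",
    save_steps=20,
    load_best_model_at_end=True,        # required for early stopping
    metric_for_best_model="eval_loss",
    seed=12,                            # "Random seed" column
)

# "Patience" column: stop if the evaluation metric has not improved
# for 10 consecutive evaluations.
early_stopping = EarlyStoppingCallback(early_stopping_patience=10)
```

The callback is then passed to a `Trainer` via its `callbacks` argument.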