Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ Our submission to the `strict-small` track of the [BabyLM challenge](https://bab
 
 Baby Llama is a 58M-parameter model, distilled from an ensemble consisting of LLaMA-360M and GPT2-705M, both trained on the `babylm_10M` dataset.
 
-See the associated paper
+See the associated [paper](https://arxiv.org/abs/2308.02019) for a detailed discussion of the training procedure and of the model performance.
 The training code is available at [https://github.com/timinar/BabyLlama](https://github.com/timinar/BabyLlama).
 
 ### Hyperparameters for the tasks that require fine-tuning
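For reference, a checkpoint like this can be loaded through the standard `transformers` causal-LM API. A minimal sketch, assuming the model is published on the Hugging Face Hub under the ID `timinar/baby-llama-58m` (an assumed identifier, not confirmed by this diff; check the model page for the exact ID):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID for the distilled 58M-parameter Baby Llama checkpoint.
model_id = "timinar/baby-llama-58m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation to sanity-check the loaded model.
inputs = tokenizer("The child picked up the", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```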