# Baby-Llama-58M
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.7109
## Model description
More information needed
## Intended uses & limitations
More information needed
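
The card does not ship a usage snippet. As a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub (the repo id below is a placeholder, not the actual path), it can be loaded with the standard `transformers` causal-LM API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hub path of this checkpoint.
repo_id = "your-username/Baby-Llama-58M"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```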
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 0.00025
- train_batch_size: 128
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 80
- mixed_precision_training: Native AMP
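
The training script itself is not public, so here is a minimal sketch of how these values might map onto the `transformers` `Trainer` configuration; `output_dir` is a placeholder, and `fp16=True` is an assumption based on the "Native AMP" note above:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="baby-llama-58m",   # placeholder path, not from the card
    learning_rate=2.5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=80,
    fp16=True,                     # assumed interpretation of "Native AMP"
)
```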
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 311.1646 | 1.0 | 3 | 287.5772 |
| 309.9048 | 2.0 | 6 | 282.5104 |
| 295.7833 | 3.0 | 9 | 266.8010 |
| 269.5852 | 4.0 | 12 | 247.3416 |
| 250.6772 | 5.0 | 15 | 231.4105 |
| 243.0754 | 6.0 | 18 | 224.6885 |
| 235.779 | 7.0 | 21 | 217.7554 |
| 235.8358 | 8.0 | 24 | 211.6984 |
| 224.1199 | 9.0 | 27 | 204.9522 |
| 216.0247 | 10.0 | 30 | 197.5209 |
| 206.4354 | 11.0 | 33 | 189.5172 |
| 189.1456 | 12.0 | 36 | 179.2765 |
| 181.0333 | 13.0 | 39 | 157.3401 |
| 152.062 | 14.0 | 42 | 137.4234 |
| 132.3128 | 15.0 | 45 | 120.5469 |
| 118.0474 | 16.0 | 48 | 106.6884 |
| 107.6354 | 17.0 | 51 | 97.7495 |
| 98.2458 | 18.0 | 54 | 88.4898 |
| 86.4009 | 19.0 | 57 | 77.8249 |
| 75.9386 | 20.0 | 60 | 67.9337 |
| 65.627 | 21.0 | 63 | 58.1877 |
| 53.5903 | 22.0 | 66 | 49.0234 |
| 47.114 | 23.0 | 69 | 41.2838 |
| 38.9667 | 24.0 | 72 | 34.4503 |
| 32.8846 | 25.0 | 75 | 29.7438 |
| 27.1886 | 26.0 | 78 | 24.2863 |
| 23.0713 | 27.0 | 81 | 20.1505 |
| 18.9003 | 28.0 | 84 | 16.9556 |
| 15.9133 | 29.0 | 87 | 14.4738 |
| 13.5544 | 30.0 | 90 | 12.6399 |
| 11.6834 | 31.0 | 93 | 11.1016 |
| 10.2371 | 32.0 | 96 | 9.9052 |
| 9.2371 | 33.0 | 99 | 8.9413 |
| 8.352 | 34.0 | 102 | 8.1600 |
| 7.5322 | 35.0 | 105 | 7.6794 |
| 7.0653 | 36.0 | 108 | 7.3031 |
| 6.6853 | 37.0 | 111 | 6.9564 |
| 6.3257 | 38.0 | 114 | 6.7247 |
| 5.9869 | 39.0 | 117 | 6.4649 |
| 5.8618 | 40.0 | 120 | 6.2734 |
| 5.6025 | 41.0 | 123 | 6.1253 |
| 5.4913 | 42.0 | 126 | 6.0822 |
| 5.3086 | 43.0 | 129 | 5.8575 |
| 5.1904 | 44.0 | 132 | 5.6860 |
| 5.1193 | 45.0 | 135 | 5.6821 |
| 5.0846 | 46.0 | 138 | 5.5831 |
| 5.017 | 47.0 | 141 | 5.5245 |
| 4.7435 | 48.0 | 144 | 5.3877 |
| 4.7546 | 49.0 | 147 | 5.3523 |
| 4.8606 | 50.0 | 150 | 5.3845 |
| 4.7146 | 51.0 | 153 | 5.2239 |
| 4.6273 | 52.0 | 156 | 5.1927 |
| 4.4469 | 53.0 | 159 | 5.1898 |
| 4.5135 | 54.0 | 162 | 5.0846 |
| 4.4061 | 55.0 | 165 | 5.0756 |
| 4.3577 | 56.0 | 168 | 5.0474 |
| 4.2169 | 57.0 | 171 | 5.0125 |
| 4.3001 | 58.0 | 174 | 4.9770 |
| 4.2399 | 59.0 | 177 | 4.9469 |
| 4.3372 | 60.0 | 180 | 4.9162 |
| 4.2669 | 61.0 | 183 | 4.9166 |
| 4.2394 | 62.0 | 186 | 4.8618 |
| 4.2965 | 63.0 | 189 | 4.8595 |
| 4.1188 | 64.0 | 192 | 4.8285 |
| 4.2886 | 65.0 | 195 | 4.8265 |
| 4.2688 | 66.0 | 198 | 4.8103 |
| 4.2429 | 67.0 | 201 | 4.7904 |
| 3.9653 | 68.0 | 204 | 4.7787 |
| 4.2676 | 69.0 | 207 | 4.7604 |
| 4.2029 | 70.0 | 210 | 4.7588 |
| 4.0962 | 71.0 | 213 | 4.7560 |
| 4.0643 | 72.0 | 216 | 4.7449 |
| 4.0713 | 73.0 | 219 | 4.7341 |
| 4.1192 | 74.0 | 222 | 4.7275 |
| 4.135 | 75.0 | 225 | 4.7186 |
| 3.9914 | 76.0 | 228 | 4.7135 |
| 4.0225 | 77.0 | 231 | 4.7144 |
| 3.9907 | 78.0 | 234 | 4.7152 |
| 4.0444 | 79.0 | 237 | 4.7123 |
| 4.0321 | 80.0 | 240 | 4.7109 |
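
For context, assuming the reported validation loss is the standard per-token cross-entropy (natural log) computed by `Trainer` for causal language models, the final value corresponds to a perplexity of roughly exp(4.7109) ≈ 111:

```python
import math

# Perplexity from cross-entropy loss (assumes token-level, natural-log loss).
final_val_loss = 4.7109
print(math.exp(final_val_loss))  # ≈ 111.2
```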
### Framework versions
- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0