
Baby-Llama-58M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.7109

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00025
  • train_batch_size: 128
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 80
  • mixed_precision_training: Native AMP
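
The learning-rate schedule implied by these hyperparameters (linear warmup over 50 steps, then cosine decay) can be sketched in plain Python. This is an illustration of the schedule shape, mirroring `get_cosine_schedule_with_warmup` from Transformers, not code from the training run; the 240 total steps are inferred from the 80 epochs × 3 optimizer steps per epoch visible in the results table.

```python
import math

# Hyperparameters taken from the list above.
MAX_LR = 0.00025
WARMUP_STEPS = 50
TOTAL_STEPS = 240  # 80 epochs x 3 optimizer steps per epoch (inferred)

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MAX_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))    # 0.0 (start of warmup)
print(lr_at(50))   # 0.00025 (peak, end of warmup)
print(lr_at(240))  # ~0.0 (fully decayed)
```

Note that with only 3 optimizer steps per epoch, the 50-step warmup spans roughly the first 17 epochs of training.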

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 311.1646      | 1.0   | 3    | 287.5772        |
| 309.9048      | 2.0   | 6    | 282.5104        |
| 295.7833      | 3.0   | 9    | 266.8010        |
| 269.5852      | 4.0   | 12   | 247.3416        |
| 250.6772      | 5.0   | 15   | 231.4105        |
| 243.0754      | 6.0   | 18   | 224.6885        |
| 235.779       | 7.0   | 21   | 217.7554        |
| 235.8358      | 8.0   | 24   | 211.6984        |
| 224.1199      | 9.0   | 27   | 204.9522        |
| 216.0247      | 10.0  | 30   | 197.5209        |
| 206.4354      | 11.0  | 33   | 189.5172        |
| 189.1456      | 12.0  | 36   | 179.2765        |
| 181.0333      | 13.0  | 39   | 157.3401        |
| 152.062       | 14.0  | 42   | 137.4234        |
| 132.3128      | 15.0  | 45   | 120.5469        |
| 118.0474      | 16.0  | 48   | 106.6884        |
| 107.6354      | 17.0  | 51   | 97.7495         |
| 98.2458       | 18.0  | 54   | 88.4898         |
| 86.4009       | 19.0  | 57   | 77.8249         |
| 75.9386       | 20.0  | 60   | 67.9337         |
| 65.627        | 21.0  | 63   | 58.1877         |
| 53.5903       | 22.0  | 66   | 49.0234         |
| 47.114        | 23.0  | 69   | 41.2838         |
| 38.9667       | 24.0  | 72   | 34.4503         |
| 32.8846       | 25.0  | 75   | 29.7438         |
| 27.1886       | 26.0  | 78   | 24.2863         |
| 23.0713       | 27.0  | 81   | 20.1505         |
| 18.9003       | 28.0  | 84   | 16.9556         |
| 15.9133       | 29.0  | 87   | 14.4738         |
| 13.5544       | 30.0  | 90   | 12.6399         |
| 11.6834       | 31.0  | 93   | 11.1016         |
| 10.2371       | 32.0  | 96   | 9.9052          |
| 9.2371        | 33.0  | 99   | 8.9413          |
| 8.352         | 34.0  | 102  | 8.1600          |
| 7.5322        | 35.0  | 105  | 7.6794          |
| 7.0653        | 36.0  | 108  | 7.3031          |
| 6.6853        | 37.0  | 111  | 6.9564          |
| 6.3257        | 38.0  | 114  | 6.7247          |
| 5.9869        | 39.0  | 117  | 6.4649          |
| 5.8618        | 40.0  | 120  | 6.2734          |
| 5.6025        | 41.0  | 123  | 6.1253          |
| 5.4913        | 42.0  | 126  | 6.0822          |
| 5.3086        | 43.0  | 129  | 5.8575          |
| 5.1904        | 44.0  | 132  | 5.6860          |
| 5.1193        | 45.0  | 135  | 5.6821          |
| 5.0846        | 46.0  | 138  | 5.5831          |
| 5.017         | 47.0  | 141  | 5.5245          |
| 4.7435        | 48.0  | 144  | 5.3877          |
| 4.7546        | 49.0  | 147  | 5.3523          |
| 4.8606        | 50.0  | 150  | 5.3845          |
| 4.7146        | 51.0  | 153  | 5.2239          |
| 4.6273        | 52.0  | 156  | 5.1927          |
| 4.4469        | 53.0  | 159  | 5.1898          |
| 4.5135        | 54.0  | 162  | 5.0846          |
| 4.4061        | 55.0  | 165  | 5.0756          |
| 4.3577        | 56.0  | 168  | 5.0474          |
| 4.2169        | 57.0  | 171  | 5.0125          |
| 4.3001        | 58.0  | 174  | 4.9770          |
| 4.2399        | 59.0  | 177  | 4.9469          |
| 4.3372        | 60.0  | 180  | 4.9162          |
| 4.2669        | 61.0  | 183  | 4.9166          |
| 4.2394        | 62.0  | 186  | 4.8618          |
| 4.2965        | 63.0  | 189  | 4.8595          |
| 4.1188        | 64.0  | 192  | 4.8285          |
| 4.2886        | 65.0  | 195  | 4.8265          |
| 4.2688        | 66.0  | 198  | 4.8103          |
| 4.2429        | 67.0  | 201  | 4.7904          |
| 3.9653        | 68.0  | 204  | 4.7787          |
| 4.2676        | 69.0  | 207  | 4.7604          |
| 4.2029        | 70.0  | 210  | 4.7588          |
| 4.0962        | 71.0  | 213  | 4.7560          |
| 4.0643        | 72.0  | 216  | 4.7449          |
| 4.0713        | 73.0  | 219  | 4.7341          |
| 4.1192        | 74.0  | 222  | 4.7275          |
| 4.135         | 75.0  | 225  | 4.7186          |
| 3.9914        | 76.0  | 228  | 4.7135          |
| 4.0225        | 77.0  | 231  | 4.7144          |
| 3.9907        | 78.0  | 234  | 4.7152          |
| 4.0444        | 79.0  | 237  | 4.7123          |
| 4.0321        | 80.0  | 240  | 4.7109          |
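
Assuming the reported loss is a mean per-token cross-entropy (the standard objective for causal language models trained with the Transformers Trainer), it can be converted to perplexity via exp(loss). A quick check on the final validation loss, as an illustration rather than a figure reported on this card:

```python
import math

final_val_loss = 4.7109  # final validation loss from the table above
perplexity = math.exp(final_val_loss)
print(round(perplexity, 1))  # ≈ 111
```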

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0

Model size: 46.5M params (Safetensors, F32 tensors)