
Baby-Llama-58M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 6.7221
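
For language models, the evaluation loss is typically the mean per-token cross-entropy (in nats), so it can be converted to perplexity with `exp(loss)`. A minimal sketch, assuming the reported value is indeed a mean per-token cross-entropy:

```python
import math

eval_loss = 6.7221  # reported evaluation loss

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 830.6
```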

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00025
  • train_batch_size: 128
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 80
  • mixed_precision_training: Native AMP
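
The schedule implied by the hyperparameters above (linear warmup for 50 steps, then cosine decay) can be sketched in plain Python. The 640 total steps are taken from the results table below (80 epochs × 8 steps per epoch); this mirrors the shape of Transformers' `get_cosine_schedule_with_warmup` rather than reproducing its exact code:

```python
import math

def learning_rate(step, base_lr=2.5e-4, warmup_steps=50, total_steps=640):
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine-anneal from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate peaks at 2.5e-4 at step 50 and decays smoothly to zero by step 640.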

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 135.1538      | 1.0   | 8    | 118.8448        |
| 112.3406      | 2.0   | 16   | 102.1364        |
| 107.9124      | 3.0   | 24   | 86.8275         |
| 85.5837       | 4.0   | 32   | 71.8709         |
| 82.7059       | 5.0   | 40   | 60.4278         |
| 62.0973       | 6.0   | 48   | 51.7763         |
| 56.6325       | 7.0   | 56   | 44.4392         |
| 46.5864       | 8.0   | 64   | 39.5206         |
| 40.749        | 9.0   | 72   | 36.8323         |
| 34.1225       | 10.0  | 80   | 30.4178         |
| 26.3662       | 11.0  | 88   | 25.6518         |
| 21.4543       | 12.0  | 96   | 21.5034         |
| 17.4064       | 13.0  | 104  | 18.2917         |
| 14.5338       | 14.0  | 112  | 16.0543         |
| 12.8652       | 15.0  | 120  | 14.5666         |
| 11.1266       | 16.0  | 128  | 13.6536         |
| 9.5181        | 17.0  | 136  | 12.6228         |
| 8.0769        | 18.0  | 144  | 11.2297         |
| 7.3252        | 19.0  | 152  | 10.6871         |
| 6.7225        | 20.0  | 160  | 10.5576         |
| 6.1834        | 21.0  | 168  | 9.6600          |
| 6.0954        | 22.0  | 176  | 9.5832          |
| 5.715         | 23.0  | 184  | 9.4159          |
| 5.5297        | 24.0  | 192  | 8.8495          |
| 5.1538        | 25.0  | 200  | 8.6964          |
| 5.0472        | 26.0  | 208  | 8.4671          |
| 5.0581        | 27.0  | 216  | 8.3979          |
| 4.6914        | 28.0  | 224  | 8.2086          |
| 4.6117        | 29.0  | 232  | 8.2212          |
| 4.5157        | 30.0  | 240  | 8.1633          |
| 4.1918        | 31.0  | 248  | 8.1399          |
| 4.5274        | 32.0  | 256  | 7.7368          |
| 4.0493        | 33.0  | 264  | 7.7647          |
| 4.2799        | 34.0  | 272  | 7.8127          |
| 4.5331        | 35.0  | 280  | 7.6971          |
| 4.5937        | 36.0  | 288  | 7.6908          |
| 3.9957        | 37.0  | 296  | 7.6509          |
| 4.3035        | 38.0  | 304  | 7.5682          |
| 4.2626        | 39.0  | 312  | 7.4550          |
| 3.7238        | 40.0  | 320  | 7.4516          |
| 3.9562        | 41.0  | 328  | 7.2862          |
| 3.8612        | 42.0  | 336  | 7.3332          |
| 3.6178        | 43.0  | 344  | 7.3013          |
| 3.7672        | 44.0  | 352  | 7.2144          |
| 3.715         | 45.0  | 360  | 7.2103          |
| 3.7594        | 46.0  | 368  | 7.2457          |
| 4.3614        | 47.0  | 376  | 7.1274          |
| 4.0406        | 48.0  | 384  | 7.0472          |
| 3.5213        | 49.0  | 392  | 6.9963          |
| 3.7373        | 50.0  | 400  | 7.0503          |
| 3.7399        | 51.0  | 408  | 6.9916          |
| 3.8109        | 52.0  | 416  | 6.9899          |
| 3.3897        | 53.0  | 424  | 6.9132          |
| 3.2456        | 54.0  | 432  | 6.9393          |
| 3.8682        | 55.0  | 440  | 6.9017          |
| 3.3904        | 56.0  | 448  | 6.8995          |
| 3.8449        | 57.0  | 456  | 6.8478          |
| 3.6319        | 58.0  | 464  | 6.8388          |
| 3.4726        | 59.0  | 472  | 6.8123          |
| 3.5895        | 60.0  | 480  | 6.8452          |
| 3.4           | 61.0  | 488  | 6.7875          |
| 3.6904        | 62.0  | 496  | 6.7963          |
| 3.3957        | 63.0  | 504  | 6.7976          |
| 3.4602        | 64.0  | 512  | 6.8317          |
| 3.2714        | 65.0  | 520  | 6.8063          |
| 3.5695        | 66.0  | 528  | 6.7709          |
| 3.1538        | 67.0  | 536  | 6.7849          |
| 3.5586        | 68.0  | 544  | 6.7565          |
| 3.194         | 69.0  | 552  | 6.7629          |
| 3.0488        | 70.0  | 560  | 6.7462          |
| 3.6931        | 71.0  | 568  | 6.7269          |
| 3.7324        | 72.0  | 576  | 6.7367          |
| 3.2075        | 73.0  | 584  | 6.7460          |
| 3.3394        | 74.0  | 592  | 6.7111          |
| 3.4074        | 75.0  | 600  | 6.7456          |
| 3.3679        | 76.0  | 608  | 6.7225          |
| 3.2689        | 77.0  | 616  | 6.7234          |
| 3.6886        | 78.0  | 624  | 6.7247          |
| 3.4587        | 79.0  | 632  | 6.7224          |
| 3.6444        | 80.0  | 640  | 6.7221          |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0
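
To reproduce the training environment, the reported versions can be pinned with pip (a sketch; the reported `+cu121` build of PyTorch assumes installation from the matching CUDA wheel index, so the plain `torch==2.1.2` pin is shown here):

```shell
pip install transformers==4.39.1 torch==2.1.2 datasets==2.16.1 tokenizers==0.15.0
```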

Model details

  • Format: Safetensors
  • Model size: 54.5M params
  • Tensor type: F32