
results_2

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5776
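
Pending a fuller description below, loading the checkpoint should follow the usual Transformers pattern. A minimal sketch, assuming the repository id (a placeholder here) and that the architecture resolves through the Auto classes, neither of which this card confirms:

```python
# Minimal loading sketch; the repo id is a placeholder and the use of the
# Auto classes is an assumption, since the architecture is not documented here.
from transformers import AutoModel, AutoTokenizer

repo_id = "your-namespace/results_2"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```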

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
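
For reproducibility, the list above maps onto a Trainer configuration roughly like the sketch below. This is an illustrative reconstruction, not the card's actual training script; the model and datasets are placeholders, since neither is documented here.

```python
# Illustrative reconstruction of the hyperparameters listed above. The model,
# tokenizer, and datasets are placeholders (assumptions), not card details.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="results_2",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,               # the Adam settings listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",  # matches the per-epoch validation losses below
)

trainer = Trainer(
    model=model,                  # placeholder: the model trained from scratch
    args=training_args,
    train_dataset=train_dataset,  # placeholder: dataset is not documented
    eval_dataset=eval_dataset,    # placeholder
)
trainer.train()
```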

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 1.0 | 1 | 5.3982 |
| No log | 2.0 | 2 | 4.8609 |
| No log | 3.0 | 3 | 4.7142 |
| No log | 4.0 | 4 | 4.6615 |
| No log | 5.0 | 5 | 4.6245 |
| No log | 6.0 | 6 | 4.5932 |
| No log | 7.0 | 7 | 4.5661 |
| No log | 8.0 | 8 | 4.5424 |
| No log | 9.0 | 9 | 4.5194 |
| No log | 10.0 | 10 | 4.4968 |
| No log | 11.0 | 11 | 4.4755 |
| No log | 12.0 | 12 | 4.4542 |
| No log | 13.0 | 13 | 4.4360 |
| No log | 14.0 | 14 | 4.4160 |
| No log | 15.0 | 15 | 4.3955 |
| No log | 16.0 | 16 | 4.3757 |
| No log | 17.0 | 17 | 4.3547 |
| No log | 18.0 | 18 | 4.3332 |
| No log | 19.0 | 19 | 4.3108 |
| No log | 20.0 | 20 | 4.2878 |
| No log | 21.0 | 21 | 4.2635 |
| No log | 22.0 | 22 | 4.2386 |
| No log | 23.0 | 23 | 4.2121 |
| No log | 24.0 | 24 | 4.1849 |
| No log | 25.0 | 25 | 4.1581 |
| No log | 26.0 | 26 | 4.1309 |
| No log | 27.0 | 27 | 4.1021 |
| No log | 28.0 | 28 | 4.0720 |
| No log | 29.0 | 29 | 4.0414 |
| No log | 30.0 | 30 | 4.0136 |
| No log | 31.0 | 31 | 3.9845 |
| No log | 32.0 | 32 | 3.9580 |
| No log | 33.0 | 33 | 3.9301 |
| No log | 34.0 | 34 | 3.9012 |
| No log | 35.0 | 35 | 3.8716 |
| No log | 36.0 | 36 | 3.8444 |
| No log | 37.0 | 37 | 3.8159 |
| No log | 38.0 | 38 | 3.7859 |
| No log | 39.0 | 39 | 3.7543 |
| No log | 40.0 | 40 | 3.7212 |
| No log | 41.0 | 41 | 3.6875 |
| No log | 42.0 | 42 | 3.6533 |
| No log | 43.0 | 43 | 3.6191 |
| No log | 44.0 | 44 | 3.5842 |
| No log | 45.0 | 45 | 3.5481 |
| No log | 46.0 | 46 | 3.5108 |
| No log | 47.0 | 47 | 3.4728 |
| No log | 48.0 | 48 | 3.4346 |
| No log | 49.0 | 49 | 3.4008 |
| No log | 50.0 | 50 | 3.3707 |
| No log | 51.0 | 51 | 3.3395 |
| No log | 52.0 | 52 | 3.3093 |
| No log | 53.0 | 53 | 3.2779 |
| No log | 54.0 | 54 | 3.2470 |
| No log | 55.0 | 55 | 3.2158 |
| No log | 56.0 | 56 | 3.1847 |
| No log | 57.0 | 57 | 3.1528 |
| No log | 58.0 | 58 | 3.1244 |
| No log | 59.0 | 59 | 3.0970 |
| No log | 60.0 | 60 | 3.0729 |
| No log | 61.0 | 61 | 3.0486 |
| No log | 62.0 | 62 | 3.0261 |
| No log | 63.0 | 63 | 3.0033 |
| No log | 64.0 | 64 | 2.9807 |
| No log | 65.0 | 65 | 2.9575 |
| No log | 66.0 | 66 | 2.9348 |
| No log | 67.0 | 67 | 2.9117 |
| No log | 68.0 | 68 | 2.8894 |
| No log | 69.0 | 69 | 2.8685 |
| No log | 70.0 | 70 | 2.8501 |
| No log | 71.0 | 71 | 2.8357 |
| No log | 72.0 | 72 | 2.8204 |
| No log | 73.0 | 73 | 2.8070 |
| No log | 74.0 | 74 | 2.7933 |
| No log | 75.0 | 75 | 2.7819 |
| No log | 76.0 | 76 | 2.7697 |
| No log | 77.0 | 77 | 2.7569 |
| No log | 78.0 | 78 | 2.7427 |
| No log | 79.0 | 79 | 2.7304 |
| No log | 80.0 | 80 | 2.7188 |
| No log | 81.0 | 81 | 2.7096 |
| No log | 82.0 | 82 | 2.6990 |
| No log | 83.0 | 83 | 2.6880 |
| No log | 84.0 | 84 | 2.6766 |
| No log | 85.0 | 85 | 2.6658 |
| No log | 86.0 | 86 | 2.6550 |
| No log | 87.0 | 87 | 2.6444 |
| No log | 88.0 | 88 | 2.6353 |
| No log | 89.0 | 89 | 2.6272 |
| No log | 90.0 | 90 | 2.6194 |
| No log | 91.0 | 91 | 2.6118 |
| No log | 92.0 | 92 | 2.6052 |
| No log | 93.0 | 93 | 2.5996 |
| No log | 94.0 | 94 | 2.5947 |
| No log | 95.0 | 95 | 2.5901 |
| No log | 96.0 | 96 | 2.5860 |
| No log | 97.0 | 97 | 2.5828 |
| No log | 98.0 | 98 | 2.5803 |
| No log | 99.0 | 99 | 2.5785 |
| No log | 100.0 | 100 | 2.5776 |
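
"No log" in the Training Loss column most likely means the training loss was simply never recorded: each epoch is a single optimizer step, so the Trainer's default logging interval (500 steps) is never reached in this 100-step run. As for the final result, if the validation loss is the usual mean token-level cross-entropy in nats, it corresponds to a perplexity of roughly exp(2.5776) ≈ 13.2; a quick check of that arithmetic, with the cross-entropy assumption flagged in the comment:

```python
# Assumes the validation loss is mean token-level cross-entropy in nats,
# in which case perplexity = exp(loss).
import math

final_val_loss = 2.5776
print(round(math.exp(final_val_loss), 2))  # ≈ 13.17
```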

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2