lc_repeat_unk_as_pad_token

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0279
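
For readers who prefer perplexity to raw loss, the conversion is a one-liner. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 2.0279

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")
```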

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 50
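
To illustrate what the cosine schedule does to the learning rate over the run, here is a minimal sketch in plain Python. It assumes no warmup (the card lists none) and a total of 4500 optimizer steps, the final step count reported in this card. The function name `cosine_lr` is hypothetical and is not part of any library; the actual run may have used different scheduler settings.

```python
import math

BASE_LR = 2e-05      # learning_rate from the list above
TOTAL_STEPS = 4500   # final step reported in this card (50 epochs x 90 steps)


def cosine_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS) -> float:
    """Half-cosine decay from base_lr down to 0, assuming zero warmup steps."""
    # Clamp progress to [0, 1] so out-of-range steps do not wrap around the cosine.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, midpoint, and end of training.
for step in (0, 2250, 4500):
    print(step, cosine_lr(step))
```

Under these assumptions the rate starts at 2e-05, passes through half that value at the midpoint, and reaches roughly zero at the last step.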

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 1.4374        | 1.0   | 90   | 1.4046          |
| 1.2546        | 2.0   | 180  | 1.3617          |
| 1.3413        | 3.0   | 270  | 1.3468          |
| 1.2581        | 4.0   | 360  | 1.3482          |
| 1.2393        | 5.0   | 450  | 1.3575          |
| 1.1919        | 6.0   | 540  | 1.3651          |
| 1.1748        | 7.0   | 630  | 1.3817          |
| 1.0457        | 8.0   | 720  | 1.4023          |
| 0.986         | 9.0   | 810  | 1.4315          |
| 1.1143        | 10.0  | 900  | 1.4428          |
| 0.9689        | 11.0  | 990  | 1.4557          |
| 0.9452        | 12.0  | 1080 | 1.5132          |
| 0.9422        | 13.0  | 1170 | 1.5219          |
| 1.0653        | 14.0  | 1260 | 1.5547          |
| 1.0175        | 15.0  | 1350 | 1.6080          |
| 0.8469        | 16.0  | 1440 | 1.6143          |
| 0.8679        | 17.0  | 1530 | 1.6090          |
| 0.8854        | 18.0  | 1620 | 1.6813          |
| 0.789         | 19.0  | 1710 | 1.7044          |
| 0.8143        | 20.0  | 1800 | 1.7180          |
| 0.6346        | 21.0  | 1890 | 1.7688          |
| 0.509         | 22.0  | 1980 | 1.8638          |
| 0.7174        | 23.0  | 2070 | 1.8140          |
| 0.6996        | 24.0  | 2160 | 1.8360          |
| 0.5699        | 25.0  | 2250 | 1.8854          |
| 0.667         | 26.0  | 2340 | 1.9072          |
| 0.657         | 27.0  | 2430 | 1.9244          |
| 0.5345        | 28.0  | 2520 | 1.9389          |
| 0.5607        | 29.0  | 2610 | 1.9305          |
| 0.5528        | 30.0  | 2700 | 1.9472          |
| 0.5348        | 31.0  | 2790 | 1.9645          |
| 0.5597        | 32.0  | 2880 | 1.9925          |
| 0.4532        | 33.0  | 2970 | 1.9802          |
| 0.5005        | 34.0  | 3060 | 1.9918          |
| 0.6786        | 35.0  | 3150 | 2.0009          |
| 0.7536        | 36.0  | 3240 | 2.0051          |
| 0.6406        | 37.0  | 3330 | 2.0132          |
| 0.5569        | 38.0  | 3420 | 2.0159          |
| 0.5349        | 39.0  | 3510 | 2.0256          |
| 0.5433        | 40.0  | 3600 | 2.0228          |
| 0.5307        | 41.0  | 3690 | 2.0267          |
| 0.491         | 42.0  | 3780 | 2.0255          |
| 0.6112        | 43.0  | 3870 | 2.0256          |
| 0.5487        | 44.0  | 3960 | 2.0260          |
| 0.4791        | 45.0  | 4050 | 2.0265          |
| 0.3844        | 46.0  | 4140 | 2.0265          |
| 0.4253        | 47.0  | 4230 | 2.0265          |
| 0.4563        | 48.0  | 4320 | 2.0278          |
| 0.6504        | 49.0  | 4410 | 2.0290          |
| 0.4971        | 50.0  | 4500 | 2.0279          |
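
In the results above, validation loss reaches its minimum early and then climbs while training loss keeps falling, which is the usual sign of overfitting. The sketch below shows one way to pick the best epoch from such a log. It uses a subset of rows copied from the table (epochs 1 to 6, then every tenth epoch); the variable names are illustrative only.

```python
# (epoch, training_loss, validation_loss) rows copied from the results table.
rows = [
    (1, 1.4374, 1.4046),
    (2, 1.2546, 1.3617),
    (3, 1.3413, 1.3468),
    (4, 1.2581, 1.3482),
    (5, 1.2393, 1.3575),
    (6, 1.1919, 1.3651),
    (10, 1.1143, 1.4428),
    (20, 0.8143, 1.7180),
    (30, 0.5528, 1.9472),
    (40, 0.5433, 2.0228),
    (50, 0.4971, 2.0279),
]

# Epoch with the lowest validation loss.
best_epoch, _, best_val = min(rows, key=lambda row: row[2])

# Gap between validation and training loss at the final epoch.
_, final_train, final_val = rows[-1]
final_gap = final_val - final_train

print(f"best epoch: {best_epoch} (validation loss {best_val})")
print(f"final train/validation gap: {final_gap:.4f}")
```

On these rows the minimum falls at epoch 3, so the headline loss of 2.0279 reflects the last checkpoint rather than the best one. Whether an earlier checkpoint was saved is not stated in this card.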

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.19.2
  • Tokenizers 0.19.1
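
Since the framework list includes PEFT, the weights are presumably published as an adapter on top of the base model rather than as a full checkpoint. The following is a minimal loading sketch under that assumption. The helper name `load_adapter` and its `adapter_id` argument are hypothetical; pass the Hub repo id or local path of this adapter. The pad token handling is a guess based only on the model name and is not confirmed by the card.

```python
def load_adapter(adapter_id: str, base_id: str = "mistralai/Mistral-7B-v0.1"):
    """Load the base model and attach a PEFT adapter to it.

    adapter_id is a placeholder: the Hub repo id or local path of the adapter.
    """
    # Imported lazily so the helper can be defined without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption inferred from the model name ("unk_as_pad_token"):
    # reuse the unknown token for padding when no pad token is defined.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.unk_token

    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer


# Usage (downloads the 7B base model, so it needs substantial memory):
# model, tokenizer = load_adapter("path/or/repo-id-of-this-adapter")
```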