metadata
base_model: mistralai/Mixtral-8x7B-v0.1
datasets:
- generator
library_name: peft
license: apache-2.0
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: Mixtral_Alpace_v2
  results: []
Mixtral_Alpace_v2
This model is a PEFT adapter fine-tuned from mistralai/Mixtral-8x7B-v0.1 with TRL supervised fine-tuning (SFT) on the generator dataset. After 1000 training steps it achieves the following result on the evaluation set:
- Loss: 0.5617
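For readers who prefer perplexity, the reported loss can be converted with a short calculation. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model fine-tuning but is not stated in this card. The function name `loss_to_perplexity` is chosen here for illustration only.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


# Final evaluation loss reported above (assumed to be nats per token).
eval_loss = 0.5617
print(f"perplexity ~ {loss_to_perplexity(eval_loss):.2f}")
```

If the loss were instead averaged per sequence or measured in bits, this conversion would not apply as written.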
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2.5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 15
- training_steps: 1000
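To make the schedule concrete, the sketch below computes the learning rate implied by a linear scheduler with 15 warmup steps over 1000 training steps. It assumes the common shape of a linear ramp from zero to the peak rate followed by a linear decay to zero; the values in the actual run may differ by an off-by-one in step counting. The function `linear_warmup_lr` is a hypothetical helper, not part of any library.

```python
def linear_warmup_lr(step: int,
                     base_lr: float = 2.5e-5,
                     warmup_steps: int = 15,
                     total_steps: int = 1000) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 15, 500, 1000):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the peak rate of 2.5e-05 is reached at step 15, and the rate is roughly half the peak near the midpoint of training.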
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.5577 | 0.0813 | 10 | 1.5534 |
1.4512 | 0.1626 | 20 | 1.4827 |
1.4106 | 0.2439 | 30 | 1.4104 |
1.3419 | 0.3252 | 40 | 1.3460 |
1.2361 | 0.4065 | 50 | 1.2827 |
1.2298 | 0.4878 | 60 | 1.2097 |
1.1468 | 0.5691 | 70 | 1.1400 |
1.0874 | 0.6504 | 80 | 1.0724 |
1.0372 | 0.7317 | 90 | 1.0088 |
0.9185 | 0.8130 | 100 | 0.9566 |
0.8927 | 0.8943 | 110 | 0.9139 |
0.8264 | 0.9756 | 120 | 0.8724 |
0.8799 | 1.0569 | 130 | 0.8329 |
0.8233 | 1.1382 | 140 | 0.7947 |
0.7761 | 1.2195 | 150 | 0.7633 |
0.7568 | 1.3008 | 160 | 0.7407 |
0.6957 | 1.3821 | 170 | 0.7224 |
0.6712 | 1.4634 | 180 | 0.7048 |
0.6738 | 1.5447 | 190 | 0.6908 |
0.7165 | 1.6260 | 200 | 0.6781 |
0.5913 | 1.7073 | 210 | 0.6673 |
0.6992 | 1.7886 | 220 | 0.6584 |
0.6438 | 1.8699 | 230 | 0.6497 |
0.6649 | 1.9512 | 240 | 0.6425 |
0.5907 | 2.0325 | 250 | 0.6358 |
0.6014 | 2.1138 | 260 | 0.6302 |
0.5605 | 2.1951 | 270 | 0.6250 |
0.5893 | 2.2764 | 280 | 0.6209 |
0.5761 | 2.3577 | 290 | 0.6166 |
0.6083 | 2.4390 | 300 | 0.6132 |
0.6404 | 2.5203 | 310 | 0.6100 |
0.5949 | 2.6016 | 320 | 0.6076 |
0.6208 | 2.6829 | 330 | 0.6047 |
0.6083 | 2.7642 | 340 | 0.6025 |
0.5922 | 2.8455 | 350 | 0.5998 |
0.6377 | 2.9268 | 360 | 0.5980 |
0.6059 | 3.0081 | 370 | 0.5960 |
0.6697 | 3.0894 | 380 | 0.5940 |
0.5813 | 3.1707 | 390 | 0.5925 |
0.5442 | 3.2520 | 400 | 0.5911 |
0.5060 | 3.3333 | 410 | 0.5889 |
0.5806 | 3.4146 | 420 | 0.5878 |
0.5504 | 3.4959 | 430 | 0.5868 |
0.6051 | 3.5772 | 440 | 0.5849 |
0.5952 | 3.6585 | 450 | 0.5838 |
0.5128 | 3.7398 | 460 | 0.5825 |
0.5779 | 3.8211 | 470 | 0.5813 |
0.5448 | 3.9024 | 480 | 0.5802 |
0.5559 | 3.9837 | 490 | 0.5796 |
0.6136 | 4.0650 | 500 | 0.5787 |
0.5329 | 4.1463 | 510 | 0.5776 |
0.5267 | 4.2276 | 520 | 0.5767 |
0.5492 | 4.3089 | 530 | 0.5763 |
0.5206 | 4.3902 | 540 | 0.5758 |
0.5088 | 4.4715 | 550 | 0.5747 |
0.5811 | 4.5528 | 560 | 0.5739 |
0.5865 | 4.6341 | 570 | 0.5728 |
0.5563 | 4.7154 | 580 | 0.5729 |
0.5692 | 4.7967 | 590 | 0.5719 |
0.5827 | 4.8780 | 600 | 0.5713 |
0.5551 | 4.9593 | 610 | 0.5715 |
0.5059 | 5.0407 | 620 | 0.5708 |
0.5132 | 5.1220 | 630 | 0.5700 |
0.5314 | 5.2033 | 640 | 0.5698 |
0.5614 | 5.2846 | 650 | 0.5696 |
0.5489 | 5.3659 | 660 | 0.5688 |
0.5404 | 5.4472 | 670 | 0.5680 |
0.5745 | 5.5285 | 680 | 0.5672 |
0.5083 | 5.6098 | 690 | 0.5673 |
0.5565 | 5.6911 | 700 | 0.5670 |
0.5515 | 5.7724 | 710 | 0.5664 |
0.5448 | 5.8537 | 720 | 0.5664 |
0.5276 | 5.9350 | 730 | 0.5657 |
0.5436 | 6.0163 | 740 | 0.5656 |
0.5988 | 6.0976 | 750 | 0.5650 |
0.4929 | 6.1789 | 760 | 0.5652 |
0.5957 | 6.2602 | 770 | 0.5645 |
0.4968 | 6.3415 | 780 | 0.5645 |
0.4822 | 6.4228 | 790 | 0.5645 |
0.5527 | 6.5041 | 800 | 0.5642 |
0.5663 | 6.5854 | 810 | 0.5640 |
0.4930 | 6.6667 | 820 | 0.5634 |
0.4992 | 6.7480 | 830 | 0.5630 |
0.5618 | 6.8293 | 840 | 0.5630 |
0.5680 | 6.9106 | 850 | 0.5626 |
0.4869 | 6.9919 | 860 | 0.5626 |
0.5418 | 7.0732 | 870 | 0.5625 |
0.5364 | 7.1545 | 880 | 0.5621 |
0.5675 | 7.2358 | 890 | 0.5621 |
0.4910 | 7.3171 | 900 | 0.5620 |
0.5555 | 7.3984 | 910 | 0.5621 |
0.6093 | 7.4797 | 920 | 0.5621 |
0.5529 | 7.5610 | 930 | 0.5620 |
0.5252 | 7.6423 | 940 | 0.5620 |
0.5024 | 7.7236 | 950 | 0.5620 |
0.5639 | 7.8049 | 960 | 0.5616 |
0.4676 | 7.8862 | 970 | 0.5618 |
0.5236 | 7.9675 | 980 | 0.5617 |
0.4902 | 8.0488 | 990 | 0.5616 |
0.4860 | 8.1301 | 1000 | 0.5617 |
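The Epoch and Step columns together imply how many optimizer steps make up one epoch. The sketch below recovers that number from a few rows of the table and turns it into a rough training-set size. The size estimate assumes a single device and no gradient accumulation, neither of which is stated in this card, so it should be read as an approximation. The helper name `steps_per_epoch` is illustrative.

```python
# (step, epoch) pairs copied from the table above.
rows = [(10, 0.0813), (500, 4.0650), (1000, 8.1301)]

TRAIN_BATCH_SIZE = 8  # train_batch_size from the hyperparameter list


def steps_per_epoch(step: int, epoch: float) -> int:
    """Estimate steps per epoch from one table row, rounded to an integer."""
    return round(step / epoch)


# Every row should agree on the same integer if the table is consistent.
estimates = {steps_per_epoch(step, epoch) for step, epoch in rows}
print("steps per epoch:", estimates)

# Rough number of training examples seen per epoch
# (assumes one device and no gradient accumulation).
approx_examples = next(iter(estimates)) * TRAIN_BATCH_SIZE
print("approx. examples per epoch:", approx_examples)
```

On that reading, the 1000 training steps correspond to a little over eight passes through the training data, which matches the final Epoch value in the table.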
Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
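When reproducing the run, it can help to compare a local environment against the versions listed above. The following is a minimal sketch using only the standard library. The distribution names in `EXPECTED` (for example `torch` for PyTorch) are assumed to be the usual pip package names, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card; keys are assumed pip distribution names.
EXPECTED: Dict[str, str] = {
    "peft": "0.12.0",
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(
    expected: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily prevent the adapter from loading; the comparison only flags where an environment departs from the one recorded here.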