# default_all_prompts_v3
This model is a fine-tuned version of unsloth/mistral-7b-bnb-4bit on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.3140
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
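As a quick consistency check on the hyperparameters above, the total train batch size follows from the per-device batch size and the gradient-accumulation steps. This minimal sketch assumes a single training device, which the card does not state:

```python
# Effective (total) train batch size implied by the hyperparameters above.
train_batch_size = 8             # per-device batch size
gradient_accumulation_steps = 2
num_devices = 1                  # assumption: single-device training

total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # matches total_train_batch_size: 16
```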
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
0.4199 | 0.04 | 50 | 0.4149 |
0.3159 | 0.09 | 100 | 0.3314 |
0.3066 | 0.13 | 150 | 0.3069 |
0.2802 | 0.17 | 200 | 0.2933 |
0.2604 | 0.21 | 250 | 0.2869 |
0.237 | 0.26 | 300 | 0.2854 |
0.2804 | 0.3 | 350 | 0.2775 |
0.2225 | 0.34 | 400 | 0.2778 |
0.264 | 0.39 | 450 | 0.2693 |
0.2279 | 0.43 | 500 | 0.2782 |
0.2669 | 0.47 | 550 | 0.2679 |
0.2381 | 0.51 | 600 | 0.2713 |
0.2737 | 0.56 | 650 | 0.2651 |
0.2473 | 0.6 | 700 | 0.2648 |
0.2605 | 0.64 | 750 | 0.2648 |
0.2255 | 0.69 | 800 | 0.2590 |
0.2628 | 0.73 | 850 | 0.2570 |
0.2386 | 0.77 | 900 | 0.2574 |
0.1703 | 0.81 | 950 | 0.2611 |
0.2333 | 0.86 | 1000 | 0.2535 |
0.2735 | 0.9 | 1050 | 0.2501 |
0.2215 | 0.94 | 1100 | 0.2568 |
0.2215 | 0.99 | 1150 | 0.2476 |
0.179 | 1.03 | 1200 | 0.2534 |
0.1855 | 1.07 | 1250 | 0.2542 |
0.1775 | 1.11 | 1300 | 0.2495 |
0.1556 | 1.16 | 1350 | 0.2545 |
0.1555 | 1.2 | 1400 | 0.2500 |
0.1745 | 1.24 | 1450 | 0.2507 |
0.1905 | 1.29 | 1500 | 0.2499 |
0.1968 | 1.33 | 1550 | 0.2451 |
0.1918 | 1.37 | 1600 | 0.2436 |
0.1812 | 1.41 | 1650 | 0.2405 |
0.1785 | 1.46 | 1700 | 0.2424 |
0.2008 | 1.5 | 1750 | 0.2477 |
0.1531 | 1.54 | 1800 | 0.2424 |
0.1648 | 1.59 | 1850 | 0.2413 |
0.1602 | 1.63 | 1900 | 0.2376 |
0.1752 | 1.67 | 1950 | 0.2430 |
0.1782 | 1.71 | 2000 | 0.2359 |
0.158 | 1.76 | 2050 | 0.2355 |
0.1632 | 1.8 | 2100 | 0.2336 |
0.172 | 1.84 | 2150 | 0.2341 |
0.1922 | 1.89 | 2200 | 0.2328 |
0.1469 | 1.93 | 2250 | 0.2317 |
0.1551 | 1.97 | 2300 | 0.2304 |
0.1063 | 2.01 | 2350 | 0.2559 |
0.0995 | 2.06 | 2400 | 0.2492 |
0.1059 | 2.1 | 2450 | 0.2491 |
0.078 | 2.14 | 2500 | 0.2497 |
0.0876 | 2.19 | 2550 | 0.2518 |
0.0775 | 2.23 | 2600 | 0.2477 |
0.1 | 2.27 | 2650 | 0.2534 |
0.0863 | 2.31 | 2700 | 0.2604 |
0.0884 | 2.36 | 2750 | 0.2523 |
0.0919 | 2.4 | 2800 | 0.2560 |
0.1004 | 2.44 | 2850 | 0.2450 |
0.1055 | 2.49 | 2900 | 0.2491 |
0.0787 | 2.53 | 2950 | 0.2509 |
0.1009 | 2.57 | 3000 | 0.2474 |
0.094 | 2.61 | 3050 | 0.2507 |
0.0877 | 2.66 | 3100 | 0.2508 |
0.0855 | 2.7 | 3150 | 0.2501 |
0.0808 | 2.74 | 3200 | 0.2500 |
0.096 | 2.78 | 3250 | 0.2449 |
0.0866 | 2.83 | 3300 | 0.2473 |
0.0842 | 2.87 | 3350 | 0.2468 |
0.0793 | 2.91 | 3400 | 0.2485 |
0.0861 | 2.96 | 3450 | 0.2470 |
0.0742 | 3.0 | 3500 | 0.2456 |
0.0338 | 3.04 | 3550 | 0.2945 |
0.0296 | 3.08 | 3600 | 0.3012 |
0.0449 | 3.13 | 3650 | 0.3028 |
0.0329 | 3.17 | 3700 | 0.3043 |
0.0288 | 3.21 | 3750 | 0.3021 |
0.0351 | 3.26 | 3800 | 0.3050 |
0.0413 | 3.3 | 3850 | 0.3058 |
0.0332 | 3.34 | 3900 | 0.3132 |
0.0291 | 3.38 | 3950 | 0.3166 |
0.0301 | 3.43 | 4000 | 0.3075 |
0.0442 | 3.47 | 4050 | 0.3105 |
0.043 | 3.51 | 4100 | 0.3042 |
0.0291 | 3.56 | 4150 | 0.3085 |
0.0249 | 3.6 | 4200 | 0.3081 |
0.0349 | 3.64 | 4250 | 0.3126 |
0.0303 | 3.68 | 4300 | 0.3124 |
0.0294 | 3.73 | 4350 | 0.3128 |
0.0253 | 3.77 | 4400 | 0.3129 |
0.0303 | 3.81 | 4450 | 0.3129 |
0.0295 | 3.86 | 4500 | 0.3137 |
0.0337 | 3.9 | 4550 | 0.3140 |
0.0354 | 3.94 | 4600 | 0.3141 |
0.0357 | 3.98 | 4650 | 0.3141 |
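The learning-rate schedule named in the hyperparameters (cosine with a 0.1 warmup ratio) can be sketched in plain Python. The total-step count of 4650 is approximated from the last logged step in the table above, and the trainer's exact curve may differ slightly:

```python
import math

BASE_LR = 2e-4      # learning_rate from the hyperparameters above
WARMUP_RATIO = 0.1  # lr_scheduler_warmup_ratio
TOTAL_STEPS = 4650  # approximate: last logged step in the table above

def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to base_lr, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The LR peaks at the end of warmup and decays smoothly afterwards.
print(cosine_lr(0), cosine_lr(465), cosine_lr(4650))
```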
### Framework versions
- PEFT 0.7.1
- Transformers 4.37.1
- PyTorch 2.1.2
- Datasets 2.16.1
- Tokenizers 0.15.1