# Vistral_Function_Calling_500
This model is a fine-tuned version of [Viet-Mistral/Vistral-7B-Chat](https://huggingface.co/Viet-Mistral/Vistral-7B-Chat) on the generator dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2455
- Rouge1: 0.8798
- Rouge2: 0.7704
- Rougel: 0.8144
- Rougelsum: 0.873
- Gen Len: 2048.0
It achieves the following results on the test set:
- Loss: 0.2639
- Rouge1: 0.8874
- Rouge2: 0.7745
- Rougel: 0.8141
- Rougelsum: 0.8811
- Gen Len: 2048.0
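This adapter was trained with PEFT (see the framework versions below), so inference loads the base model and attaches the adapter on top. A minimal sketch, assuming the adapter is published under the repo id `Virros/Vistral_Function_Calling_500`:

```python
# Minimal inference sketch: load the Vistral base model and attach the
# fine-tuned PEFT adapter. The adapter repo id is assumed from this card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "Viet-Mistral/Vistral-7B-Chat"
ADAPTER_ID = "Virros/Vistral_Function_Calling_500"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

# Any chat-style prompt works; function-calling prompts should follow the
# format used in the (unpublished) training data.
messages = [{"role": "user", "content": "What is the weather in Hanoi today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```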
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
- learning_rate: 0.0002
- train_batch_size: 3
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2
- mixed_precision_training: Native AMP
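The training script itself is not published; as a hedged reconstruction, the settings above map onto `transformers.TrainingArguments` roughly as follows. The `output_dir` and the exact AMP flag are assumptions; everything else is taken from the list.

```python
# Hedged reconstruction of the reported hyperparameters; output_dir and
# fp16 ("Native AMP") are assumptions, all other values are listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Vistral_Function_Calling_500",  # assumed
    learning_rate=2e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 3 * 2 = 6
    adam_beta1=0.9,                 # Adam settings as reported
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=2,
    fp16=True,                      # "Native AMP" mixed precision
)
```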
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
| 0.8125        | 0.25  | 3    | 0.8199          | 0.8499 | 0.7054 | 0.7623 | 0.8429    | 2048.0  |
| 0.658         | 0.5   | 6    | 0.3952          | 0.8634 | 0.7361 | 0.7854 | 0.8564    | 2048.0  |
| 0.4082        | 0.75  | 9    | 0.3261          | 0.8732 | 0.7452 | 0.7927 | 0.8657    | 2048.0  |
| 0.3302        | 1.0   | 12   | 0.2928          | 0.8733 | 0.7552 | 0.801  | 0.8666    | 2048.0  |
| 0.2653        | 1.25  | 15   | 0.2653          | 0.8775 | 0.7646 | 0.809  | 0.8703    | 2048.0  |
| 0.2605        | 1.5   | 18   | 0.2528          | 0.8778 | 0.7678 | 0.8119 | 0.8707    | 2048.0  |
| 0.2444        | 1.75  | 21   | 0.2476          | 0.8793 | 0.7697 | 0.8132 | 0.872     | 2048.0  |
| 0.23          | 2.0   | 24   | 0.2455          | 0.8798 | 0.7704 | 0.8144 | 0.873     | 2048.0  |
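The evaluation code is not included in the card; scores like the Rouge1/Rouge2/Rougel/Rougelsum values above can be computed with the `evaluate` library. A sketch, not the author's exact pipeline:

```python
# Sketch of computing ROUGE scores like those reported above, using the
# `evaluate` library (not the card's own evaluation code).
import evaluate

rouge = evaluate.load("rouge")
predictions = ["get_weather(city='Hanoi')"]  # placeholder model output
references = ["get_weather(city='Hanoi')"]   # placeholder gold target
scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}
```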
### Framework versions
- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
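To reproduce the environment, the versions above can be pinned at install time. A sketch; the `+cu121` PyTorch build comes from the CUDA 12.1 wheel index:

```bash
pip install peft==0.11.1 transformers==4.41.1 datasets==2.19.2 tokenizers==0.19.1
pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cu121
```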