mistral_7b_cosine_lr

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the generator dataset. At the last logged evaluation (step 240, epoch 1.3178), it achieves the following result on the evaluation set:

Loss: 5.3993

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 3
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
lr_scheduler_warmup_steps: 15
num_epochs: 4
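
The relationship between these values can be sketched in a few lines of Python. The sketch below checks that total_train_batch_size follows from train_batch_size and gradient_accumulation_steps (assuming a single device, which the 3 x 8 = 24 arithmetic suggests), and reproduces the shape of a linear-warmup-then-cosine schedule. The list reports both a warmup ratio and a warmup step count; the sketch uses the 15 steps, since in the Transformers Trainer a positive warmup_steps takes precedence over warmup_ratio. TOTAL_STEPS is an assumption, not a value from this card: it is estimated from the logged epoch fractions (10 steps is about 0.0549 epochs, so roughly 182 steps per epoch over 4 epochs). The function names are illustrative only.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 0.003
TRAIN_BATCH_SIZE = 3
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 15

# ASSUMPTION: the card does not state the total step count. This is an
# estimate (about 182 steps per epoch * 4 epochs) derived from the logged
# epoch fractions.
TOTAL_STEPS = 728


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accum_steps * num_devices


def cosine_lr(step: int,
              base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 5, 15, 100, 240, 728):
        print(f"step {s:4d}: lr = {cosine_lr(s):.6f}")
```

Under these assumptions the learning rate reaches its 0.003 peak at step 15, which is close to the point in the results table where the training loss spikes, and it is still near the peak at the last logged step.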

Training results

Training Loss	Epoch	Step	Validation Loss
11.1885	0.0549	10	61.4970
37.6512	0.1098	20	12.9405
14.576	0.1647	30	27.9852
9.5892	0.2196	40	6.4722
7.7639	0.2745	50	6.8158
6.3878	0.3294	60	6.3811
6.6118	0.3844	70	5.9281
6.006	0.4393	80	5.6753
6.1011	0.4942	90	5.8083
5.7396	0.5491	100	5.6193
5.5128	0.6040	110	5.4848
5.4599	0.6589	120	5.4267
5.5193	0.7138	130	5.4757
5.4488	0.7687	140	5.4422
5.4257	0.8236	150	5.3845
5.3938	0.8785	160	5.3727
5.3937	0.9334	170	5.3646
5.3916	0.9883	180	5.4825
5.4217	1.0432	190	5.3534
5.3915	1.0981	200	5.3497
5.3656	1.1531	210	5.3416
5.3718	1.2080	220	5.3691
5.3763	1.2629	230	5.4102
5.4039	1.3178	240	5.3993
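
As a rough aid for reading the table, the following sketch loads the logged rows, finds the step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the step and epoch columns. The perplexity conversion assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. Variable and function names are illustrative.

```python
import math

# (step, epoch, training_loss, validation_loss), copied from the table above.
ROWS = [
    (10, 0.0549, 11.1885, 61.4970), (20, 0.1098, 37.6512, 12.9405),
    (30, 0.1647, 14.576, 27.9852), (40, 0.2196, 9.5892, 6.4722),
    (50, 0.2745, 7.7639, 6.8158), (60, 0.3294, 6.3878, 6.3811),
    (70, 0.3844, 6.6118, 5.9281), (80, 0.4393, 6.006, 5.6753),
    (90, 0.4942, 6.1011, 5.8083), (100, 0.5491, 5.7396, 5.6193),
    (110, 0.6040, 5.5128, 5.4848), (120, 0.6589, 5.4599, 5.4267),
    (130, 0.7138, 5.5193, 5.4757), (140, 0.7687, 5.4488, 5.4422),
    (150, 0.8236, 5.4257, 5.3845), (160, 0.8785, 5.3938, 5.3727),
    (170, 0.9334, 5.3937, 5.3646), (180, 0.9883, 5.3916, 5.4825),
    (190, 1.0432, 5.4217, 5.3534), (200, 1.0981, 5.3915, 5.3497),
    (210, 1.1531, 5.3656, 5.3416), (220, 1.2080, 5.3718, 5.3691),
    (230, 1.2629, 5.3763, 5.4102), (240, 1.3178, 5.4039, 5.3993),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[3])


def steps_per_epoch(rows):
    """Estimate steps per epoch from the last logged (step, epoch) pair."""
    step, epoch, _, _ = rows[-1]
    return step / epoch


def perplexity(loss):
    """ASSUMPTION: loss is a mean per-token cross-entropy in nats."""
    return math.exp(loss)


best = best_checkpoint(ROWS)
final = ROWS[-1]

if __name__ == "__main__":
    print(f"lowest validation loss: {best[3]} at step {best[0]}")
    print(f"final validation loss:  {final[3]} at step {final[0]}")
    print(f"estimated steps per epoch: {steps_per_epoch(ROWS):.1f}")
    print(f"final perplexity (under the stated assumption): {perplexity(final[3]):.1f}")
```

The log stops at step 240, a little past epoch 1.3, even though num_epochs is listed as 4. The lowest logged validation loss is at step 210, not at the final step.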

Framework versions

PEFT 0.13.2
Transformers 4.45.2
PyTorch 2.4.1+cu121
Datasets 3.0.1
Tokenizers 0.20.0
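
Since PEFT appears in the framework list, the published weights are probably an adapter to be applied on top of the base model, although the card does not say so explicitly. The following is a minimal loading sketch under that assumption. The helper name load_finetuned_model is hypothetical, and the adapter repository ID is taken from the page path and not from a documented usage section. The heavy imports are deferred so that the function can be defined without downloading anything.

```python
BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"
# ASSUMPTION: this repository hosts a PEFT adapter for the base model above.
ADAPTER_ID = "zeeshan73/mistral_7b_cosine_lr"


def load_finetuned_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the adapter weights (hypothetical helper)."""
    # Imported lazily: these packages are large and the call below downloads
    # several gigabytes of weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


# Example usage (not executed here because it downloads the full model):
#   model, tokenizer = load_finetuned_model()
#   inputs = tokenizer("Hello", return_tensors="pt")
#   output_ids = model.generate(**inputs, max_new_tokens=32)
#   print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Given the final evaluation loss of 5.3993, generated text from this checkpoint may be of low quality; the snippet only illustrates the loading path.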

Repository: zeeshan73/mistral_7b_cosine_lr
