# Mistral2_150_STEPS_03beta_1e7rate_CDPOSFT
This model is a fine-tuned version of [tsavage68/mistralit2_1000_STEPS_5e7_SFT](https://huggingface.co/tsavage68/mistralit2_1000_STEPS_5e7_SFT) on an unknown dataset.

It achieves the following results on the evaluation set:
- Loss: 0.5695
- Rewards/chosen: 0.0666
- Rewards/rejected: -0.3152
- Rewards/accuracies: 0.6659
- Rewards/margins: 0.3818
- Logps/rejected: -27.6074
- Logps/chosen: -23.4495
- Logits/rejected: -2.2642
- Logits/chosen: -2.2637
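
For context, the `Rewards/*` figures are the implicit rewards that TRL's DPO trainer logs: `beta` times the log-probability ratio between the trained policy and the frozen reference model for each completion. A minimal sketch of how these quantities relate, assuming per-sequence log-probabilities have already been summed per completion and `beta = 0.3` (inferred from the `03beta` tag in the model name):

```python
import torch

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.3):
    """Implicit DPO rewards, margins, and accuracy from summed per-sequence log-probs."""
    # Implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x)) for each completion.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        # Fraction of pairs where the chosen completion outranks the rejected one.
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```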
## Model description
More information needed
## Intended uses & limitations
More information needed
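
Pending fuller documentation, here is a minimal inference sketch. The repo id is assumed to match the model name above, and the prompt formatting is whatever chat template the tokenizer ships with:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/Mistral2_150_STEPS_03beta_1e7rate_CDPOSFT"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```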
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 150
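
A sketch of how these settings map onto a TRL `DPOTrainer` run. This assumes a TRL release contemporaneous with the Transformers version listed below, where `beta` and `label_smoothing` are trainer arguments; `beta=0.3` is inferred from the model name, a non-zero `label_smoothing` is what selects TRL's conservative-DPO (cDPO) loss (the exact value here is hypothetical), and the dataset variables are placeholders for the unknown preference data:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="Mistral2_150_STEPS_03beta_1e7rate_CDPOSFT",
    learning_rate=1e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=150,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.3,             # "03beta" in the model name
    label_smoothing=0.1,  # hypothetical value; > 0 enables the cDPO loss
    train_dataset=train_dataset,  # placeholder: pairwise preference data
    eval_dataset=eval_dataset,    # placeholder
    tokenizer=tokenizer,
)
trainer.train()
```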
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6838        | 0.0977 | 50   | 0.6757          | 0.0819         | 0.0446           | 0.6088             | 0.0374          | -26.4083       | -23.3987     | -2.3067         | -2.3063       |
| 0.5869        | 0.1953 | 100  | 0.5936          | 0.1585         | -0.1222          | 0.6418             | 0.2808          | -26.9643       | -23.1432     | -2.2711         | -2.2707       |
| 0.5203        | 0.2930 | 150  | 0.5695          | 0.0666         | -0.3152          | 0.6659             | 0.3818          | -27.6074       | -23.4495     | -2.2642         | -2.2637       |
### Framework versions
- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.1
- Tokenizers 0.19.1