metadata

license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mistralit2_500_STEPS_1e8_rate_03_beta_DPO
    results: []

mistralit2_500_STEPS_1e8_rate_03_beta_DPO

This model is a version of mistralai/Mistral-7B-Instruct-v0.2 fine-tuned with DPO using TRL (per the tags above) on an unknown dataset. At the final checkpoint (step 500), it achieves the following results on the evaluation set:

Loss: 0.6903
Rewards/chosen: -0.0048
Rewards/rejected: -0.0113
Rewards/accuracies: 0.5121
Rewards/margins: 0.0065
Logps/rejected: -28.6101
Logps/chosen: -23.4018
Logits/rejected: -2.8650
Logits/chosen: -2.8653
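
The relationship between these metrics can be checked with a short sketch. It assumes the standard sigmoid form of the DPO loss, where the per-pair loss is -log(sigmoid(margin)) and the margin is the chosen reward minus the rejected reward. The function name `dpo_sigmoid_loss` is illustrative and is not taken from the training code.

```python
import math

# Values copied from the evaluation results above.
rewards_chosen = -0.0048
rewards_rejected = -0.0113

# Reward margin: chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected


def dpo_sigmoid_loss(m: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(m)), written as log(1 + exp(-m))."""
    return math.log1p(math.exp(-m))


print(round(margin, 4))                    # 0.0065, matches Rewards/margins
print(round(dpo_sigmoid_loss(0.0), 4))     # 0.6931, i.e. ln 2 at zero margin
print(round(dpo_sigmoid_loss(margin), 4))  # 0.6899
```

A loss of ln 2 (about 0.6931) corresponds to a zero margin, so the reported 0.6903 sits only slightly below that starting point. The reported loss is an average of per-pair losses, so it need not equal the loss evaluated at the average margin.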

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-08
train_batch_size: 4
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 500
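
As a rough illustration of how these settings combine, the sketch below derives the total train batch size and the learning rate at a few steps. It assumes a single device and a schedule with linear warmup followed by a half-cosine decay to zero. The helper `lr_at` is hypothetical and only approximates what the trainer's scheduler does.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 1e-08
train_batch_size = 4
gradient_accumulation_steps = 2
warmup_steps = 100
training_steps = 500

# Assumption: one device, so the effective batch is batch size x accumulation steps.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)  # 8
for step in (0, 50, 100, 300, 500):
    print(step, lr_at(step))
```

Under these assumptions the rate peaks at 1e-08 at step 100, is about 5e-09 at step 300, and reaches zero at step 500.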

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6911	0.1	50	0.6909	0.0027	-0.0025	0.4967	0.0052	-28.5807	-23.3768	-2.8653	-2.8655
0.6916	0.2	100	0.6928	-0.0010	-0.0023	0.4571	0.0014	-28.5802	-23.3891	-2.8653	-2.8655
0.6931	0.29	150	0.6916	-0.0047	-0.0087	0.4659	0.0040	-28.6014	-23.4015	-2.8652	-2.8654
0.6922	0.39	200	0.6914	-0.0046	-0.0090	0.4681	0.0044	-28.6024	-23.4011	-2.8651	-2.8654
0.6921	0.49	250	0.6927	-0.0086	-0.0103	0.4747	0.0017	-28.6067	-23.4145	-2.8651	-2.8653
0.6938	0.59	300	0.6916	-0.0092	-0.0132	0.4835	0.0040	-28.6163	-23.4163	-2.8651	-2.8654
0.6976	0.68	350	0.6907	-0.0058	-0.0116	0.4747	0.0058	-28.6111	-23.4052	-2.8651	-2.8654
0.6918	0.78	400	0.6902	-0.0069	-0.0137	0.4967	0.0068	-28.6182	-23.4089	-2.8651	-2.8653
0.6862	0.88	450	0.6903	-0.0048	-0.0113	0.5121	0.0065	-28.6101	-23.4018	-2.8650	-2.8653
0.6946	0.98	500	0.6903	-0.0048	-0.0113	0.5121	0.0065	-28.6101	-23.4018	-2.8650	-2.8653
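
One way to read the table is to look for the checkpoint with the lowest validation loss and to see how much the loss moved overall. The sketch below does this on a subset of columns copied from the rows above. The variable names are illustrative only.

```python
# (step, validation loss, rewards/accuracies, rewards/margins) copied from the table above.
rows = [
    (50, 0.6909, 0.4967, 0.0052),
    (100, 0.6928, 0.4571, 0.0014),
    (150, 0.6916, 0.4659, 0.0040),
    (200, 0.6914, 0.4681, 0.0044),
    (250, 0.6927, 0.4747, 0.0017),
    (300, 0.6916, 0.4835, 0.0040),
    (350, 0.6907, 0.4747, 0.0058),
    (400, 0.6902, 0.4967, 0.0068),
    (450, 0.6903, 0.5121, 0.0065),
    (500, 0.6903, 0.5121, 0.0065),
]

# Checkpoint with the lowest validation loss.
best_loss = min(rows, key=lambda r: r[1])

# Spread between the highest and lowest validation loss over the run.
loss_range = max(r[1] for r in rows) - min(r[1] for r in rows)

print(best_loss[0], best_loss[1])  # 400 0.6902
print(round(loss_range, 4))        # 0.0026
```

The validation loss varies by only about 0.0026 across the run, and the lowest value (step 400) differs from the final checkpoint by 0.0001.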

Framework versions

Transformers 4.38.2
PyTorch 2.0.0+cu117
Datasets 2.18.0
Tokenizers 0.15.2