metadata

license: mit
base_model: HuggingFaceH4/mistral-7b-sft-beta
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []

zephyr-7b-dpo-full

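This model is a fine-tuned version of HuggingFaceH4/mistral-7b-sft-beta, trained with DPO (per the trl and dpo tags). The training dataset was not recorded; the auto-generated card lists it as "None". It achieves the following results on the evaluation set. These match the final logged evaluation at step 1200: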

Loss: 0.0227
Rewards/chosen: -2.3113
Rewards/rejected: -2.8479
Rewards/accuracies: 0.6931
Rewards/margins: 0.5365
Logps/rejected: -435.4867
Logps/chosen: -375.3782
Logits/rejected: -1.4622
Logits/chosen: -1.5834
Debug/policy Weights: 0.0374
Debug/losses: 0.0212
Debug/raw Losses: 0.5682
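The reward metrics above follow the usual DPO bookkeeping. The sketch below assumes the standard sigmoid DPO loss, with metrics logged the way TRL's DPOTrainer names them:

- each reward is β times the policy-minus-reference log-probability of a completion;
- the margin is chosen minus rejected;
- accuracy is the fraction of pairs with a positive margin;
- the unweighted loss is -log σ(margin).

The card does not report β, so beta=0.1 is a placeholder. The helper `dpo_example_metrics` is hypothetical and is not a TRL function. Mapping Debug/raw Losses to this unweighted loss is also an assumption.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_example_metrics(policy_chosen_logp: float, policy_rejected_logp: float,
                        ref_chosen_logp: float, ref_rejected_logp: float,
                        beta: float = 0.1) -> dict:
    """Per-pair DPO quantities (hypothetical helper; beta=0.1 is a placeholder,
    the card does not state the beta actually used)."""
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    return {
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        # 1.0 when the chosen completion gets the higher implicit reward.
        "rewards/accuracies": 1.0 if reward_chosen > reward_rejected else 0.0,
        # Standard (sigmoid) DPO loss, without any per-example weighting.
        "loss": -log_sigmoid(margin),
    }


# Averaged over the evaluation set, mean(chosen - rejected) equals
# mean(chosen) - mean(rejected), so the reported numbers should agree up to rounding.
eval_chosen, eval_rejected, eval_margin = -2.3113, -2.8479, 0.5365
eval_margin_check = round(eval_chosen - eval_rejected, 4)
```

With the reported values, -2.3113 - (-2.8479) = 0.5366. This agrees with Rewards/margins (0.5365) up to rounding.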

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 128
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
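The batch sizes and schedule above can be checked with a little arithmetic. The first part of the sketch below recomputes the effective batch sizes:

- training: per-device size × devices × accumulation steps;
- evaluation: per-device size × devices, with no accumulation.

The second part reproduces the shape of a cosine schedule with linear warmup. It is modelled on transformers' `get_cosine_schedule_with_warmup` with its default half cycle. The function name `cosine_lr_with_warmup` is hypothetical. The card does not state the total number of optimizer steps, so that is left as a parameter.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8            # per-device micro-batch
eval_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 2
learning_rate = 5e-7
warmup_ratio = 0.1

# Effective batch sizes: accumulation only applies to training.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr_with_warmup(step: int, total_steps: int) -> float:
    """Linear warmup to learning_rate, then half-cosine decay to zero.

    Hypothetical helper; warmup steps come from warmup_ratio, rounded up.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

The Epoch column (0.9550 at step 1200) suggests roughly 1,250 optimizer steps for the single epoch. That would end warmup around step 125. Treat both figures as estimates, not values from the training logs.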

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Debug/policy Weights	Debug/losses	Debug/raw Losses
0.1734	0.0796	100	0.1631	-0.1425	-0.1765	0.5924	0.0340	-168.3475	-158.4907	-2.7043	-2.7124	0.2381	0.1616	0.6787
0.0795	0.1592	200	0.0826	-0.7160	-0.9309	0.6483	0.2150	-243.7922	-215.8411	-2.4879	-2.4997	0.1266	0.0800	0.6296
0.0545	0.2388	300	0.0572	-1.0974	-1.4187	0.6642	0.3213	-292.5661	-253.9808	-2.4160	-2.4302	0.0894	0.0550	0.6166
0.0288	0.3183	400	0.0302	-1.9563	-2.3772	0.6698	0.4209	-388.4184	-339.8692	-2.2376	-2.2573	0.0477	0.0287	0.6044
0.0358	0.3979	500	0.0407	-1.7169	-2.1543	0.6698	0.4374	-366.1241	-315.9322	-2.2265	-2.2540	0.0659	0.0394	0.6064
0.0309	0.4775	600	0.0302	-1.9504	-2.4092	0.6660	0.4587	-391.6147	-339.2857	-2.0849	-2.1159	0.0489	0.0287	0.5899
0.0203	0.5571	700	0.0198	-2.3315	-2.7643	0.6856	0.4328	-427.1261	-377.3937	-1.6613	-1.7384	0.0317	0.0185	0.5808
0.0192	0.6367	800	0.0182	-2.5929	-3.1225	0.6866	0.5297	-462.9526	-403.5321	-1.0483	-1.2122	0.0290	0.0169	0.5789
0.0233	0.7163	900	0.0237	-2.3310	-2.8931	0.6810	0.5621	-440.0111	-377.3470	-1.3096	-1.4493	0.0387	0.0221	0.5726
0.0213	0.7959	1000	0.0219	-2.4229	-2.9606	0.6931	0.5377	-446.7564	-386.5316	-1.4880	-1.6049	0.0357	0.0203	0.5694
0.0229	0.8754	1100	0.0231	-2.2736	-2.7873	0.6950	0.5137	-429.4283	-371.6010	-1.5527	-1.6574	0.0379	0.0215	0.5695
0.0216	0.9550	1200	0.0227	-2.3113	-2.8479	0.6931	0.5365	-435.4867	-375.3782	-1.4622	-1.5834	0.0374	0.0212	0.5682
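Across all twelve evaluations, Debug/losses stays within about 1.5% of Debug/policy Weights × Debug/raw Losses. That pattern is consistent with each raw DPO loss being scaled by a per-example policy weight, but the card does not confirm this. If true, it would explain why the reported loss (about 0.02) is far below the raw loss (about 0.57).

Each column is a mean over the evaluation set, and a product of two means only approximates the mean of the products. An exact match is therefore not expected. The check below uses only numbers copied from the table; the function name is hypothetical.

```python
# (step, Debug/policy Weights, Debug/losses, Debug/raw Losses), copied from the table above.
rows = [
    (100, 0.2381, 0.1616, 0.6787),
    (200, 0.1266, 0.0800, 0.6296),
    (300, 0.0894, 0.0550, 0.6166),
    (400, 0.0477, 0.0287, 0.6044),
    (500, 0.0659, 0.0394, 0.6064),
    (600, 0.0489, 0.0287, 0.5899),
    (700, 0.0317, 0.0185, 0.5808),
    (800, 0.0290, 0.0169, 0.5789),
    (900, 0.0387, 0.0221, 0.5726),
    (1000, 0.0357, 0.0203, 0.5694),
    (1100, 0.0379, 0.0215, 0.5695),
    (1200, 0.0374, 0.0212, 0.5682),
]


def max_relative_gap(logged_rows) -> float:
    """Largest |weight * raw_loss - logged_loss| / logged_loss over all rows."""
    return max(abs(w * raw - loss) / loss for _, w, loss, raw in logged_rows)


gap = max_relative_gap(rows)
```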

Framework versions

Transformers 4.41.0.dev0
PyTorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.19.1