---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---
# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4921
- Rewards/chosen: -2.5098
- Rewards/rejected: -3.5900
- Rewards/accuracies: 0.7550
- Rewards/margins: 1.0803
- Logps/rejected: -600.2638
- Logps/chosen: -516.2831
- Logits/rejected: 2.5098
- Logits/chosen: 2.2971
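
Since this repository contains a PEFT (QLoRA) adapter rather than full model weights, it must be loaded on top of the base model. Below is a minimal inference sketch, assuming the adapter has been pushed to the Hub; `your-username/zephyr-7b-dpo-qlora` is a hypothetical placeholder repo id, so substitute the actual adapter path.

```python
# Minimal inference sketch for a PEFT adapter on Mistral-7B-v0.1.
# Pinned versions from this card:
#   pip install "peft==0.7.1" "transformers==4.38.2" "torch==2.1.2"
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "your-username/zephyr-7b-dpo-qlora"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach LoRA weights
model.eval()

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```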
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged reproduction sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
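
The sketch below wires these hyperparameters into TRL's `DPOTrainer` (written against the TRL releases contemporary with Transformers 4.38 / PEFT 0.7.1; newer TRL moves `beta` and length limits into a `DPOConfig`). Only the `TrainingArguments` mirror the list above; the dataset, LoRA shape, 4-bit settings, and `beta` are not recorded in this card and appear here as labeled placeholder assumptions.

```python
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA: load the base model in 4-bit NF4 (assumed settings, not from the card).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, torch_dtype=torch.bfloat16
)

# Placeholder preference data; the actual dataset is not recorded in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["What does DPO optimize?"],
    "chosen": ["It directly optimizes a preference margin between completions."],
    "rejected": ["It trains a separate reward model and runs PPO."],
})

peft_config = LoraConfig(  # assumed adapter shape, not from the card
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"
)

training_args = TrainingArguments(  # these values mirror the list above
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x2 GPUs x4 accumulation = 32 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,  # assumption: mixed precision is not recorded in the card
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, the frozen base serves as reference
    args=training_args,
    beta=0.1,         # assumption: beta is not recorded in the card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Launching this with `accelerate launch` (or `torchrun`) across 2 GPUs matches the `distributed_type`/`num_devices` entries above and yields the effective train batch size of 32 (4 per device x 2 devices x 4 accumulation steps).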
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:--------------:|
| 0.6622 | 0.05 | 100 | 0.6637 | 0.0126 | -0.0636 | 0.6840 | 0.0762 | -247.6176 | -264.0424 | -2.2973 | -2.3242 |
| 0.6069 | 0.1 | 200 | 0.6175 | -0.5399 | -0.8086 | 0.6720 | 0.2687 | -322.1209 | -319.2918 | -1.9985 | -2.0644 |
| 0.5858 | 0.16 | 300 | 0.5707 | -0.8385 | -1.3622 | 0.6930 | 0.5238 | -377.4863 | -349.1537 | 0.2196 | 0.1195 |
| 0.5518 | 0.21 | 400 | 0.5536 | -0.8070 | -1.4119 | 0.7230 | 0.6049 | -382.4471 | -346.0015 | 0.8423 | 0.7208 |
| 0.5953 | 0.26 | 500 | 0.5575 | -0.6678 | -1.1831 | 0.7110 | 0.5153 | -359.5695 | -332.0846 | 1.2558 | 1.0708 |
| 0.5032 | 0.31 | 600 | 0.5359 | -1.3551 | -2.1333 | 0.7310 | 0.7782 | -454.5939 | -400.8145 | 2.8427 | 2.7062 |
| 0.5741 | 0.37 | 700 | 0.5317 | -1.2904 | -2.0407 | 0.7260 | 0.7503 | -445.3269 | -394.3451 | 3.1371 | 2.9904 |
| 0.5318 | 0.42 | 800 | 0.5149 | -1.6058 | -2.4688 | 0.7450 | 0.8630 | -488.1442 | -425.8877 | 3.7140 | 3.5383 |
| 0.5353 | 0.47 | 900 | 0.5125 | -2.5710 | -3.5411 | 0.7460 | 0.9701 | -595.3752 | -522.4096 | 4.4179 | 4.2065 |
| 0.574 | 0.52 | 1000 | 0.5035 | -2.6228 | -3.6684 | 0.7370 | 1.0456 | -608.1039 | -527.5898 | 2.6517 | 2.4408 |
| 0.471 | 0.58 | 1100 | 0.5028 | -2.6309 | -3.7142 | 0.7500 | 1.0833 | -612.6806 | -528.3990 | 2.2637 | 2.0694 |
| 0.4888 | 0.63 | 1200 | 0.4965 | -2.4412 | -3.4135 | 0.7530 | 0.9723 | -582.6143 | -509.4261 | 2.4042 | 2.2263 |
| 0.5204 | 0.68 | 1300 | 0.4941 | -2.2701 | -3.2940 | 0.7480 | 1.0239 | -570.6591 | -492.3148 | 2.2065 | 2.0121 |
| 0.5158 | 0.73 | 1400 | 0.4925 | -2.6194 | -3.7070 | 0.7540 | 1.0875 | -611.9571 | -527.2493 | 2.4817 | 2.2784 |
| 0.4677 | 0.79 | 1500 | 0.4922 | -2.6220 | -3.7128 | 0.7540 | 1.0908 | -612.5421 | -527.5074 | 2.5848 | 2.3739 |
| 0.5464 | 0.84 | 1600 | 0.4925 | -2.5137 | -3.5972 | 0.7510 | 1.0835 | -600.9805 | -516.6763 | 2.4955 | 2.2803 |
| 0.5078 | 0.89 | 1700 | 0.4920 | -2.5031 | -3.5840 | 0.7550 | 1.0809 | -599.6627 | -515.6122 | 2.5160 | 2.3031 |
| 0.4864 | 0.94 | 1800 | 0.4921 | -2.5103 | -3.5902 | 0.7550 | 1.0799 | -600.2827 | -516.3320 | 2.5115 | 2.2982 |
| 0.5211 | 0.99 | 1900 | 0.4921 | -2.5098 | -3.5900 | 0.7550 | 1.0803 | -600.2638 | -516.2831 | 2.5098 | 2.2971 |
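
For context on the reward columns: they follow the DPO formulation (Rafailov et al., 2023) as implemented in TRL, where the implicit reward of a completion is the beta-scaled log-probability ratio between the policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected margin (the beta used here is not recorded in this card):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
$$

Under this reading, `Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards over the evaluation set, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs in which the chosen completion receives the higher reward.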
### Framework versions

- PEFT 0.7.1
- Transformers 4.38.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2