metadata

license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b
    results: []

zephyr-7b

This model is a PEFT adapter fine-tuned with DPO from alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 0.6906
Rewards/chosen: -0.3413
Rewards/rejected: -0.5652
Rewards/accuracies: 0.3631
Rewards/margins: 0.2239
Logps/rejected: -131.9189
Logps/chosen: -103.0295
Logits/rejected: -0.1381
Logits/chosen: -0.2453
Use Label: 15879.8574
Pred Label: 4192.1431
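
The reward metrics above are related by simple arithmetic. As a minimal sketch, assuming the usual DPO logging convention in which Rewards/margins is the mean of (chosen reward minus rejected reward), the reported margin can be reproduced from the two reward values. The function name reward_margin is illustrative and not part of any library.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -0.3413
rewards_rejected = -0.5652
reported_margin = 0.2239


def reward_margin(chosen: float, rejected: float) -> float:
    """Return how much higher the chosen response scores than the rejected one.

    Assumes the margin is logged as (chosen reward - rejected reward).
    """
    return chosen - rejected


margin = reward_margin(rewards_chosen, rewards_rejected)
print(f"computed margin = {margin:.4f}, reported margin = {reported_margin:.4f}")
```

A positive margin means the adapter assigns, on average, a higher implicit reward to the chosen responses than to the rejected ones, even though both rewards are negative.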

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
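
The total batch sizes listed above follow from the per-device values, and the scheduler settings describe a warmup followed by cosine decay. The sketch below recomputes the totals and illustrates one common definition of that schedule (linear warmup, then cosine decay to zero). The total step count is not stated in this card, so TOTAL_STEPS is a hypothetical value used only for illustration, and lr_at is an illustrative helper rather than the trainer's own implementation.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4             # per device
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 4
learning_rate = 5e-06
warmup_ratio = 0.1

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step.

    Assumed schedule: linear warmup over the first warmup_ratio of steps,
    then cosine decay from the peak learning rate down to zero.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; the real step count is not given in this card

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, TOTAL_STEPS))
```

With these settings the learning rate peaks at 5e-06 once warmup ends and falls to half that value midway through the decay phase.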

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Use Label	Pred Label
0.6818	0.1	100	0.6814	-0.0056	-0.0496	0.3393	0.0440	-80.3582	-69.4632	-2.0664	-2.0975	1833.4603	22.5397
0.6818	0.21	200	0.6861	-0.1358	-0.2381	0.3373	0.1023	-99.2068	-82.4782	-1.9938	-2.0215	3701.2063	258.7936
0.6848	0.31	300	0.6877	-0.2068	-0.3388	0.3413	0.1320	-109.2766	-89.5763	-1.8828	-1.9157	5437.8730	626.1270
0.6857	0.42	400	0.6885	-0.1802	-0.3299	0.3532	0.1497	-108.3913	-86.9237	-1.4031	-1.4529	7112.4443	1055.5555
0.6894	0.52	500	0.6892	-0.2862	-0.4559	0.3552	0.1697	-120.9922	-97.5203	-0.5997	-0.6889	8741.4287	1530.5714
0.6881	0.63	600	0.6918	-0.3826	-0.6059	0.3532	0.2233	-135.9845	-107.1618	-0.2548	-0.3579	10293.6826	2082.3174
0.6913	0.73	700	0.6899	-0.3542	-0.5787	0.3671	0.2244	-133.2637	-104.3247	-0.2462	-0.3470	11806.4766	2673.5239
0.6893	0.84	800	0.6904	-0.3443	-0.5684	0.3631	0.2241	-132.2416	-103.3355	-0.1293	-0.2367	13331.9043	3252.0952
0.6890	0.94	900	0.6907	-0.3413	-0.5651	0.3631	0.2238	-131.9111	-103.0301	-0.1367	-0.2437	14866.4766	3821.5239

Framework versions

PEFT 0.7.1
Transformers 4.38.2
PyTorch 2.1.1+cu121
Datasets 2.14.6
Tokenizers 0.15.2