metadata

base_model: PKU-Alignment/alpaca-7b-reproduced
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - PKU-Alignment/PKU-SafeRLHF
model-index:
  - name: dpo-selective-alpaca
    results: []

dpo-selective-alpaca

This model is a fine-tuned version of PKU-Alignment/alpaca-7b-reproduced on the PKU-Alignment/PKU-SafeRLHF dataset. It achieves the following results on the evaluation set:

Loss: 4659.3857
Rewards/chosen: -0.2274
Rewards/rejected: -0.2645
Rewards/accuracies: 0.6342
Rewards/margins: 0.0372
Rewards/safe Rewards: -0.2254
Rewards/unsafe Rewards: -0.2253
Logps/rejected: -174.8009
Logps/chosen: -202.5513
Logits/rejected: -1.7296
Logits/chosen: -1.5835

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
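
The total batch sizes follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces both under stated assumptions: the schedule is written as linear warmup followed by a half-cosine decay to zero, which is the common behaviour for a `cosine` scheduler with a warmup ratio, and `TOTAL_STEPS` is an illustrative value, not the real length of this run.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
assert total_train_batch == 64 and total_eval_batch == 32


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay towards zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: 1000 is an illustrative step count, not the real run length.
TOTAL_STEPS = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step, TOTAL_STEPS):.2e}")
```

With these settings the learning rate reaches its peak of 5e-07 after the first 10% of optimizer steps and decays smoothly for the remainder of the single epoch.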

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Rewards/safe Rewards	Rewards/unsafe Rewards	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
4842.2766	0.11	500	4952.8877	0.0166	0.0096	0.6573	0.0070	0.0166	0.0165	-147.3908	-178.1579	-1.7834	-1.6386
4764.3852	0.22	1000	4865.9209	-0.0099	-0.0282	0.6644	0.0184	-0.0094	-0.0098	-151.1701	-180.8021	-1.7281	-1.5780
4814.1586	0.32	1500	4783.4697	-0.1011	-0.1298	0.6566	0.0286	-0.1003	-0.1009	-161.3237	-189.9300	-1.7085	-1.5581
4693.2395	0.43	2000	4735.1978	-0.1597	-0.1926	0.6480	0.0329	-0.1583	-0.1588	-167.6019	-195.7835	-1.7080	-1.5598
4747.273	0.54	2500	4701.7651	-0.1978	-0.2321	0.6416	0.0344	-0.1960	-0.1962	-171.5614	-199.5948	-1.7166	-1.5693
4464.0027	0.65	3000	4681.6167	-0.2061	-0.2411	0.6356	0.0350	-0.2041	-0.2043	-172.4578	-200.4294	-1.7240	-1.5768
4613.8953	0.75	3500	4667.7300	-0.2201	-0.2561	0.6333	0.0360	-0.2182	-0.2182	-173.9565	-201.8304	-1.7289	-1.5822
4642.2859	0.86	4000	4661.8745	-0.2258	-0.2627	0.6336	0.0369	-0.2238	-0.2238	-174.6188	-202.3950	-1.7298	-1.5833
4747.2375	0.97	4500	4659.3687	-0.2266	-0.2638	0.6363	0.0372	-0.2246	-0.2245	-174.7243	-202.4745	-1.7302	-1.5838
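
The trends in the table can be summarised programmatically. The sketch below copies four columns from the rows above (step, validation loss, reward accuracy, reward margin) and checks how each one evolves. The variable names are illustrative and not part of any training script.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin) from the table above.
rows = [
    (500, 4952.8877, 0.6573, 0.0070),
    (1000, 4865.9209, 0.6644, 0.0184),
    (1500, 4783.4697, 0.6566, 0.0286),
    (2000, 4735.1978, 0.6480, 0.0329),
    (2500, 4701.7651, 0.6416, 0.0344),
    (3000, 4681.6167, 0.6356, 0.0350),
    (3500, 4667.7300, 0.6333, 0.0360),
    (4000, 4661.8745, 0.6336, 0.0369),
    (4500, 4659.3687, 0.6363, 0.0372),
]

losses = [r[1] for r in rows]
margins = [r[3] for r in rows]

# Validation loss falls and the reward margin grows at every evaluation.
loss_decreasing = all(a > b for a, b in zip(losses, losses[1:]))
margin_increasing = all(a < b for a, b in zip(margins, margins[1:]))

# Reward accuracy, by contrast, peaks early in training.
best_step, _, best_accuracy, _ = max(rows, key=lambda r: r[2])

print(f"loss strictly decreasing: {loss_decreasing}")
print(f"margin strictly increasing: {margin_increasing}")
print(f"best accuracy {best_accuracy} at step {best_step}")
```

Validation loss and margin improve throughout the epoch, while reward accuracy is highest at step 1000 and ends about three points lower, so the final checkpoint is not the one with the best pairwise accuracy.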

Framework versions

Transformers 4.36.2
PyTorch 2.1.2
Datasets 2.14.6
Tokenizers 0.15.0