
base_model: HuggingFaceH4/starcoder2-15b-ift
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
  - HuggingFaceH4/orca_dpo_pairs
model-index:
  - name: starcoder2-15b-dpo-v4.0
    results: []

starcoder2-15b-dpo-v4.0

This model is a fine-tuned version of HuggingFaceH4/starcoder2-15b-ift on the HuggingFaceH4/ultrafeedback_binarized and the HuggingFaceH4/orca_dpo_pairs datasets. It achieves the following results on the evaluation set:

Loss: 0.4347
Rewards/chosen: -0.9461
Rewards/rejected: -2.7745
Rewards/accuracies: 0.7658
Rewards/margins: 1.8284
Logps/rejected: -322.1934
Logps/chosen: -316.1898
Logits/rejected: -2.3817
Logits/chosen: -2.3005
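The reward metrics above are assumed here to be the ones logged by TRL's `DPOTrainer` with its default sigmoid loss, since that is the trainer used by the alignment-handbook recipes. Under that assumption, each implicit reward is beta times the log-probability ratio between the policy and the frozen reference model, and Rewards/margins is Rewards/chosen minus Rewards/rejected (here -0.9461 - (-2.7745) = 1.8284). The sketch below recomputes these quantities for one batch from summed sequence log-probabilities. The function names, the plain-list inputs, the sample log-probabilities, and beta = 0.1 are all hypothetical, because the card does not report beta.

```python
import math
from statistics import fmean


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_batch_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Aggregate sigmoid-loss DPO metrics for one batch of preference pairs.

    Each argument is a list of summed response log-probabilities, one entry
    per pair. This is a sketch of the metric definitions, not TRL's code.
    """
    # Implicit rewards: beta * (log pi_policy(y|x) - log pi_ref(y|x))
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        # -log(sigmoid(m)) == softplus(-m)
        "loss": fmean(softplus(-m) for m in margins),
        "rewards/chosen": fmean(chosen_rewards),
        "rewards/rejected": fmean(rejected_rewards),
        # Fraction of pairs where the chosen response earns the strictly higher reward
        "rewards/accuracies": fmean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": fmean(margins),
        # Policy (not reference) log-probabilities
        "logps/chosen": fmean(policy_chosen),
        "logps/rejected": fmean(policy_rejected),
    }


# Hypothetical two-pair batch; beta is an assumed value, not taken from the card
metrics = dpo_batch_metrics(
    policy_chosen=[-310.0, -320.0],
    policy_rejected=[-330.0, -315.0],
    ref_chosen=[-312.0, -318.0],
    ref_rejected=[-320.0, -316.0],
    beta=0.1,
)
for key, value in metrics.items():
    print(f"{key}: {value:.4f}")
```

Because the margin is averaged linearly, the batch-level Rewards/margins always equals the difference of the two reward means, which is the identity the reported evaluation numbers satisfy.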

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 8
total_train_batch_size: 128
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
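The batch-size entries are related by total_train_batch_size = train_batch_size x num_devices x gradient_accumulation_steps (2 x 8 x 8 = 128), while evaluation uses no accumulation (4 x 8 = 32). The sketch below checks that arithmetic and evaluates a linear-warmup, cosine-decay schedule for the listed learning rate and warmup ratio, modeled on the shape of `transformers.get_cosine_schedule_with_warmup` with warmup steps rounded up from the ratio. The helper names are hypothetical, and the total of 1,146 optimizer steps is only a rough estimate inferred from the Epoch and Step columns of the results table (about 573 steps per epoch), not a reported value.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr_with_warmup(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from the hyperparameter list
train_total = effective_batch_size(per_device=2, num_devices=8, grad_accum_steps=8)
eval_total = effective_batch_size(per_device=4, num_devices=8)
print("total_train_batch_size:", train_total)
print("total_eval_batch_size:", eval_total)

# Rough estimate of total optimizer steps (inferred, not reported in the card)
TOTAL_STEPS = 1146
for step in (0, 57, 115, 573, 1146):
    print(step, f"{cosine_lr_with_warmup(step, TOTAL_STEPS, 5e-7, 0.1):.3e}")
```

With a warmup ratio of 0.1, roughly the first tenth of training ramps the learning rate up to 5e-07, after which it decays smoothly toward zero by the end of the second epoch.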

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.717	0.17	100	0.6006	-0.0924	-0.2899	0.6329	0.1975	-272.5022	-299.1165	-2.5313	-2.4191
0.6273	0.35	200	0.5160	-0.3994	-0.9461	0.6930	0.5467	-285.6261	-305.2568	-2.5281	-2.4278
0.5538	0.52	300	0.4781	-0.6589	-1.5892	0.7247	0.9302	-298.4870	-310.4470	-2.4996	-2.4110
0.5056	0.7	400	0.4594	-0.8283	-2.1332	0.7437	1.3050	-309.3687	-313.8344	-2.4472	-2.3644
0.4983	0.87	500	0.4512	-0.7758	-2.2806	0.7468	1.5049	-312.3167	-312.7843	-2.4223	-2.3404
0.4662	1.04	600	0.4431	-0.7839	-2.4016	0.7658	1.6177	-314.7355	-312.9465	-2.4049	-2.3215
0.4411	1.22	700	0.4415	-1.0090	-2.7582	0.7690	1.7492	-321.8679	-317.4481	-2.3840	-2.3016
0.471	1.39	800	0.4368	-0.9617	-2.7445	0.7690	1.7828	-321.5930	-316.5019	-2.3809	-2.2991
0.4485	1.57	900	0.4351	-0.9490	-2.7594	0.7722	1.8103	-321.8916	-316.2497	-2.3815	-2.3004
0.4411	1.74	1000	0.4348	-0.9293	-2.7469	0.7658	1.8176	-321.6409	-315.8547	-2.3823	-2.3011
0.4499	1.92	1100	0.4348	-0.9482	-2.7767	0.7658	1.8285	-322.2369	-316.2320	-2.3828	-2.3012
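Assuming, as in TRL's `DPOTrainer`, that Rewards/margins is the mean of per-pair chosen-minus-rejected rewards, every row should satisfy Rewards/margins = Rewards/chosen - Rewards/rejected up to four-decimal rounding. The short script below, with rows transcribed from the table above, checks that identity, whether the validation loss is non-increasing, and where it bottoms out. The helper names and the 2e-4 tolerance are illustrative choices.

```python
# (Step, Validation Loss, Rewards/chosen, Rewards/rejected, Rewards/margins),
# transcribed from the training-results table
ROWS = [
    (100, 0.6006, -0.0924, -0.2899, 0.1975),
    (200, 0.5160, -0.3994, -0.9461, 0.5467),
    (300, 0.4781, -0.6589, -1.5892, 0.9302),
    (400, 0.4594, -0.8283, -2.1332, 1.3050),
    (500, 0.4512, -0.7758, -2.2806, 1.5049),
    (600, 0.4431, -0.7839, -2.4016, 1.6177),
    (700, 0.4415, -1.0090, -2.7582, 1.7492),
    (800, 0.4368, -0.9617, -2.7445, 1.7828),
    (900, 0.4351, -0.9490, -2.7594, 1.8103),
    (1000, 0.4348, -0.9293, -2.7469, 1.8176),
    (1100, 0.4348, -0.9482, -2.7767, 1.8285),
]


def margin_mismatches(rows, tol=2e-4):
    """Steps whose reported margin differs from chosen - rejected beyond rounding."""
    return [step for step, _, chosen, rejected, margin in rows
            if abs((chosen - rejected) - margin) > tol]


def is_non_increasing(values):
    """True if each value is less than or equal to the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


val_losses = [loss for _, loss, *_ in ROWS]
# min() returns the first step that reaches the lowest loss
best_step, best_loss = min(((step, loss) for step, loss, *_ in ROWS), key=lambda t: t[1])
print("margin mismatches:", margin_mismatches(ROWS))
print("validation loss non-increasing:", is_non_increasing(val_losses))
print("lowest validation loss:", best_loss, "first reached at step", best_step)
```

The validation loss flattens out after about step 900, which matches the small gap between the last logged row and the final evaluation results at the top of the card.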

Framework versions

Transformers 4.39.0.dev0
Pytorch 2.1.2+cu121
Datasets 2.16.1
Tokenizers 0.15.1
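To compare a local environment against the framework versions listed above, the standard-library `importlib.metadata` is sufficient. The sketch assumes the usual PyPI distribution names (`torch` for PyTorch) and treats anything other than an exact string match as a difference, which is a simplification: nearby versions may work, and Transformers 4.39.0.dev0 is a development build rather than a regular release.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the "Framework versions" section of this card
CARD_VERSIONS = {
    "transformers": "4.39.0.dev0",
    "torch": "2.1.2+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def compare_versions(expected):
    """Map each package name to (expected_version, installed_version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
    if installed is None:
        status = "missing"
    elif installed == wanted:
        status = "ok"
    else:
        status = "differs"
    print(f"{name:12} card={wanted:14} installed={installed or '-':14} {status}")
```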