metadata

license: apache-2.0
tags:
  - generated_from_trainer
base_model: amazingvince/zephyr-smol_llama-100m-sft-full
model-index:
  - name: zephyr-smol_llama-100m-dpo-full
    results: []

zephyr-smol_llama-100m-dpo-full

This model is a fine-tuned version of amazingvince/zephyr-smol_llama-100m-sft-full. The training dataset was not recorded in this card. It achieves the following results on the evaluation set:

Loss: 0.5465
Rewards/chosen: -0.0518
Rewards/rejected: -0.7661
Rewards/accuracies: 0.7170
Rewards/margins: 0.7143
Logps/rejected: -450.2018
Logps/chosen: -588.7877
Logits/rejected: -4.9602
Logits/chosen: -5.2468
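
The reported reward margin can be checked against the chosen and rejected rewards. The snippet below is a minimal sketch of that arithmetic. It assumes the margin is logged as the chosen reward minus the rejected reward, which is the usual convention in DPO training logs but is not stated in this card.

```python
# Values copied from the evaluation metrics listed above.
rewards_chosen = -0.0518
rewards_rejected = -0.7661
reported_margin = 0.7143

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# The reported figures are rounded to four decimals, so compare with a tolerance.
margin_matches = abs(computed_margin - reported_margin) < 1e-3
print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

A positive margin means the model assigns, on average, a higher implicit reward to the chosen response than to the rejected one.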

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
total_train_batch_size: 16
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
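
To make the batch size and scheduler settings concrete, here is a rough sketch in plain Python. It assumes no gradient accumulation (consistent with 8 x 2 = 16) and a linear decay to zero after warmup. The function name `linear_schedule_lr` and the `total_steps` value are hypothetical; the step count is only a rough estimate from the epoch and step columns of the training results, and the trainer's exact rounding of warmup steps may differ.

```python
def linear_schedule_lr(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp up from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp down from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Effective batch size, assuming gradient accumulation of 1.
per_device_batch_size = 8
num_devices = 2
total_batch_size = per_device_batch_size * num_devices

# Hypothetical total step count, estimated from the training log (not a recorded value).
total_steps = 11_600
for step in (0, 580, 1160, 6380, 11_600):
    print(step, linear_schedule_lr(step, total_steps))
```

With these settings the learning rate peaks at 5e-07 once roughly the first 10% of steps have run, then falls steadily for the rest of training.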

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6549	0.26	1000	0.6037	-0.1205	-0.4850	0.6550	0.3644	-447.3903	-589.4750	-4.7410	-5.0341
0.5349	0.52	2000	0.5779	-0.0126	-0.5080	0.6770	0.4955	-447.6208	-588.3951	-4.8645	-5.1463
0.6029	0.77	3000	0.5657	0.0902	-0.4636	0.6900	0.5538	-447.1767	-587.3674	-5.0016	-5.2911
0.5273	1.03	4000	0.5596	0.0496	-0.5449	0.7040	0.5944	-447.9891	-587.7738	-4.9972	-5.2892
0.5	1.29	5000	0.5557	0.0585	-0.6110	0.7050	0.6695	-448.6505	-587.6843	-5.0108	-5.3047
0.5056	1.55	6000	0.5499	0.0054	-0.6719	0.7130	0.6773	-449.2598	-588.2154	-4.9988	-5.2907
0.4608	1.81	7000	0.5500	-0.0376	-0.7494	0.7030	0.7118	-450.0341	-588.6455	-5.0549	-5.3406
0.426	2.07	8000	0.5472	-0.0106	-0.7021	0.7100	0.6916	-449.5617	-588.3751	-4.9750	-5.2626
0.3875	2.32	9000	0.5464	-0.0011	-0.7171	0.7140	0.7159	-449.7113	-588.2810	-4.9935	-5.2796
0.397	2.58	10000	0.5462	-0.0391	-0.7566	0.7190	0.7175	-450.1064	-588.6602	-4.9737	-5.2618
0.4486	2.84	11000	0.5459	-0.0493	-0.7667	0.7110	0.7174	-450.2074	-588.7629	-4.9569	-5.2441
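
One way to read the table is to look for the checkpoint with the lowest validation loss or the highest reward accuracy. The sketch below copies four columns from the rows above into a list and picks those checkpoints. The variable names are illustrative and not part of the training code.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin), copied from the table above.
rows = [
    (1000, 0.6037, 0.6550, 0.3644),
    (2000, 0.5779, 0.6770, 0.4955),
    (3000, 0.5657, 0.6900, 0.5538),
    (4000, 0.5596, 0.7040, 0.5944),
    (5000, 0.5557, 0.7050, 0.6695),
    (6000, 0.5499, 0.7130, 0.6773),
    (7000, 0.5500, 0.7030, 0.7118),
    (8000, 0.5472, 0.7100, 0.6916),
    (9000, 0.5464, 0.7140, 0.7159),
    (10000, 0.5462, 0.7190, 0.7175),
    (11000, 0.5459, 0.7110, 0.7174),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(rows, key=lambda r: r[1])
# Checkpoint with the highest reward accuracy.
best_by_accuracy = max(rows, key=lambda r: r[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
```

The two criteria point to different steps here, and the differences between late checkpoints are small, so the choice depends on which metric matters more for the intended use.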

Framework versions

Transformers 4.35.0
Pytorch 2.1.0
Datasets 2.14.6
Tokenizers 0.14.1

Open LLM Leaderboard Evaluation Results

Detailed results are available on the Open LLM Leaderboard.

Metric	Value
Avg.	29.37
AI2 Reasoning Challenge (25-shot)	25.00
HellaSwag (10-shot)	28.54
MMLU (5-shot)	25.18
TruthfulQA (0-shot)	45.75
Winogrande (5-shot)	51.07
GSM8k (5-shot)	0.68
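
The average in the table is the unweighted mean of the six benchmark scores. A short check, with the values copied from the table above:

```python
# Benchmark scores copied from the leaderboard table above.
scores = {
    "AI2 Reasoning Challenge": 25.00,
    "HellaSwag": 28.54,
    "MMLU": 25.18,
    "TruthfulQA": 45.75,
    "Winogrande": 51.07,
    "GSM8k": 0.68,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"average: {average:.2f}")
```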