zephyr-7b-ultra-p-0.02

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.5105
Rewards/chosen: -0.5077
Rewards/rejected: -2.0215
Rewards/accuracies: 0.7344
Rewards/margins: 1.5138
Logps/rejected: -267.6940
Logps/chosen: -235.0849
Logits/rejected: -2.5635
Logits/chosen: -2.6281
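
The reward figures above are related by simple arithmetic. The sketch below assumes the margin is the mean chosen reward minus the mean rejected reward, which is how preference-optimization trainers commonly define it; the card itself does not state the training method, so treat that as an assumption rather than a documented fact.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -0.5077
rewards_rejected = -2.0215
reported_margin = 1.5138

# Assumption: margin = mean chosen reward - mean rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"matches reported value: {abs(computed_margin - reported_margin) < 1e-4}")
```

A positive margin means the model, on average, assigns a higher implicit reward to the chosen response than to the rejected one, consistent with the reported accuracy of 0.7344.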

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 1
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 8
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1.0
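
The total batch sizes listed above follow from the per-device values. As a quick check, assuming the usual convention that the effective training batch is the per-device batch times the number of devices times the gradient accumulation steps (and that evaluation does not accumulate):

```python
# Hyperparameters copied from the list above.
train_batch_size = 1             # per device
eval_batch_size = 8              # per device
num_devices = 8
gradient_accumulation_steps = 8

# Assumed convention: effective batch = per-device batch * devices * accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation has no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both values come out to 64, matching the reported total_train_batch_size and total_eval_batch_size.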

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.5874	0.1030	100	0.5573	-0.2314	-0.8600	0.6797	0.6286	-256.0786	-232.3214	-2.5790	-2.6440
0.5473	0.2060	200	0.5392	-0.5176	-1.5117	0.7031	0.9941	-262.5964	-235.1839	-2.5004	-2.5620
0.5436	0.3090	300	0.5426	-0.5160	-1.3998	0.6797	0.8838	-261.4774	-235.1676	-2.5184	-2.5855
0.5247	0.4120	400	0.5351	-0.6497	-2.2938	0.7422	1.6441	-270.4168	-236.5049	-2.5925	-2.6552
0.5168	0.5150	500	0.5223	-0.3770	-1.8625	0.6875	1.4854	-266.1038	-233.7780	-2.5839	-2.6438
0.5123	0.6180	600	0.5238	-0.3471	-1.8780	0.7344	1.5310	-266.2595	-233.4782	-2.5479	-2.6121
0.5266	0.7210	700	0.5222	-0.3742	-1.8507	0.7109	1.4765	-265.9860	-233.7501	-2.5703	-2.6311
0.5395	0.8240	800	0.5190	-0.4812	-2.0133	0.7266	1.5321	-267.6115	-234.8195	-2.5854	-2.6511
0.4871	0.9270	900	0.5121	-0.4722	-1.9692	0.7109	1.4969	-267.1705	-234.7301	-2.5606	-2.6256
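
To compare checkpoints, the table can be scanned programmatically. The following is a minimal sketch that copies four columns from the table above (step, validation loss, reward accuracy, reward margin) into a hypothetical `rows` list and picks out the best rows. The steps-per-epoch figure is only an estimate, because the epoch column is rounded.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin) copied from the table above.
rows = [
    (100, 0.5573, 0.6797, 0.6286),
    (200, 0.5392, 0.7031, 0.9941),
    (300, 0.5426, 0.6797, 0.8838),
    (400, 0.5351, 0.7422, 1.6441),
    (500, 0.5223, 0.6875, 1.4854),
    (600, 0.5238, 0.7344, 1.5310),
    (700, 0.5222, 0.7109, 1.4765),
    (800, 0.5190, 0.7266, 1.5321),
    (900, 0.5121, 0.7109, 1.4969),
]

# Logged step with the lowest validation loss.
best_loss_row = min(rows, key=lambda r: r[1])
# Logged step with the highest reward accuracy.
best_acc_row = max(rows, key=lambda r: r[2])

# Rough estimate: step 100 corresponds to epoch 0.1030 in the table.
estimated_steps_per_epoch = round(100 / 0.1030)

print("lowest validation loss at step", best_loss_row[0])
print("highest reward accuracy at step", best_acc_row[0])
print("estimated optimizer steps per epoch:", estimated_steps_per_epoch)
```

Among the logged steps, validation loss is lowest at step 900, while reward accuracy and margin both peak at step 400, so the preferred checkpoint depends on which metric matters for the use case.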

Framework versions

Transformers 4.45.1
PyTorch 2.4.1+cu121
Datasets 3.0.0
Tokenizers 0.20.0
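
To check whether a local environment matches these versions, something like the following could be used. It is a sketch, not part of the original card: `compare_versions` is a hypothetical helper, and the package names are the PyPI distribution names (PyTorch is distributed as `torch`). A mismatch does not necessarily mean the model will fail to load.

```python
from importlib import metadata

# Versions copied from the list above, keyed by PyPI distribution name.
expected_versions = {
    "transformers": "4.45.1",
    "torch": "2.4.1+cu121",
    "datasets": "3.0.0",
    "tokenizers": "0.20.0",
}

def compare_versions(expected):
    """Return {package: (expected, installed_or_None, matches)} for each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report

version_report = compare_versions(expected_versions)
for package, (wanted, installed, matches) in version_report.items():
    status = "ok" if matches else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```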

Repository: tongliuphysics/zephyr-7b-ultra-p-0.02