zephyr-7b-dpo-qlora

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on an unknown dataset. It achieves the following results on the evaluation set (a short note on how these DPO reward metrics are typically defined follows the list):

  • Logits/chosen: -2.2950
  • Logits/rejected: -2.1831
  • Logps/chosen: -268.8994
  • Logps/rejected: -246.9545
  • Loss: 1.3753
  • Rewards/accuracies: 0.6840
  • Rewards/chosen: 0.1114
  • Rewards/margins: 0.4929
  • Rewards/rejected: -0.3815
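
For context: these are the metrics reported by a standard DPO training setup (for example TRL's DPOTrainer), which this card does not state explicitly, so take the following as a hedged sketch of how the reward columns are usually defined rather than as documentation of this specific run. The implicit reward of a completion is beta times the log-probability ratio between the trained policy and the frozen reference model; Rewards/margins is Rewards/chosen minus Rewards/rejected (here 0.1114 − (−0.3815) = 0.4929), and Rewards/accuracies is the fraction of preference pairs whose chosen completion receives the higher reward.

```latex
% Standard DPO quantities (Rafailov et al., 2023) -- a sketch under the
% assumption of the usual TRL conventions, not taken from this card.
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
  \Big[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \Big]
```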

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
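
The sketch below restates the hyperparameters above as a transformers.TrainingArguments object, the form a TRL DPOTrainer-style script would typically consume. It is illustrative only: the actual training script, the DPO beta, and the output directory are not recorded in this card, so output_dir is a hypothetical placeholder.

```python
# Hedged sketch: the hyperparameters reported above, expressed as
# transformers.TrainingArguments. The real training script is not part of
# this card; output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",   # hypothetical placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=2,      # 2 per GPU x 4 GPUs x 4 accumulation = 32 effective
    per_device_eval_batch_size=4,       # 4 per GPU x 4 GPUs = 16 effective
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                     # Adam betas=(0.9, 0.999), epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```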

Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:------:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 1.3628 | 0.0523 | 100 | -2.3171 | -2.2076 | -268.5694 | -245.9993 | 1.3708 | 0.6820 | 0.2269 | 0.2741 | -0.0472 |
| 1.3948 | 0.1047 | 200 | -2.3041 | -2.1937 | -268.7622 | -246.5198 | 1.3925 | 0.6700 | 0.1594 | 0.3888 | -0.2294 |
| 1.4105 | 0.1570 | 300 | -2.3326 | -2.2230 | -269.4514 | -247.3755 | 1.4104 | 0.6820 | -0.0818 | 0.4471 | -0.5289 |
| 1.4014 | 0.2094 | 400 | -2.3264 | -2.2167 | -268.8318 | -246.7196 | 1.4024 | 0.6760 | 0.1350 | 0.4344 | -0.2993 |
| 1.4041 | 0.2617 | 500 | -2.3064 | -2.1950 | -268.4164 | -246.5134 | 1.4132 | 0.6800 | 0.2804 | 0.5076 | -0.2271 |
| 1.419 | 0.3141 | 600 | -2.3018 | -2.1895 | -269.1514 | -246.9937 | 1.4088 | 0.6500 | 0.0232 | 0.4184 | -0.3953 |
| 1.4382 | 0.3664 | 700 | -2.2848 | -2.1715 | -269.7142 | -247.6436 | 1.4137 | 0.6660 | -0.1738 | 0.4489 | -0.6227 |
| 1.4029 | 0.4187 | 800 | -2.3170 | -2.2078 | -269.3091 | -247.1983 | 1.4086 | 0.6640 | -0.0320 | 0.4349 | -0.4669 |
| 1.4076 | 0.4711 | 900 | -2.2777 | -2.1613 | -269.2120 | -247.1355 | 1.4028 | 0.6640 | 0.0020 | 0.4468 | -0.4449 |
| 1.3823 | 0.5234 | 1000 | -2.2891 | -2.1756 | -268.8081 | -246.8032 | 1.3954 | 0.6520 | 0.1433 | 0.4719 | -0.3286 |
| 1.3713 | 0.5758 | 1100 | -2.2961 | -2.1837 | -269.3844 | -247.4280 | 1.3982 | 0.6600 | -0.0584 | 0.4889 | -0.5473 |
| 1.3592 | 0.6281 | 1200 | -2.2972 | -2.1859 | -269.0363 | -247.0839 | 1.3881 | 0.6720 | 0.0634 | 0.4903 | -0.4268 |
| 1.3859 | 0.6805 | 1300 | -2.2892 | -2.1763 | -268.6349 | -246.6918 | 1.3878 | 0.6780 | 0.2040 | 0.4936 | -0.2896 |
| 1.3505 | 0.7328 | 1400 | -2.2898 | -2.1769 | -268.8507 | -247.0505 | 1.3823 | 0.6940 | 0.1284 | 0.5436 | -0.4152 |
| 1.3499 | 0.7851 | 1500 | -2.2921 | -2.1798 | -269.0495 | -247.1410 | 1.3815 | 0.6920 | 0.0588 | 0.5056 | -0.4468 |
| 1.3745 | 0.8375 | 1600 | -2.2933 | -2.1808 | -268.8829 | -246.9300 | 1.3764 | 0.7080 | 0.1172 | 0.4901 | -0.3730 |
| 1.3744 | 0.8898 | 1700 | -2.2950 | -2.1831 | -268.9738 | -246.9943 | 1.3749 | 0.6760 | 0.0853 | 0.4808 | -0.3955 |
| 1.3576 | 0.9422 | 1800 | -2.2944 | -2.1825 | -268.9084 | -246.9460 | 1.3785 | 0.6920 | 0.1082 | 0.4868 | -0.3786 |
| 1.3778 | 0.9945 | 1900 | -2.2950 | -2.1831 | -268.8994 | -246.9545 | 1.3753 | 0.6840 | 0.1114 | 0.4929 | -0.3815 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1
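
Since this repository is a PEFT (QLoRA) adapter on top of mistralai/Mistral-7B-v0.1, it can be used by attaching the adapter to the base model. The sketch below is a minimal, hedged example assuming the PEFT and Transformers versions listed above; the dtype and device_map settings are assumptions, and the card does not specify a chat template or generation settings.

```python
# Hedged sketch: load the QLoRA adapter on top of the base model with PEFT.
# dtype/device_map choices and the example prompt are assumptions, not taken
# from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Kimory-X/zephyr-7b-dpo-qlora")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Example prompt (hypothetical); the card does not document a prompt format.
prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```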