---
license: other
base_model: deepseek-ai/deepseek-llm-7b-chat
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- self-generate/ds_chat_original_cn_mining_oj_iter0-binarized
- self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized
- self-generate/ds_chat_original_cn_rl_oj_iter0-binarized
model-index:
- name: ds_chat_sppo_hard_iter0_2024-09-15-01.39
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://ml.byteintl.net/experiment/tracking/detail?Id=project_20240915_20321b8f&selectedTrial=run_20240915_fdcd3e5b)
# ds_chat_sppo_hard_iter0_2024-09-15-01.39
This model is a fine-tuned version of [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat) on the self-generate/ds_chat_original_cn_mining_oj_iter0-binarized, the self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized and the self-generate/ds_chat_original_cn_rl_oj_iter0-binarized datasets.
It achieves the following results on the evaluation set:
- Loss: 4624.1011
- Rewards/chosen: 0.0051
- Rewards/rejected: -0.0370
- Rewards/accuracies: 0.5789
- Rewards/margins: 0.0421
- Logps/rejected: -263.3607
- Logps/chosen: -252.4096
- Logits/rejected: 1.4404
- Logits/chosen: 1.3959
- Debug/policy Chosen Logits: 1.3959
- Debug/policy Rejected Logits: 1.4404
- Debug/policy Chosen Logps: -252.4096
- Debug/policy Rejected Logps: -263.3607
- Debug/reference Chosen Logps: -252.9185
- Debug/reference Rejected Logps: -259.6586
- Debug/sppo Chosen Reward In Loss: 0.5089
- Debug/sppo Rej Reward In Loss: -3.7021
- Debug/sppo Chosen Loss: 2526.5620
- Debug/sppo Reject Loss: 2309.3242
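The reward metrics above can be reproduced from the reported log probabilities. A minimal sketch, assuming the usual DPO-style implicit reward `beta * (policy_logps - reference_logps)` with `beta = 0.01` (the value of `beta` is inferred from the numbers, not stated in this card):

```python
# DPO-style implicit reward, assuming beta = 0.01 (inferred, not stated in the card)
BETA = 0.01

def implicit_reward(policy_logps: float, reference_logps: float, beta: float = BETA) -> float:
    """Reward as the beta-scaled log-probability ratio of policy vs. reference."""
    return beta * (policy_logps - reference_logps)

# Plugging in the evaluation numbers reported above:
chosen = implicit_reward(-252.4096, -252.9185)    # ~0.0051  (Rewards/chosen)
rejected = implicit_reward(-263.3607, -259.6586)  # ~-0.0370 (Rewards/rejected)
margin = chosen - rejected                        # ~0.0421  (Rewards/margins)
```

The unscaled differences (0.5089 and -3.7021) match the `Debug/sppo ... Reward In Loss` entries, which is consistent with this reading of the metrics.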
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 8.0
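The schedule implied by these settings can be sketched as follows. This assumes the standard linear warmup-then-decay semantics (ramp to `learning_rate` over `warmup_steps`, then decay linearly to zero); `TOTAL_STEPS = 2208` is an estimate derived from the training log below (~276 optimizer steps per epoch over 8 epochs), not a value stated in this card:

```python
# Linear warmup + linear decay schedule matching the hyperparameters above.
# TOTAL_STEPS is estimated from the step/epoch columns in the results table.
BASE_LR = 1e-7
WARMUP_STEPS = 100
TOTAL_STEPS = 2208  # assumption: ~276 steps/epoch * 8 epochs

def linear_lr(step: int,
              base_lr: float = BASE_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup window.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch sizes follow directly from the settings above:
total_train_batch = 8 * 8   # train_batch_size * num_devices = 64
total_eval_batch = 4 * 8    # eval_batch_size * num_devices = 32
```

Note that `total_train_batch_size: 64` equals `train_batch_size * num_devices` with no gradient accumulation.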
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Chosen Logits | Debug/policy Rejected Logits | Debug/policy Chosen Logps | Debug/policy Rejected Logps | Debug/reference Chosen Logps | Debug/reference Rejected Logps | Debug/sppo Chosen Reward In Loss | Debug/sppo Rej Reward In Loss | Debug/sppo Chosen Loss | Debug/sppo Reject Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------------:|:----------------------------:|:-------------------------:|:---------------------------:|:----------------------------:|:------------------------------:|:--------------------------------:|:-----------------------------:|:----------------------:|:----------------------:|
| 4975.3273 | 0.3623 | 100 | 4981.6489 | -0.0033 | -0.0038 | 0.4605 | 0.0004 | -260.0373 | -253.2532 | 1.7010 | 1.6372 | 1.6372 | 1.7010 | -253.2532 | -260.0373 | -252.9185 | -259.6586 | -0.3347 | -0.3786 | 2534.3679 | 2463.3860 |
| 4930.2141 | 0.7246 | 200 | 4924.0649 | -0.0013 | -0.0060 | 0.5789 | 0.0047 | -260.2596 | -253.0476 | 1.6680 | 1.6070 | 1.6070 | 1.6680 | -253.0476 | -260.2596 | -252.9185 | -259.6586 | -0.1291 | -0.6009 | 2514.6309 | 2444.3210 |
| 4841.2859 | 1.0870 | 300 | 4866.0864 | -0.0095 | -0.0185 | 0.5395 | 0.0089 | -261.5047 | -253.8716 | 1.6500 | 1.5926 | 1.5926 | 1.6500 | -253.8716 | -261.5047 | -252.9185 | -259.6586 | -0.9531 | -1.8460 | 2603.5461 | 2331.7520 |
| 4822.266 | 1.4493 | 400 | 4827.9761 | -0.0173 | -0.0295 | 0.5395 | 0.0122 | -262.6080 | -254.6497 | 1.6162 | 1.5603 | 1.5603 | 1.6162 | -254.6497 | -262.6080 | -252.9185 | -259.6586 | -1.7313 | -2.9494 | 2692.5408 | 2243.4092 |
| 4715.0469 | 1.8116 | 500 | 4771.2051 | -0.0007 | -0.0176 | 0.4868 | 0.0169 | -261.4219 | -252.9887 | 1.5898 | 1.5341 | 1.5341 | 1.5898 | -252.9887 | -261.4219 | -252.9185 | -259.6586 | -0.0703 | -1.7633 | 2529.2981 | 2376.3818 |
| 4665.2648 | 2.1739 | 600 | 4749.7798 | 0.0008 | -0.0212 | 0.5395 | 0.0220 | -261.7789 | -252.8382 | 1.5688 | 1.5147 | 1.5147 | 1.5688 | -252.8382 | -261.7789 | -252.9185 | -259.6586 | 0.0803 | -2.1202 | 2515.5928 | 2344.7095 |
| 4625.0359 | 2.5362 | 700 | 5035.4683 | 0.0876 | 0.0697 | 0.6447 | 0.0179 | -252.6841 | -244.1548 | 1.5685 | 1.5098 | 1.5098 | 1.5685 | -244.1548 | -252.6841 | -252.9185 | -259.6586 | 8.7637 | 6.9746 | 1714.2816 | 3259.7661 |
| 4637.3375 | 2.8986 | 800 | 4705.7749 | -0.0031 | -0.0319 | 0.5921 | 0.0287 | -262.8461 | -253.2311 | 1.5294 | 1.4773 | 1.4773 | 1.5294 | -253.2311 | -262.8461 | -252.9185 | -259.6586 | -0.3127 | -3.1874 | 2569.7046 | 2272.2061 |
| 4550.082 | 3.2609 | 900 | 4687.2900 | -0.0001 | -0.0318 | 0.5921 | 0.0317 | -262.8345 | -252.9287 | 1.5160 | 1.4652 | 1.4652 | 1.5160 | -252.9287 | -262.8345 | -252.9185 | -259.6586 | -0.0102 | -3.1759 | 2544.3586 | 2288.0042 |
| 4612.343 | 3.6232 | 1000 | 4670.3667 | 0.0005 | -0.0323 | 0.5658 | 0.0328 | -262.8906 | -252.8681 | 1.5061 | 1.4569 | 1.4569 | 1.5061 | -252.8681 | -262.8906 | -252.9185 | -259.6586 | 0.0504 | -3.2320 | 2546.7378 | 2296.4641 |
| 4579.3098 | 3.9855 | 1100 | 4676.5903 | -0.0058 | -0.0391 | 0.5263 | 0.0333 | -263.5656 | -253.4963 | 1.5062 | 1.4565 | 1.4565 | 1.5062 | -253.4963 | -263.5656 | -252.9185 | -259.6586 | -0.5778 | -3.9070 | 2616.4526 | 2253.1421 |
| 4461.193 | 4.3478 | 1200 | 4657.2646 | 0.0038 | -0.0339 | 0.6053 | 0.0377 | -263.0466 | -252.5387 | 1.4919 | 1.4449 | 1.4449 | 1.4919 | -252.5387 | -263.0466 | -252.9185 | -259.6586 | 0.3798 | -3.3879 | 2517.6655 | 2292.2590 |
| 4688.9563 | 4.7101 | 1300 | 4654.3955 | -0.0002 | -0.0373 | 0.5658 | 0.0371 | -263.3885 | -252.9360 | 1.4725 | 1.4244 | 1.4244 | 1.4725 | -252.9360 | -263.3885 | -252.9185 | -259.6586 | -0.0175 | -3.7298 | 2567.2290 | 2285.4812 |
| 4572.3969 | 5.0725 | 1400 | 4650.5352 | -0.0014 | -0.0398 | 0.5789 | 0.0384 | -263.6363 | -253.0607 | 1.4663 | 1.4206 | 1.4206 | 1.4663 | -253.0607 | -263.6363 | -252.9185 | -259.6586 | -0.1422 | -3.9776 | 2580.2542 | 2263.7637 |
| 4497.8313 | 5.4348 | 1500 | 4637.4077 | 0.0039 | -0.0371 | 0.5658 | 0.0410 | -263.3676 | -252.5313 | 1.4566 | 1.4118 | 1.4118 | 1.4566 | -252.5313 | -263.3676 | -252.9185 | -259.6586 | 0.3872 | -3.7090 | 2528.2339 | 2293.6980 |
| 4573.9879 | 5.7971 | 1600 | 4628.5752 | 0.0069 | -0.0333 | 0.5921 | 0.0402 | -262.9847 | -252.2267 | 1.4558 | 1.4099 | 1.4099 | 1.4558 | -252.2267 | -262.9847 | -252.9185 | -259.6586 | 0.6917 | -3.3261 | 2501.1956 | 2325.0657 |
| 4493.7113 | 6.1594 | 1700 | 4615.8252 | 0.0106 | -0.0325 | 0.5921 | 0.0431 | -262.9095 | -251.8597 | 1.4488 | 1.4028 | 1.4028 | 1.4488 | -251.8597 | -262.9095 | -252.9185 | -259.6586 | 1.0587 | -3.2509 | 2467.5171 | 2344.7961 |
| 4579.916 | 6.5217 | 1800 | 4618.2861 | 0.0059 | -0.0377 | 0.5789 | 0.0436 | -263.4273 | -252.3270 | 1.4455 | 1.4013 | 1.4013 | 1.4455 | -252.3270 | -263.4273 | -252.9185 | -259.6586 | 0.5915 | -3.7687 | 2516.5059 | 2301.5999 |
| 4682.2398 | 6.8841 | 1900 | 4613.9302 | 0.0060 | -0.0385 | 0.6184 | 0.0445 | -263.5052 | -252.3165 | 1.4429 | 1.3991 | 1.3991 | 1.4429 | -252.3165 | -263.5052 | -252.9185 | -259.6586 | 0.6019 | -3.8466 | 2513.9785 | 2293.4380 |
| 4497.943 | 7.2464 | 2000 | 4617.7402 | 0.0049 | -0.0368 | 0.6053 | 0.0417 | -263.3337 | -252.4285 | 1.4409 | 1.3966 | 1.3966 | 1.4409 | -252.4285 | -263.3337 | -252.9185 | -259.6586 | 0.4900 | -3.6751 | 2527.1399 | 2309.4104 |
| 4470.4805 | 7.6087 | 2100 | 4616.2676 | 0.0083 | -0.0372 | 0.6053 | 0.0455 | -263.3792 | -252.0898 | 1.4419 | 1.3983 | 1.3983 | 1.4419 | -252.0898 | -263.3792 | -252.9185 | -259.6586 | 0.8286 | -3.7205 | 2493.6099 | 2304.2241 |
| 4514.8016 | 7.9710 | 2200 | 4624.1011 | 0.0051 | -0.0370 | 0.5789 | 0.0421 | -263.3607 | -252.4096 | 1.4404 | 1.3959 | 1.3959 | 1.4404 | -252.4096 | -263.3607 | -252.9185 | -259.6586 | 0.5089 | -3.7021 | 2526.5620 | 2309.3242 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1