---
license: other
base_model: deepseek-ai/deepseek-llm-7b-chat
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- self-generate/ds_chat_original_cn_mining_oj_iter0-binarized
- self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized
- self-generate/ds_chat_original_cn_rl_oj_iter0-binarized
model-index:
- name: ds_chat_sigmoid_iter0_2024-09-14-21.15
  results: []
---
# ds_chat_sigmoid_iter0_2024-09-14-21.15
This model is a fine-tuned version of [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat) on the self-generate/ds_chat_original_cn_mining_oj_iter0-binarized, self-generate/ds_chat_original_cn_mining_sandbox_iter0-binarized, and self-generate/ds_chat_original_cn_rl_oj_iter0-binarized datasets. It achieves the following results on the evaluation set:
- Loss: 0.7009
- Rewards/chosen: 0.3500
- Rewards/rejected: 0.0298
- Rewards/accuracies: 0.3289
- Rewards/margins: 0.3202
- Logps/rejected: -63.8274
- Logps/chosen: -122.4480
- Logits/rejected: 1.6952
- Logits/chosen: 1.6350
- Debug/policy Chosen Logits: 1.6350
- Debug/policy Rejected Logits: 1.6952
- Debug/policy Chosen Logps: -122.4480
- Debug/policy Rejected Logps: -63.8274
- Debug/reference Chosen Logps: -123.1481
- Debug/reference Rejected Logps: -63.8871
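
The `Rewards/*` entries above follow the standard DPO reward definition, `reward = beta * (policy logp - reference logp)`. Plugging the logged log-probs back in is consistent with `beta = 0.5` and the sigmoid loss variant referenced in the run name; note that `beta` is inferred from these numbers, not read from a config. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.5):
    """Sigmoid-variant DPO loss; beta=0.5 is inferred from the logged
    rewards (reward = beta * (policy logp - reference logp))."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margins).mean()
    return loss, chosen_rewards, rejected_rewards

# Plugging in the final evaluation log-probs reproduces the logged rewards
# (the 0.7009 evaluation loss is an average over the whole eval set):
_, r_chosen, r_rejected = dpo_sigmoid_loss(
    torch.tensor([-122.4480]), torch.tensor([-63.8274]),
    torch.tensor([-123.1481]), torch.tensor([-63.8871]))
print(r_chosen.item(), r_rejected.item())  # ~0.3500 and ~0.0298
```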
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 8.0
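
As a hedged sketch only, these hyperparameters map onto TRL's `DPOConfig` roughly as below. The exact alignment-handbook entry point for this run is not shown in the card, and `beta=0.5` is inferred from the logged rewards rather than taken from a config:

```python
from trl import DPOConfig

# Sketch of the logged hyperparameters as TRL DPOConfig fields.
# Adam betas (0.9, 0.999) and epsilon 1e-8 are the Trainer defaults,
# so they are not set explicitly here.
config = DPOConfig(
    output_dir="ds_chat_sigmoid_iter0",  # hypothetical output path
    learning_rate=1e-7,
    per_device_train_batch_size=8,       # x 8 GPUs = 64 total train batch
    per_device_eval_batch_size=4,        # x 8 GPUs = 32 total eval batch
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    warmup_steps=100,  # when > 0, the Trainer uses this over warmup_ratio
    num_train_epochs=8.0,
    beta=0.5,            # inferred from the logged rewards, not logged
    loss_type="sigmoid",
)
```

The multi-GPU setup (`distributed_type: multi-GPU`, `num_devices: 8`) would typically be handled by launching with `accelerate`, as in the alignment-handbook recipes.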
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Chosen Logits | Debug/policy Rejected Logits | Debug/policy Chosen Logps | Debug/policy Rejected Logps | Debug/reference Chosen Logps | Debug/reference Rejected Logps |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6965 | 0.3623 | 100 | 0.6848 | 0.1614 | 0.0731 | 0.2895 | 0.0882 | -63.7408 | -122.8253 | 1.7215 | 1.6604 | 1.6604 | 1.7215 | -122.8253 | -63.7408 | -123.1481 | -63.8871 |
| 0.7398 | 0.7246 | 200 | 0.7128 | 0.4980 | 0.1123 | 0.3289 | 0.3857 | -63.6625 | -122.1521 | 1.7105 | 1.6513 | 1.6513 | 1.7105 | -122.1521 | -63.6625 | -123.1481 | -63.8871 |
| 0.7007 | 1.0870 | 300 | 0.6869 | 0.4063 | -0.0006 | 0.3158 | 0.4070 | -63.8883 | -122.3354 | 1.7138 | 1.6542 | 1.6542 | 1.7138 | -122.3354 | -63.8883 | -123.1481 | -63.8871 |
| 0.7084 | 1.4493 | 400 | 0.7388 | 0.4329 | 0.1275 | 0.3026 | 0.3054 | -63.6320 | -122.2823 | 1.7009 | 1.6406 | 1.6406 | 1.7009 | -122.2823 | -63.6320 | -123.1481 | -63.8871 |
| 0.693 | 1.8116 | 500 | 0.6927 | 0.1909 | -0.0563 | 0.3158 | 0.2472 | -63.9997 | -122.7663 | 1.7035 | 1.6431 | 1.6431 | 1.7035 | -122.7663 | -63.9997 | -123.1481 | -63.8871 |
| 0.6683 | 2.1739 | 600 | 0.6755 | 0.2946 | 0.0203 | 0.3421 | 0.2744 | -63.8465 | -122.5588 | 1.7045 | 1.6442 | 1.6442 | 1.7045 | -122.5588 | -63.8465 | -123.1481 | -63.8871 |
| 0.7035 | 2.5362 | 700 | 0.6899 | 0.1404 | -0.0287 | 0.3158 | 0.1691 | -63.9445 | -122.8673 | 1.7058 | 1.6448 | 1.6448 | 1.7058 | -122.8673 | -63.9445 | -123.1481 | -63.8871 |
| 0.685 | 2.8986 | 800 | 0.6978 | 0.4321 | 0.0759 | 0.3947 | 0.3562 | -63.7352 | -122.2839 | 1.7109 | 1.6500 | 1.6500 | 1.7109 | -122.2839 | -63.7352 | -123.1481 | -63.8871 |
| 0.6585 | 3.2609 | 900 | 0.7158 | 0.4197 | 0.1341 | 0.2763 | 0.2856 | -63.6189 | -122.3087 | 1.7148 | 1.6527 | 1.6527 | 1.7148 | -122.3087 | -63.6189 | -123.1481 | -63.8871 |
| 0.6654 | 3.6232 | 1000 | 0.6837 | 0.4128 | 0.0010 | 0.3947 | 0.4118 | -63.8851 | -122.3225 | 1.7064 | 1.6460 | 1.6460 | 1.7064 | -122.3225 | -63.8851 | -123.1481 | -63.8871 |
| 0.669 | 3.9855 | 1100 | 0.6801 | 0.2662 | -0.0151 | 0.3816 | 0.2813 | -63.9173 | -122.6156 | 1.7008 | 1.6413 | 1.6413 | 1.7008 | -122.6156 | -63.9173 | -123.1481 | -63.8871 |
| 0.6658 | 4.3478 | 1200 | 0.6950 | 0.2165 | -0.0405 | 0.3553 | 0.2570 | -63.9680 | -122.7150 | 1.6985 | 1.6382 | 1.6382 | 1.6985 | -122.7150 | -63.9680 | -123.1481 | -63.8871 |
| 0.6774 | 4.7101 | 1300 | 0.6833 | 0.3216 | 0.0373 | 0.3289 | 0.2843 | -63.8124 | -122.5048 | 1.6956 | 1.6371 | 1.6371 | 1.6956 | -122.5048 | -63.8124 | -123.1481 | -63.8871 |
| 0.6553 | 5.0725 | 1400 | 0.6871 | 0.4489 | 0.0096 | 0.3421 | 0.4393 | -63.8679 | -122.2503 | 1.6926 | 1.6324 | 1.6324 | 1.6926 | -122.2503 | -63.8679 | -123.1481 | -63.8871 |
| 0.655 | 5.4348 | 1500 | 0.6900 | 0.3867 | 0.0004 | 0.3553 | 0.3863 | -63.8863 | -122.3746 | 1.7037 | 1.6446 | 1.6446 | 1.7037 | -122.3746 | -63.8863 | -123.1481 | -63.8871 |
| 0.6552 | 5.7971 | 1600 | 0.6981 | 0.2816 | -0.0683 | 0.3158 | 0.3498 | -64.0236 | -122.5849 | 1.6935 | 1.6342 | 1.6342 | 1.6935 | -122.5849 | -64.0236 | -123.1481 | -63.8871 |
| 0.6471 | 6.1594 | 1700 | 0.7017 | 0.3683 | 0.0204 | 0.3553 | 0.3479 | -63.8463 | -122.4115 | 1.6992 | 1.6385 | 1.6385 | 1.6992 | -122.4115 | -63.8463 | -123.1481 | -63.8871 |
| 0.6557 | 6.5217 | 1800 | 0.6957 | 0.2688 | -0.0975 | 0.3026 | 0.3663 | -64.0820 | -122.6105 | 1.6947 | 1.6337 | 1.6337 | 1.6947 | -122.6105 | -64.0820 | -123.1481 | -63.8871 |
| 0.6516 | 6.8841 | 1900 | 0.6872 | 0.3905 | 0.0084 | 0.3553 | 0.3821 | -63.8704 | -122.3671 | 1.7002 | 1.6400 | 1.6400 | 1.7002 | -122.3671 | -63.8704 | -123.1481 | -63.8871 |
| 0.6542 | 7.2464 | 2000 | 0.6910 | 0.3410 | 0.0003 | 0.3289 | 0.3406 | -63.8864 | -122.4661 | 1.6915 | 1.6320 | 1.6320 | 1.6915 | -122.4661 | -63.8864 | -123.1481 | -63.8871 |
| 0.6629 | 7.6087 | 2100 | 0.6930 | 0.4245 | 0.0306 | 0.3026 | 0.3939 | -63.8259 | -122.2991 | 1.6968 | 1.6376 | 1.6376 | 1.6968 | -122.2991 | -63.8259 | -123.1481 | -63.8871 |
| 0.6427 | 7.9710 | 2200 | 0.7009 | 0.3500 | 0.0298 | 0.3289 | 0.3202 | -63.8274 | -122.4480 | 1.6952 | 1.6350 | 1.6350 | 1.6952 | -122.4480 | -63.8274 | -123.1481 | -63.8871 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1
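
For reference, a minimal inference sketch against this checkpoint. The repository id below is a placeholder, not the published path; substitute the actual repo id or local checkpoint directory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: replace with the actual repo id or checkpoint path.
model_id = "path/to/ds_chat_sigmoid_iter0_2024-09-14-21.15"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# The base model is a chat model, so use its chat template.
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```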