khongtrunght committed
Commit 2a4552e (1 parent: 72df182)
Model save
- README.md +86 -0
- all_results.json +9 -0
- generation_config.json +14 -0
- train_results.json +9 -0
- trainer_state.json +0 -0
README.md
ADDED
@@ -0,0 +1,86 @@
+---
+tags:
+- trl
+- dpo
+- generated_from_trainer
+model-index:
+- name: Qwen2-7B-Instruct-SPPO-Function-call-v2.11
+  results: []
+---
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# Qwen2-7B-Instruct-SPPO-Function-call-v2.11
+
+This model was trained from scratch on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.1457
+- Rewards/chosen: -1.7639
+- Rewards/rejected: -14.1509
+- Rewards/accuracies: 0.9364
+- Rewards/margins: 12.3871
+- Logps/rejected: -551.2230
+- Logps/chosen: -189.1563
+- Logits/rejected: -1.6081
+- Logits/chosen: -1.5770
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 5e-07
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- total_train_batch_size: 16
+- total_eval_batch_size: 16
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 2
+
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.2001 | 0.1145 | 250 | 0.2192 | 0.7210 | -1.8684 | 0.9162 | 2.5895 | -305.5732 | -139.4582 | -1.6566 | -1.7096 |
+| 0.1246 | 0.2290 | 500 | 0.1662 | 0.6780 | -4.7708 | 0.9277 | 5.4487 | -363.6193 | -140.3193 | -1.6309 | -1.6619 |
+| 0.0831 | 0.3436 | 750 | 0.1441 | 0.5794 | -6.0728 | 0.9191 | 6.6521 | -389.6595 | -142.2913 | -1.6015 | -1.6194 |
+| 0.0698 | 0.4581 | 1000 | 0.1458 | -0.1931 | -8.1002 | 0.9335 | 7.9071 | -430.2079 | -157.7405 | -1.6062 | -1.6142 |
+| 0.0872 | 0.5726 | 1250 | 0.1416 | -0.0252 | -8.5014 | 0.9393 | 8.4762 | -438.2315 | -154.3822 | -1.5572 | -1.5535 |
+| 0.0547 | 0.6871 | 1500 | 0.1330 | -0.4963 | -9.4547 | 0.9335 | 8.9584 | -457.2992 | -163.8050 | -1.5598 | -1.5574 |
+| 0.1092 | 0.8016 | 1750 | 0.1337 | -1.2236 | -10.3660 | 0.9277 | 9.1424 | -475.5235 | -178.3509 | -1.5822 | -1.5827 |
+| 0.1109 | 0.9162 | 2000 | 0.1190 | -0.4262 | -9.6091 | 0.9364 | 9.1829 | -460.3859 | -162.4036 | -1.5682 | -1.5631 |
+| 0.013 | 1.0307 | 2250 | 0.1355 | -0.4415 | -10.4543 | 0.9393 | 10.0128 | -477.2908 | -162.7087 | -1.5520 | -1.5425 |
+| 0.0107 | 1.1452 | 2500 | 0.1450 | -1.2114 | -11.9528 | 0.9393 | 10.7414 | -507.2599 | -178.1073 | -1.5666 | -1.5494 |
+| 0.0203 | 1.2597 | 2750 | 0.1424 | -1.2291 | -12.7381 | 0.9364 | 11.5090 | -522.9661 | -178.4617 | -1.5798 | -1.5536 |
+| 0.0128 | 1.3743 | 3000 | 0.1428 | -1.5064 | -13.4244 | 0.9393 | 11.9180 | -536.6923 | -184.0067 | -1.5982 | -1.5679 |
+| 0.0447 | 1.4888 | 3250 | 0.1490 | -1.6333 | -13.8914 | 0.9422 | 12.2581 | -546.0324 | -186.5450 | -1.6084 | -1.5768 |
+| 0.0114 | 1.6033 | 3500 | 0.1508 | -1.8097 | -14.2168 | 0.9393 | 12.4071 | -552.5399 | -190.0730 | -1.6144 | -1.5842 |
+| 0.0201 | 1.7178 | 3750 | 0.1447 | -1.7474 | -14.1355 | 0.9393 | 12.3881 | -550.9136 | -188.8267 | -1.6087 | -1.5784 |
+| 0.0139 | 1.8323 | 4000 | 0.1461 | -1.7396 | -14.1065 | 0.9393 | 12.3669 | -550.3343 | -188.6715 | -1.6088 | -1.5783 |
+| 0.0038 | 1.9469 | 4250 | 0.1457 | -1.7639 | -14.1509 | 0.9364 | 12.3871 | -551.2230 | -189.1563 | -1.6081 | -1.5770 |
+
+
+### Framework versions
+
+- Transformers 4.44.0
+- Pytorch 2.3.1+cu121
+- Datasets 2.20.0
+- Tokenizers 0.19.1
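Reviewer note: two of the figures in the README.md diff can be cross-checked against each other. The sketch below assumes the usual DPO logging convention, where the reward margin is the chosen reward minus the rejected reward, and it assumes no gradient accumulation, since the card lists none. The function names are illustrative and not part of the commit.

```python
# Final evaluation metrics and batch settings, copied from the README.md diff.
FINAL_EVAL = {
    "rewards_chosen": -1.7639,
    "rewards_rejected": -14.1509,
    "rewards_margins": 12.3871,
}
HPARAMS = {"train_batch_size": 2, "num_devices": 8, "total_train_batch_size": 16}


def reward_margin(chosen: float, rejected: float) -> float:
    """Margin between the chosen and rejected rewards (chosen minus rejected)."""
    return chosen - rejected


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Global batch size. grad_accum_steps=1 is an assumption: the card lists no accumulation."""
    return per_device * num_devices * grad_accum_steps


if __name__ == "__main__":
    margin = reward_margin(FINAL_EVAL["rewards_chosen"], FINAL_EVAL["rewards_rejected"])
    # The logged margin is a mean over samples, so allow for rounding in the last digit.
    print("margin matches:", abs(margin - FINAL_EVAL["rewards_margins"]) < 1e-3)
    batch = effective_batch_size(HPARAMS["train_batch_size"], HPARAMS["num_devices"])
    print("batch size matches:", batch == HPARAMS["total_train_batch_size"])
```

Both checks agree with the reported values to within rounding, which suggests the card's numbers are internally consistent.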
all_results.json
ADDED
@@ -0,0 +1,9 @@
+{
+  "epoch": 2.0,
+  "total_flos": 0.0,
+  "train_loss": 0.09263221575655435,
+  "train_runtime": 18962.0322,
+  "train_samples": 34924,
+  "train_steps_per_second": 0.23
+}
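Reviewer note: the `train_samples` count here lines up with the Epoch and Step columns of the training-results table in README.md. A minimal sketch of that arithmetic follows, assuming the last partial batch of each epoch is kept and that the total train batch size of 16 applies with no gradient accumulation. The helper names are hypothetical.

```python
import math

TRAIN_SAMPLES = 34924          # from all_results.json
TOTAL_TRAIN_BATCH_SIZE = 16    # from the README.md hyperparameters


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch, assuming the final partial batch is not dropped."""
    return math.ceil(num_samples / batch_size)


def epoch_at_step(step: int, num_samples: int = TRAIN_SAMPLES,
                  batch_size: int = TOTAL_TRAIN_BATCH_SIZE) -> float:
    """Fractional epoch reached at a given step, rounded to four decimals like the table."""
    return round(step / steps_per_epoch(num_samples, batch_size), 4)


if __name__ == "__main__":
    print(steps_per_epoch(TRAIN_SAMPLES, TOTAL_TRAIN_BATCH_SIZE))
    for step in (250, 2250, 4250):
        print(step, epoch_at_step(step))
```

Under these assumptions the computed epochs for steps 250, 2250, and 4250 reproduce the table's Epoch column.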
generation_config.json
ADDED
@@ -0,0 +1,14 @@
+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "repetition_penalty": 1.05,
+  "temperature": 0.7,
+  "top_k": 20,
+  "top_p": 0.8,
+  "transformers_version": "4.44.0"
+}
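Reviewer note: for readers unfamiliar with the sampling fields above, the following is a simplified, pure-Python illustration of how `temperature`, `top_k`, and `top_p` narrow a next-token distribution. It is not the Transformers implementation, it omits `repetition_penalty`, and the function name `filter_probs` is hypothetical.

```python
import math


def filter_probs(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable tokens and renormalise over them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)
    k_probs = {i: probs[i] / k_mass for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += k_probs[i]
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens so the result sums to 1.
    p_mass = sum(k_probs[i] for i in kept)
    return {i: k_probs[i] / p_mass for i in kept}


if __name__ == "__main__":
    # Toy four-token vocabulary; the defaults mirror generation_config.json.
    print(filter_probs([2.0, 1.0, 0.0, -1.0]))
```

With `do_sample` set to true, a token would then be drawn from the returned distribution rather than chosen greedily.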
train_results.json
ADDED
@@ -0,0 +1,9 @@
+{
+  "epoch": 2.0,
+  "total_flos": 0.0,
+  "train_loss": 0.09263221575655435,
+  "train_runtime": 18962.0322,
+  "train_samples": 34924,
+  "train_samples_per_second": 3.684,
+  "train_steps_per_second": 0.23
+}
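Reviewer note: the throughput figure in this file follows from the other fields. The sketch below recomputes it, taking `train_runtime` to be in seconds and assuming every sample is seen once per epoch. The helper names are illustrative only.

```python
# Fields copied from train_results.json.
RESULTS = {
    "epoch": 2.0,
    "train_runtime": 18962.0322,        # seconds
    "train_samples": 34924,
    "train_samples_per_second": 3.684,
}


def samples_per_second(results: dict) -> float:
    """Samples processed per second, assuming each sample is seen once per epoch."""
    return results["train_samples"] * results["epoch"] / results["train_runtime"]


def runtime_hours(results: dict) -> float:
    """Wall-clock training time in hours."""
    return results["train_runtime"] / 3600


if __name__ == "__main__":
    print(round(samples_per_second(RESULTS), 3))
    print(round(runtime_hours(RESULTS), 2))
```

The recomputed rate rounds to the reported `train_samples_per_second`, and the run lasted a little over five hours.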
trainer_state.json
ADDED
The diff for this file is too large to render.