wzhouad committed
Commit 9ac41f7
Parent: 886c1e8

Model save

README.md CHANGED
@@ -13,20 +13,19 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sanqiang/wdpo/runs/dnn9mazg)
 # zephyr-7b-dpo-full
 
 This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5417
-- Rewards/chosen: -2.1562
-- Rewards/rejected: -2.8807
-- Rewards/accuracies: 0.7313
-- Rewards/margins: 0.7245
-- Logps/rejected: -438.7701
-- Logps/chosen: -359.8675
-- Logits/rejected: 0.5902
-- Logits/chosen: 0.3561
+- Loss: 0.0632
+- Rewards/chosen: -1.3406
+- Rewards/rejected: -2.3147
+- Rewards/accuracies: 0.7734
+- Rewards/margins: 0.9740
+- Logps/rejected: -488.8222
+- Logps/chosen: -391.1042
+- Logits/rejected: -2.0084
+- Logits/chosen: -2.0472
 
 ## Model description
 
@@ -48,7 +47,7 @@ The following hyperparameters were used during training:
 - learning_rate: 5e-07
 - train_batch_size: 8
 - eval_batch_size: 8
-- seed: 42
+- seed: 1
 - distributed_type: multi-GPU
 - num_devices: 8
 - gradient_accumulation_steps: 2
@@ -61,25 +60,17 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
-|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.679 | 0.0796 | 100 | 0.6759 | -0.1436 | -0.1818 | 0.5998 | 0.0382 | -168.8750 | -158.6036 | -2.6862 | -2.6943 |
-| 0.5947 | 0.1592 | 200 | 0.6027 | -1.5133 | -2.0123 | 0.6679 | 0.4990 | -351.9330 | -295.5727 | -1.6083 | -1.6620 |
-| 0.578 | 0.2388 | 300 | 0.5751 | -1.2683 | -1.7143 | 0.6894 | 0.4460 | -322.1284 | -271.0768 | -1.3925 | -1.5128 |
-| 0.5575 | 0.3183 | 400 | 0.5613 | -1.7874 | -2.4481 | 0.7052 | 0.6607 | -395.5074 | -322.9848 | -0.2511 | -0.4263 |
-| 0.5311 | 0.3979 | 500 | 0.5601 | -2.0743 | -2.7782 | 0.7248 | 0.7039 | -428.5196 | -351.6741 | 0.1321 | -0.1444 |
-| 0.5658 | 0.4775 | 600 | 0.5562 | -1.9576 | -2.6629 | 0.7192 | 0.7053 | -416.9899 | -340.0069 | 0.9125 | 0.6661 |
-| 0.556 | 0.5571 | 700 | 0.5502 | -2.1146 | -2.7825 | 0.7201 | 0.6678 | -428.9443 | -355.7084 | 0.9969 | 0.7302 |
-| 0.5285 | 0.6367 | 800 | 0.5477 | -2.1980 | -2.9456 | 0.7229 | 0.7476 | -445.2567 | -364.0405 | 0.8564 | 0.6029 |
-| 0.5299 | 0.7163 | 900 | 0.5450 | -2.1121 | -2.8512 | 0.7341 | 0.7391 | -435.8159 | -355.4508 | 0.9832 | 0.7089 |
-| 0.5629 | 0.7959 | 1000 | 0.5440 | -2.1483 | -2.8941 | 0.7323 | 0.7457 | -440.1051 | -359.0749 | 0.7033 | 0.4600 |
-| 0.5351 | 0.8754 | 1100 | 0.5423 | -2.1496 | -2.8571 | 0.7304 | 0.7074 | -436.4062 | -359.2066 | 0.5029 | 0.2753 |
-| 0.5499 | 0.9550 | 1200 | 0.5417 | -2.1562 | -2.8807 | 0.7313 | 0.7245 | -438.7701 | -359.8675 | 0.5902 | 0.3561 |
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.0717 | 0.21 | 100 | 0.0733 | -0.5920 | -1.1062 | 0.7266 | 0.5142 | -367.9753 | -316.2405 | -2.7130 | -2.7290 |
+| 0.0672 | 0.42 | 200 | 0.0662 | -0.9311 | -1.7199 | 0.7422 | 0.7888 | -429.3445 | -350.1472 | -2.2044 | -2.2377 |
+| 0.0648 | 0.63 | 300 | 0.0643 | -1.2563 | -2.1377 | 0.7734 | 0.8814 | -471.1217 | -382.6705 | -2.0727 | -2.1098 |
+| 0.0636 | 0.84 | 400 | 0.0632 | -1.3406 | -2.3147 | 0.7734 | 0.9740 | -488.8222 | -391.1042 | -2.0084 | -2.0472 |
 
 
 ### Framework versions
 
-- Transformers 4.41.0.dev0
+- Transformers 4.35.2
 - Pytorch 2.1.2+cu121
 - Datasets 2.14.6
-- Tokenizers 0.19.1
+- Tokenizers 0.14.1
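The Rewards/* and Logps/* columns in the card come from a DPO-style preference trainer. A rough sketch of how such metrics are typically derived, following TRL's DPOTrainer conventions; the `beta` value and the exact loss variant for this run are not recorded in the card, so both are assumptions:

```python
import torch
import torch.nn.functional as F

def dpo_batch_metrics(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative only: beta=0.1 is a common default, not read from this run."""
    # Implicit rewards: beta-scaled log-prob shift of the policy away from the
    # reference model. These are the card's Rewards/chosen and Rewards/rejected.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards                      # Rewards/margins
    accuracies = (chosen_rewards > rejected_rewards).float().mean()  # Rewards/accuracies
    # Plain sigmoid DPO loss. The old run's ~0.54 eval loss is on this scale;
    # the new run's 0.0632 points to a different loss variant not named in the card.
    loss = -F.logsigmoid(margins).mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracies
```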
all_results.json CHANGED
@@ -1,9 +1,8 @@
 {
-    "epoch": 0.9996020692399522,
-    "total_flos": 0.0,
-    "train_loss": 0.56636344817034,
-    "train_runtime": 10031.2749,
-    "train_samples": 160800,
-    "train_samples_per_second": 16.03,
-    "train_steps_per_second": 0.125
+    "epoch": 1.0,
+    "train_loss": 0.06801126423490596,
+    "train_runtime": 3957.1126,
+    "train_samples": 61134,
+    "train_samples_per_second": 15.449,
+    "train_steps_per_second": 0.121
 }
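A quick consistency check on the new numbers: the throughput fields follow from the sample count, the runtime, and the effective batch size implied by the hyperparameters in the card (8 per device × 8 GPUs × 2 gradient-accumulation steps = 128).

```python
# Sanity-check the derived throughput fields in the new all_results.json.
train_samples = 61134
train_runtime = 3957.1126  # seconds
effective_batch = 8 * 8 * 2  # train_batch_size x num_devices x gradient_accumulation_steps

print(round(train_samples / train_runtime, 3))                    # 15.449 (train_samples_per_second)
print(round(train_samples / effective_batch / train_runtime, 3))  # 0.121  (train_steps_per_second)
```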
config.json CHANGED
@@ -3,7 +3,6 @@
   "architectures": [
     "MistralForCausalLM"
   ],
-  "attention_dropout": 0.0,
   "bos_token_id": 1,
   "eos_token_id": 2,
   "hidden_act": "silu",
@@ -20,7 +19,7 @@
   "sliding_window": 4096,
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.41.0.dev0",
+  "transformers_version": "4.35.2",
   "use_cache": false,
   "vocab_size": 32000
 }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 1,
   "eos_token_id": 2,
-  "transformers_version": "4.41.0.dev0"
+  "transformers_version": "4.35.2"
 }
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:98f00987598688fdebf9936701ec965959200df36d355e58d759228a95bd1106
+oid sha256:35fc90854c79d5ffff8b99aeab6983babdeedf331e787883bdae0dad492b8a21
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:69882d48888219846a5788fd8b94d1e6391d766b051bcaf33889cd3e7e8ce63f
+oid sha256:b5925362d3801dfbc04556ebaacf5fd331d68765762d80ac3fec9c141f4afdc4
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:88597e6e627267d3070da8b8d6010bbdf8fdee4bfd6d7c44ef9daa98a75f8dc9
+oid sha256:4be9c60eb07630e94cc8a646e8a301810416b15cc521bb510470dfae40f284b3
 size 4540516344
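The .safetensors entries above are Git LFS pointers rather than the weights themselves; each pointer records the SHA-256 (the oid) and the byte size of the real shard. A minimal sketch for verifying a downloaded shard against its pointer (the local filename is assumed to match the repo path):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-GB shards don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Expected oid from the new pointer for the first shard.
expected = "35fc90854c79d5ffff8b99aeab6983babdeedf331e787883bdae0dad492b8a21"
assert sha256_of("model-00001-of-00003.safetensors") == expected
```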
tokenizer.json CHANGED
@@ -134,7 +134,6 @@
   "end_of_word_suffix": null,
   "fuse_unk": true,
   "byte_fallback": true,
-  "ignore_merges": false,
   "vocab": {
     "<unk>": 0,
     "<s>": 1,
tokenizer_config.json CHANGED
@@ -1,6 +1,4 @@
 {
-  "add_bos_token": true,
-  "add_eos_token": false,
   "added_tokens_decoder": {
     "0": {
       "content": "<unk>",
@@ -36,6 +34,7 @@
   "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
   "clean_up_tokenization_spaces": false,
   "eos_token": "</s>",
+  "legacy": true,
   "model_max_length": 2048,
   "pad_token": "</s>",
   "sp_model_kwargs": {},
train_results.json CHANGED
@@ -1,9 +1,8 @@
 {
-    "epoch": 0.9996020692399522,
-    "total_flos": 0.0,
-    "train_loss": 0.56636344817034,
-    "train_runtime": 10031.2749,
-    "train_samples": 160800,
-    "train_samples_per_second": 16.03,
-    "train_steps_per_second": 0.125
+    "epoch": 1.0,
+    "train_loss": 0.06801126423490596,
+    "train_runtime": 3957.1126,
+    "train_samples": 61134,
+    "train_samples_per_second": 15.449,
+    "train_steps_per_second": 0.121
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0f0eec8b45f1d0b1936e6c6e77002d70f8abb680c5ffcd12326f72d2e60dc1d0
-size 6456
+oid sha256:51d58bbae0c984db37a6e2fc40cc3202c5c85c5f6e2c92d3d9d3ec94f7f79a3d
+size 5944