vdaita committed
Commit 6ef3938
1 Parent(s): 6750904

End of training

Files changed (2):
  1. README.md +29 -10
  2. adapter_model.bin +2 -2
README.md CHANGED
@@ -2,10 +2,11 @@
 license: other
 library_name: peft
 tags:
+- axolotl
 - generated_from_trainer
 base_model: deepseek-ai/deepseek-coder-6.7b-instruct
 model-index:
-- name: outputs/dscoder-code-ir-2
+- name: diff-deepseek-code-ir
   results: []
 ---
 
@@ -27,10 +28,16 @@ strict: false
 
 datasets:
   - path: vdaita/editpackft_inst_code
+    split: train
     type: oasst
 dataset_prepared_path:
-val_set_size: 0.05
-output_dir: ./outputs/dscoder-code-ir-2
+
+test_datasets:
+  - path: vdaita/editpackft_inst_code
+    split: test
+    type: oasst
+
+output_dir: ./outputs/dscoder-code-ir-3
 
 sequence_len: 4096
 sample_packing: true
@@ -46,8 +53,15 @@ lora_dropout: 0.05
 lora_target_linear: true
 lora_fan_in_fan_out:
 
+lora_modules_to_save:
+  - embed_tokens
+  - lm_head
+
 wandb_project: huggingface
-wandb_log_model: axolotl-dscoder-code-2
+wandb_log_model: axolotl-dscoder-code-3
+
+hub_model_id: vdaita/diff-deepseek-code-ir
+hub_strategy: every_save
 
 gradient_accumulation_steps: 4
 micro_batch_size: 2
@@ -80,15 +94,20 @@ weight_decay: 0.0
 fsdp:
 fsdp_config:
 
+special_tokens:
+  bos_token: "<|begin_of_sentence|>"
+  eos_token: "<|end_of_sentence|>"
+  pad_token: "<|end_of_sentence|>"
+
 ```
 
 </details><br>
 
-# outputs/dscoder-code-ir-2
+# diff-deepseek-code-ir
 
 This model is a fine-tuned version of [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.2242
+- Loss: 0.2677
 
 ## Model description
 
@@ -125,10 +144,10 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.6668 | 0.03 | 1 | 0.7461 |
-| 0.5084 | 0.26 | 10 | 0.4586 |
-| 0.241 | 0.53 | 20 | 0.2486 |
-| 0.2553 | 0.79 | 30 | 0.2242 |
+| 0.6921 | 0.03 | 1 | 0.7832 |
+| 0.5453 | 0.25 | 10 | 0.5221 |
+| 0.3129 | 0.51 | 20 | 0.2985 |
+| 0.2527 | 0.76 | 30 | 0.2677 |
 
 
 ### Framework versions
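The config and log table above pin down the training scale arithmetic: `micro_batch_size: 2` with `gradient_accumulation_steps: 4` gives an effective batch of 8 packed sequences per optimizer step, and the new run reaching epoch 0.25 at step 10 implies roughly 40 steps per epoch. A minimal sketch of that arithmetic, assuming a single GPU (world size is not shown in the diff):

```python
# Effective batch size: sequences consumed per optimizer step.
micro_batch_size = 2
gradient_accumulation_steps = 4
effective_batch = micro_batch_size * gradient_accumulation_steps  # 8

# The new log reaches epoch 0.25 at step 10 -> ~40 optimizer steps/epoch,
# so roughly 40 * 8 = 320 packed 4096-token sequences per epoch.
steps_per_epoch = 10 / 0.25
packed_sequences_per_epoch = effective_batch * steps_per_epoch

print(effective_batch, steps_per_epoch, packed_sequences_per_epoch)
```

The later rows are consistent with this estimate (step 30 at epoch 0.76 gives ~39.5 steps per epoch).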
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:15b45762ee6c593a22c5eb0196cd42325259a42e5d2a3040326db83e66f6db12
3
- size 319977674
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2a374dcebcc148069ae22aec3b398b504514f6fb5b9980a3a763e7f69b983b02
3
+ size 848460690
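The jump in adapter size (~320 MB to ~848 MB) lines up with the config change: adding `lora_modules_to_save: [embed_tokens, lm_head]` stores full, non-LoRA copies of both matrices in the adapter. A back-of-the-envelope check, assuming fp16 weights and the usual deepseek-coder-6.7b shapes (vocab 32256, hidden 4096 — assumptions, not stated in the diff):

```python
# Size of one full vocab-by-hidden matrix in fp16.
vocab_size = 32256   # assumed deepseek-coder tokenizer vocab
hidden_size = 4096   # assumed deepseek-coder-6.7b hidden size
bytes_per_param = 2  # fp16

full_module_bytes = vocab_size * hidden_size * bytes_per_param
extra = 2 * full_module_bytes  # embed_tokens + lm_head

# Sizes from the two LFS pointers in the diff above.
old_size = 319_977_674
new_size = 848_460_690

print(extra)                # 528482304
print(new_size - old_size)  # 528483016 -- matches to within ~1 KB
```

Under these assumptions the two extra full modules account for essentially all of the growth, with the residual (~700 bytes) attributable to serialization overhead.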