Commit c350b76 (verified)
Author: vdaita
1 Parent(s): 0dfec11

End of training

Files changed (2)
  1. README.md +160 -0
  2. adapter_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,160 @@
+ ---
+ license: other
+ library_name: peft
+ tags:
+ - axolotl
+ - generated_from_trainer
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ model-index:
+ - name: diff-deepseek-ellipsis
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: deepseek-ai/deepseek-coder-6.7b-instruct
+ model_type: LlamaForCausalLM
+ tokenizer_type: LlamaTokenizerFast
+
+ load_in_8bit: true
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+   - path: vdaita/editpackft_inst_ellipsis
+     split: train
+     type: oasst
+ dataset_prepared_path:
+
+ test_datasets:
+   - path: vdaita/editpackft_inst_ellipsis
+     split: test
+     type: oasst
+
+ output_dir: ./outputs/dscoder-code-ellipsis
+
+ sequence_len: 4096
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ eval_sample_packing: false
+
+ adapter: lora
+ lora_model_dir:
+ lora_r: 32
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+
+ lora_modules_to_save:
+   - embed_tokens
+   - lm_head
+
+ wandb_project: huggingface
+ wandb_log_model: axolotl-dscoder-ellipsis
+
+ hub_model_id: vdaita/diff-deepseek-ellipsis
+ hub_strategy: every_save
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+ s2_attention:
+
+ warmup_steps: 10
+ evals_per_epoch: 4
+ saves_per_epoch: 1
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+
+ special_tokens:
+   bos_token: "<|begin_of_sentence|>"
+   eos_token: "<|end_of_sentence|>"
+   pad_token: "<|end_of_sentence|>"
+
+ ```
+
+ </details><br>
+
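
Reviewer note: the config sets `lora_r: 32` and `lora_alpha: 16`. As a minimal sketch, assuming the standard (non rank-stabilized) LoRA convention in which the low-rank update is multiplied by `alpha / r`, the effective scale for this run can be worked out as follows. The function name `lora_scale` is illustrative and is not part of axolotl or PEFT.

```python
# Values copied from the axolotl config in this commit.
lora_r = 32
lora_alpha = 16


def lora_scale(alpha: float, r: int) -> float:
    """Scale applied to the low-rank update B @ A under the standard
    LoRA convention (assumption: rank-stabilized LoRA is not enabled)."""
    return alpha / r


scale = lora_scale(lora_alpha, lora_r)
print(f"LoRA update scale: {scale}")
```

With these values the adapter update is scaled by 0.5 before being added to the frozen base weights.
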
+ # diff-deepseek-ellipsis
+
+ This model is a fine-tuned version of [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) on the `vdaita/editpackft_inst_ellipsis` dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0202
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
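+ ## Training and evaluation data
+
+ Per the axolotl config, training used the `train` split of `vdaita/editpackft_inst_ellipsis` and evaluation used its `test` split, both loaded with the `oasst` dataset type.
+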
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 16
+ - total_eval_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1
+
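
Reviewer note: the two "total" batch sizes in the list follow from the per-device values. A small sketch of that arithmetic, using only numbers from the list (the variable names are illustrative):

```python
# Per-device values from the hyperparameter list.
train_batch_size = 2             # micro batch per device
eval_batch_size = 2
num_devices = 2
gradient_accumulation_steps = 4

# One optimizer step consumes every micro batch accumulated on every device.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Because `sample_packing` is enabled in the config, each element of a training batch is a packed sequence of up to 4096 tokens rather than a single example.
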
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.2054        | 0.02  | 1    | 0.2354          |
+ | 0.062         | 0.25  | 15   | 0.0651          |
+ | 0.0333        | 0.5   | 30   | 0.0370          |
+ | 0.0215        | 0.75  | 45   | 0.0218          |
+ | 0.0174        | 1.0   | 60   | 0.0202          |
+
+
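
Reviewer note: the table covers 60 optimizer steps, and the hyperparameters list a cosine schedule with 10 warmup steps and a peak learning rate of 0.0002. The sketch below estimates the learning rate at each evaluated step, assuming the common "linear warmup, then half-cosine decay to zero" formulation. The trainer's exact step indexing may differ by one, so treat the numbers as approximate; `lr_at` is an illustrative helper, not a library function.

```python
import math

# Values from the hyperparameter list and the results table.
BASE_LR = 0.0002
WARMUP_STEPS = 10
TOTAL_STEPS = 60  # final step reported in the results table


def lr_at(step: int) -> float:
    """Learning rate after `step` scheduler updates, assuming linear
    warmup followed by a half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Steps at which the table reports a validation loss.
for step in (1, 15, 30, 45, 60):
    print(f"step {step:>2}: lr = {lr_at(step):.6f}")
```

Under this assumption the rate peaks at step 10, is halved by step 35, and reaches zero at step 60, so the last two evaluations happen while the step size is already small.
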
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - Transformers 4.40.0.dev0
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.20.0
+ - Tokenizers 0.15.0
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93937c3dd5f3a3538a2062a256e4416162f68ea8decad2f081fd608f1ae1eb64
+ size 848460690
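
Reviewer note: the three added lines are a Git LFS pointer, not the adapter weights themselves. Each line is a `key value` pair. As a minimal sketch, the pointer can be parsed like this to read off the hash algorithm and the real file size; `parse_lfs_pointer` is a hypothetical helper written for this note.

```python
# Pointer contents copied from the diff above (without the leading "+").
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:93937c3dd5f3a3538a2062a256e4416162f68ea8decad2f081fd608f1ae1eb64
size 848460690
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(pointer_text)
size_mib = pointer["size_bytes"] / 2**20
print(pointer["hash_algorithm"], f"{size_mib:.1f} MiB")
```

The stored `adapter_model.bin` is therefore roughly 809 MiB.
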