---
license: apache-2.0
base_model: cognitivecomputations/dolphin-2.9.1-yi-1.5-9b
tags:
- generated_from_trainer
- axolotl
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
library_name: transformers
pipeline_tag: text-generation
---

# Dolphin 2.9.1 Yi 1.5 9b 🐬-GGUF

This is a quantized version of [cognitivecomputations/dolphin-2.9.1-yi-1.5-9b](https://huggingface.co/cognitivecomputations/dolphin-2.9.1-yi-1.5-9b) created using llama.cpp.
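
A minimal usage sketch with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is shown below. The GGUF file name, the choice of quant, and the 12k context setting are assumptions for illustration; point `model_path` at whichever quant file you actually download.

```python
# Minimal sketch: chat with a downloaded GGUF quant via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.9.1-yi-1.5-9b.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=12288,           # the fine-tune was trained with a 12k sequence length
    chat_format="chatml",  # Dolphin 2.9.1 uses the ChatML prompt format
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
        {"role": "user", "content": "Explain rope theta in one paragraph."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```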

# Model Description

Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

This is our most spectacular outcome ever. FFT, all parameters, 16-bit. 70.9 MMLU on 9b! And it talks like a dream.

Although the max positional embeddings value is 4k, we used a rope theta of 1000000.0 and trained with a sequence length of 12k. We plan to train on the upcoming 32k version as well.

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)

Our appreciation goes to the sponsors of Dolphin 2.9.1:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node
- [OnDemand](https://on-demand.io/) - provided inference sponsorship

This model is based on Yi-1.5-9b and is governed by the Apache 2.0 license.

The base model has 4k context, but we used a rope theta of 1000000.0, and the full-weight fine-tuning was done with a 12k sequence length.

Dolphin 2.9.1 uses the ChatML prompt template format.

Example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```
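
For non-GGUF use through `transformers`, the same ChatML prompt can be rendered with the tokenizer's chat-template machinery rather than assembled by hand. The sketch below assumes the original model repository ships a ChatML chat template; verify the rendered string matches the format above.

```python
# Sketch: render the ChatML prompt with the Hugging Face chat-template API.
from transformers import AutoTokenizer

# Assumption: the tokenizer of the original (non-quantized) repo defines a ChatML template.
tokenizer = AutoTokenizer.from_pretrained("cognitivecomputations/dolphin-2.9.1-yi-1.5-9b")

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a short poem about the sea."},
]

# tokenize=False returns the raw prompt string; add_generation_prompt appends the
# trailing "<|im_start|>assistant" turn so the model responds as the assistant.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```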

Dolphin 2.9.1 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service: it will be highly compliant with any request, even unethical ones. Please read my [blog post about uncensored models](https://erichartford.com/uncensored-models). You are responsible for any content you create using this model. Enjoy responsibly.
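
As a purely hypothetical illustration of such an alignment layer (not part of the model or any shipped tooling), a service could pin a system prompt and screen requests before they ever reach the model; a real deployment would use a proper moderation model or policy engine.

```python
# Hypothetical sketch of a thin policy layer in front of an uncensored model.
# The keyword check is deliberately naive and only shows where such a layer
# sits; substitute a real moderation model or policy engine in practice.
BLOCKED_KEYWORDS = ("malware", "credit card dump")  # illustrative placeholders

SYSTEM_PROMPT = (
    "You are Dolphin, a helpful AI assistant. "
    "Refuse requests for illegal or harmful content."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Reject obviously disallowed requests, otherwise wrap them in chat roles."""
    if any(keyword in user_prompt.lower() for keyword in BLOCKED_KEYWORDS):
        raise ValueError("Request rejected by the service's policy layer.")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```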
57
+
58
+ Dolphin is licensed according to apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT4, among other models.
59
+

## Evals

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/tF9uD2W2yWODNdc--P68I.png)

## Training

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: 01-ai/Yi-1.5-9B
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

# load_in_8bit: false
# load_in_4bit: true
# strict: false

# adapter: qlora
# lora_modules_to_save: [embed_tokens, lm_head]

# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: True
# lora_fan_in_fan_out:

datasets:
  - path: /workspace/datasets/dolphin-2.9/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml

dataset_prepared_path: yi34b
val_set_size: 0.03
output_dir: ./out-yi

sequence_len: 12000
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9-yi-34b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
# resume_from_checkpoint: /workspace/axolotl/dbrx-checkpoint
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 4
save_total_limit: 2
save_steps:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|startoftext|>"
  eos_token: "<|im_end|>"
  pad_token: "<unk>"
  unk_token: "<unk>"
tokens:
  - "<|im_start|>"
```

</details><br>
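
If you want to reproduce the run from this config, recent axolotl releases are typically launched with `accelerate launch -m axolotl.cli.train <your-config>.yml`; the config file name here is a placeholder and the exact invocation may differ between axolotl versions.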

# out-yi

This model is a fine-tuned version of [01-ai/Yi-1.5-9B](https://huggingface.co/01-ai/Yi-1.5-9B) on the datasets listed above.
It achieves the following results on the evaluation set:
- Loss: 0.4396

## Model description

See the Model Description section above.

## Intended uses & limitations

See the notes above on uncensored behavior and responsible use.

## Training and evaluation data

See the datasets listed in the axolotl config above; 3% of the data was held out for evaluation (`val_set_size: 0.03`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
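
For reference, the effective batch sizes above follow from the per-device settings: total_train_batch_size = train_batch_size × gradient_accumulation_steps × num_devices = 2 × 8 × 8 = 128, and total_eval_batch_size = eval_batch_size × num_devices = 2 × 8 = 16.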

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.6332 | 0.0024 | 1 | 0.6469 |
| 0.4811 | 0.2499 | 106 | 0.4739 |
| 0.4465 | 0.4997 | 212 | 0.4547 |
| 0.4472 | 0.7496 | 318 | 0.4480 |
| 0.4373 | 0.9994 | 424 | 0.4429 |
| 0.4147 | 1.2384 | 530 | 0.4432 |
| 0.3879 | 1.4882 | 636 | 0.4400 |
| 0.3872 | 1.7381 | 742 | 0.4371 |
| 0.4044 | 1.9879 | 848 | 0.4344 |
| 0.3509 | 2.2269 | 954 | 0.4410 |
| 0.3628 | 2.4767 | 1060 | 0.4401 |
| 0.3652 | 2.7266 | 1166 | 0.4397 |
| 0.3674 | 2.9764 | 1272 | 0.4396 |

### Framework versions

- Transformers 4.40.2
- Pytorch 2.2.2+cu121
- Datasets 2.15.0
- Tokenizers 0.19.1