TheBloke committed on
Commit
cc4ce2c
1 Parent(s): bd89b78

Upload README.md

Files changed (1)
  1. README.md +574 -0
README.md ADDED
@@ -0,0 +1,574 @@
1
+ ---
2
+ base_model: https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10
3
+ datasets:
4
+ - rombodawg/LosslessMegaCodeTrainingV2_1m_Evol_Uncensored
5
+ - OpenAssistant/oasst1
6
+ - shahules786/orca-best
7
+ - argilla/databricks-dolly-15k-curated-multilingual
8
+ inference: false
9
+ language:
10
+ - en
11
+ library_name: transformers
12
+ license: llama2
13
+ model_creator: OpenAssistant
14
+ model_name: Llama2 70B SFT v10
15
+ model_type: llama
16
+ pipeline_tag: text-generation
17
+ prompt_template: '<|im_start|>system
18
+
19
+ {system_message}<|im_end|>
20
+
21
+ <|im_start|>user
22
+
23
+ {prompt}<|im_end|>
24
+
25
+ <|im_start|>assistant
26
+
27
+ '
28
+ quantized_by: TheBloke
29
+ tags:
30
+ - sft
31
+ ---
32
+
33
+ <!-- header start -->
34
+ <!-- 200823 -->
35
+ <div style="width: auto; margin-left: auto; margin-right: auto">
36
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
37
+ </div>
38
+ <div style="display: flex; justify-content: space-between; width: 100%;">
39
+ <div style="display: flex; flex-direction: column; align-items: flex-start;">
40
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
41
+ </div>
42
+ <div style="display: flex; flex-direction: column; align-items: flex-end;">
43
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
44
+ </div>
45
+ </div>
46
+ <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
47
+ <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
48
+ <!-- header end -->
49
+
50
+ # Llama2 70B SFT v10 - AWQ
51
+ - Model creator: [OpenAssistant](https://huggingface.co/OpenAssistant)
52
+ - Original model: [Llama2 70B SFT v10](https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10)
53
+
54
+ <!-- description start -->
55
+ ## Description
56
+
57
+ This repo contains AWQ model files for [OpenAssistant's Llama2 70B SFT v10](https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10).
58
+
59
+
60
+ ### About AWQ
61
+
62
+ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.
63
+
64
+ It is also now supported by the continuous batching server [vLLM](https://github.com/vllm-project/vllm), allowing use of AWQ models for high-throughput concurrent inference in multi-user server scenarios. Note that, at the time of writing, overall throughput is still lower than running vLLM with unquantised models. However, using AWQ allows much smaller GPUs to be used, which can lead to easier deployment and overall cost savings. For example, a 70B model can be run on 1 x 48GB GPU instead of 2 x 80GB.
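As a rough sanity check on the GPU sizing above, the weight footprint can be estimated from the bit width. This is a back-of-the-envelope sketch only: the 70e9 parameter count is nominal, the per-group overhead (one fp16 scale and one 4-bit zero point per group of 128 weights) is an assumption about the storage layout, and activation and KV-cache memory are ignored.

```python
# Rough weight-memory estimate for a nominally 70B-parameter model.
# Assumptions (not taken from the model card): exactly 70e9 weights, all of
# them quantised, one fp16 scale + one 4-bit zero point per group of 128.
N_PARAMS = 70e9
GROUP_SIZE = 128


def weights_gb(bits_per_weight: float, n_params: float = N_PARAMS) -> float:
    """Return the weight footprint in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


fp16_gb = weights_gb(16)
# 4 bits for the weight itself plus the amortised per-group overhead.
awq_bits = 4 + (16 + 4) / GROUP_SIZE
awq_gb = weights_gb(awq_bits)

print(f"fp16 weights: {fp16_gb:.1f} GB")
print(f"AWQ 4-bit, group size {GROUP_SIZE}: {awq_gb:.1f} GB")
```

Under these assumptions the quantised weights come to roughly 36 GB, which leaves some headroom on a 48GB card, whereas the fp16 weights alone need about 140 GB.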
65
+ <!-- description end -->
66
+ <!-- repositories-available start -->
67
+ ## Repositories available
68
+
69
+ * [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/Llama2-70B-OASST-SFT-v10-AWQ)
70
+ * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Llama2-70B-OASST-SFT-v10-GPTQ)
71
+ * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Llama2-70B-OASST-SFT-v10-GGUF)
72
+ * [OpenAssistant's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10)
73
+ <!-- repositories-available end -->
74
+
75
+ <!-- prompt-template start -->
76
+ ## Prompt template: ChatML
77
+
78
+ ```
79
+ <|im_start|>system
80
+ {system_message}<|im_end|>
81
+ <|im_start|>user
82
+ {prompt}<|im_end|>
83
+ <|im_start|>assistant
84
+
85
+ ```
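The template above can be filled in with a small helper. This is a minimal sketch: the function name `build_chatml_prompt` and the example system message are illustrative assumptions, not part of any library or of the original model card.

```python
def build_chatml_prompt(system_message: str, prompt: str) -> str:
    """Fill the single-turn ChatML template shown above."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        # The assistant turn is left open so the model writes the reply.
        f"<|im_start|>assistant\n"
    )


# Example values are placeholders; substitute your own system message.
text = build_chatml_prompt("You are a helpful assistant.", "Tell me about AI")
print(text)
```

Keeping the template in one function avoids subtle mismatches (a missing newline or `<|im_end|>`) between different call sites.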
86
+
87
+ <!-- prompt-template end -->
88
+
89
+
90
+ <!-- README_AWQ.md-provided-files start -->
91
+ ## Provided files and AWQ parameters
92
+
93
+ For my first release of AWQ models, I am releasing 128g models only. I will consider adding 32g as well if there is interest, and once I have done perplexity and evaluation comparisons, but at this time 32g models are still not fully tested with AutoAWQ and vLLM.
94
+
95
+ Models are released as sharded safetensors files.
96
+
97
+ | Branch | Bits | GS | AWQ Dataset | Seq Len | Size |
98
+ | ------ | ---- | -- | ----------- | ------- | ---- |
99
+ | [main](https://huggingface.co/TheBloke/Llama2-70B-OASST-SFT-v10-AWQ/tree/main) | 4 | 128 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 36.61 GB |
100
+
101
+ <!-- README_AWQ.md-provided-files end -->
102
+
103
+ <!-- README_AWQ.md-use-from-vllm start -->
104
+ ## Serving this model from vLLM
105
+
106
+ Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
107
+
108
+ - When using vLLM as a server, pass the `--quantization awq` parameter, for example:
109
+
110
+ ```shell
111
+ python3 -m vllm.entrypoints.api_server --model TheBloke/Llama2-70B-OASST-SFT-v10-AWQ --quantization awq
112
+ ```
113
+
114
+ When using vLLM from Python code, pass the `quantization="awq"` parameter, for example:
115
+
116
+ ```python
117
+ from vllm import LLM, SamplingParams
118
+
119
+ prompts = [
120
+ "Hello, my name is",
121
+ "The president of the United States is",
122
+ "The capital of France is",
123
+ "The future of AI is",
124
+ ]
125
+ sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
126
+
127
+ llm = LLM(model="TheBloke/Llama2-70B-OASST-SFT-v10-AWQ", quantization="awq")
128
+
129
+ outputs = llm.generate(prompts, sampling_params)
130
+
131
+ # Print the outputs.
132
+ for output in outputs:
133
+     prompt = output.prompt
133
+     generated_text = output.outputs[0].text
134
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
136
+ ```
137
+ <!-- README_AWQ.md-use-from-vllm end -->
138
+
139
+ <!-- README_AWQ.md-use-from-python start -->
140
+ ## How to use this AWQ model from Python code
141
+
142
+ ### Install the necessary packages
143
+
144
+ Requires: [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) 0.0.2 or later
145
+
146
+ ```shell
147
+ pip3 install autoawq
148
+ ```
149
+
150
+ If you have problems installing [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) using the pre-built wheels, install it from source instead:
151
+
152
+ ```shell
153
+ pip3 uninstall -y autoawq
154
+ git clone https://github.com/casper-hansen/AutoAWQ
155
+ cd AutoAWQ
156
+ pip3 install .
157
+ ```
158
+
159
+ ### You can then try the following example code
160
+
161
+ ```python
162
+ from awq import AutoAWQForCausalLM
163
+ from transformers import AutoTokenizer
164
+
165
+ model_name_or_path = "TheBloke/Llama2-70B-OASST-SFT-v10-AWQ"
166
+
167
+ # Load model
168
+ model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
169
+ trust_remote_code=False, safetensors=True)
170
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)
171
+
172
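+ system_message = "You are a helpful assistant."  # placeholder (assumption); the template below needs this defined
+ prompt = "Tell me about AI"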
173
+ prompt_template=f'''<|im_start|>system
174
+ {system_message}<|im_end|>
175
+ <|im_start|>user
176
+ {prompt}<|im_end|>
177
+ <|im_start|>assistant
178
+
179
+ '''
180
+
181
+ print("\n\n*** Generate:")
182
+
183
+ tokens = tokenizer(
184
+ prompt_template,
185
+ return_tensors='pt'
186
+ ).input_ids.cuda()
187
+
188
+ # Generate output
189
+ generation_output = model.generate(
190
+ tokens,
191
+ do_sample=True,
192
+ temperature=0.7,
193
+ top_p=0.95,
194
+ top_k=40,
195
+ max_new_tokens=512
196
+ )
197
+
198
+ print("Output: ", tokenizer.decode(generation_output[0]))
199
+
200
+ # Inference can also be done using transformers' pipeline
201
+ from transformers import pipeline
202
+
203
+ print("*** Pipeline:")
204
+ pipe = pipeline(
205
+ "text-generation",
206
+ model=model,
207
+ tokenizer=tokenizer,
208
+ max_new_tokens=512,
209
+ do_sample=True,
210
+ temperature=0.7,
211
+ top_p=0.95,
212
+ top_k=40,
213
+ repetition_penalty=1.1
214
+ )
215
+
216
+ print(pipe(prompt_template)[0]['generated_text'])
217
+ ```
218
+ <!-- README_AWQ.md-use-from-python end -->
219
+
220
+ <!-- README_AWQ.md-compatibility start -->
221
+ ## Compatibility
222
+
223
+ The files provided are tested to work with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), and [vLLM](https://github.com/vllm-project/vllm).
224
+
225
+ [Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is not yet compatible with AWQ, but a PR is open which should bring support soon: [TGI PR #781](https://github.com/huggingface/text-generation-inference/issues/781).
226
+ <!-- README_AWQ.md-compatibility end -->
227
+
228
+ <!-- footer start -->
229
+ <!-- 200823 -->
230
+ ## Discord
231
+
232
+ For further support, and discussions on these models and AI in general, join us at:
233
+
234
+ [TheBloke AI's Discord server](https://discord.gg/theblokeai)
235
+
236
+ ## Thanks, and how to contribute
237
+
238
+ Thanks to the [chirper.ai](https://chirper.ai) team!
239
+
240
+ Thanks to Clay from [gpus.llm-utils.org](llm-utils)!
241
+
242
+ I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
243
+
244
+ If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
245
+
246
+ Donaters will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
247
+
248
+ * Patreon: https://patreon.com/TheBlokeAI
249
+ * Ko-Fi: https://ko-fi.com/TheBlokeAI
250
+
251
+ **Special thanks to**: Aemon Algiz.
252
+
253
+ **Patreon special mentions**: Alicia Loh, Stephen Murray, K, Ajan Kanaga, RoA, Magnesian, Deo Leter, Olakabola, Eugene Pentland, zynix, Deep Realms, Raymond Fosdick, Elijah Stavena, Iucharbius, Erik Bjäreholt, Luis Javier Navarrete Lozano, Nicholas, theTransient, John Detwiler, alfie_i, knownsqashed, Mano Prime, Willem Michiel, Enrico Ros, LangChain4j, OG, Michael Dempsey, Pierre Kircher, Pedro Madruga, James Bentley, Thomas Belote, Luke @flexchar, Leonard Tan, Johann-Peter Hartmann, Illia Dulskyi, Fen Risland, Chadd, S_X, Jeff Scroggin, Ken Nordquist, Sean Connelly, Artur Olbinski, Swaroop Kallakuri, Jack West, Ai Maven, David Ziegler, Russ Johnson, transmissions 11, John Villwock, Alps Aficionado, Clay Pascal, Viktor Bowallius, Subspace Studios, Rainer Wilmers, Trenton Dambrowitz, vamX, Michael Levine, 준교 김, Brandon Frisco, Kalila, Trailburnt, Randy H, Talal Aujan, Nathan Dryer, Vadim, 阿明, ReadyPlayerEmma, Tiffany J. Kim, George Stoitzev, Spencer Kim, Jerry Meng, Gabriel Tamborski, Cory Kujawski, Jeffrey Morgan, Spiking Neurons AB, Edmond Seymore, Alexandros Triantafyllidis, Lone Striker, Cap'n Zoog, Nikolai Manek, danny, ya boyyy, Derek Yates, usrbinkat, Mandus, TL, Nathan LeClaire, subjectnull, Imad Khwaja, webtim, Raven Klaugh, Asp the Wyvern, Gabriel Puliatti, Caitlyn Gatomon, Joseph William Delisle, Jonathan Leane, Luke Pendergrass, SuperWojo, Sebastain Graf, Will Dee, Fred von Graf, Andrey, Dan Guido, Daniel P. Andersen, Nitin Borwankar, Elle, Vitor Caleffi, biorpg, jjj, NimbleBox.ai, Pieter, Matthew Berman, terasurfer, Michael Davis, Alex, Stanislav Ovsiannikov
254
+
255
+
256
+ Thank you to all my generous patrons and donaters!
257
+
258
+ And thank you again to a16z for their generous grant.
259
+
260
+ <!-- footer end -->
261
+
262
+ # Original model card: OpenAssistant's Llama2 70B SFT v10
263
+
264
+ # Open-Assistant Llama2 70B SFT v10
265
+
266
+ This model is an Open-Assistant fine-tuning of Meta's [Llama2 70B](https://huggingface.co/meta-llama/Llama-2-70b) LLM.
267
+ It was fine-tuned in two stages, first on a mix of synthetic instructions and coding tasks and then in a "polishing" stage
268
+ on the best human demonstrations collected at [open-assistant.io](https://open-assistant.io/) up to July 23, 2023 (see [Configuration Details](#configuration-details) below).
269
+
270
+ ## Model Details
271
+
272
+ - **Finetuned from:** [meta-llama/Llama-2-70b](https://huggingface.co/meta-llama/Llama-2-70b) via [epfLLM/Megatron-LLM](https://github.com/epfLLM/Megatron-LLM)
273
+ - **Model type:** Causal decoder-only transformer language model
274
+ - **Language:** English (and limited capabilities in German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish)
275
+ - **Weights & Biases training logs:** [Stage 1](https://wandb.ai/open-assistant/public-sft/runs/run45_oasst_pre10_llama2_70b) (1 epoch pretrain-mix, 12k steps), [Stage 2](https://wandb.ai/open-assistant/public-sft/runs/run46_oasst_sft10_llama2_70b) (3 epochs oasst top-1, 519 steps)
276
+ - **Demo:** [Continuations for 250 random prompts (TGI, 4bit nf4 quantization)](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-08-22_OpenAssistant_llama2-70b-oasst-sft-v10_sampling_noprefix2_nf4.json%0A)
277
+ - **Evaluation:** [FastEval-OpenAssistant Overview](https://tju01.github.io/FastEval-OpenAssistant/) (using [FastEval](https://github.com/FastEval/FastEval) & [vLLM](https://github.com/vllm-project/vllm))
278
+ - **License:** [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt)
279
+ - **Contact:** [Open-Assistant Discord](https://ykilcher.com/open-assistant-discord)
280
+
281
+
282
+ ## Prompting / Prompt Template
283
+
284
+ Due to public demand (see [survey](https://twitter.com/erhartford/status/1682403597525430272)) we changed the prompt-template for this model from custom prompter/assistant tokens to OpenAI's [chatml](https://github.com/openai/openai-python/blob/main/chatml.md) standard prompt format.
285
+ We hope that this leads to greater compatibility with chat inference/frontend applications.
286
+
287
+ Prompt dialogue template:
288
+
289
+ ```
290
+ """
291
+ <|im_start|>system
292
+ {system_message}<|im_end|>
293
+ <|im_start|>user
294
+ {prompt}<|im_end|>
295
+ <|im_start|>assistant
296
+ """
297
+ ```
298
+
299
+ The model input can contain multiple conversation turns between user and assistant, e.g.
300
+ ```
301
+ <|im_start|>user
302
+ {prompt 1}<|im_end|>
303
+ <|im_start|>assistant
304
+ {reply 1}<|im_end|>
305
+ <|im_start|>user
306
+ {prompt 2}<|im_end|>
307
+ <|im_start|>assistant
308
+ (...)
309
+ ```
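The multi-turn layout above can also be assembled programmatically. A minimal sketch, assuming the conversation is held as (role, content) pairs; the helper name `build_chatml_dialogue` and the example messages are hypothetical.

```python
from typing import List, Tuple


def build_chatml_dialogue(messages: List[Tuple[str, str]]) -> str:
    """Render (role, content) pairs as ChatML and open the next assistant turn."""
    allowed = {"system", "user", "assistant"}
    parts = []
    for role, content in messages:
        if role not in allowed:
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the final assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


dialogue = build_chatml_dialogue([
    ("user", "What is AWQ?"),
    ("assistant", "A 4-bit weight quantization method."),
    ("user", "How does it compare to GPTQ?"),
])
print(dialogue)
```

A system message, if used, would simply be the first pair in the list.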
310
+
311
+ The model was partly trained with orca system messages.
312
+ For inference we recommend using the official [Llama2 system message](https://github.com/facebookresearch/llama/blob/ea9f33d6d3ea8ed7d560d270986407fd6c2e52b7/example_chat_completion.py#L57-L61):
313
+ ```
314
+ <|im_start|>system
315
+ You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
316
+
317
+ If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
318
+ <|im_end|>
319
+ ```
320
+
321
+ ### Credits & Special Thanks
322
+
323
+ - Thanks to [Meta AI](https://ai.meta.com/) for training and releasing the Llama2 model.
324
+ - Distributed training support was provided by EPFL's [Machine Learning and Optimization Laboratory](https://www.epfl.ch/labs/mlo/), and [Natural Language Processing Lab](https://nlp.epfl.ch/).
325
+ - The open-source [epfLLM/Megatron-LLM](https://github.com/epfLLM/Megatron-LLM) trainer was used for fine-tuning.
326
+ - [rombodawg](https://huggingface.co/rombodawg) curated the [LosslessMegaCodeTrainingV2_1m_Evol_Uncensored](https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV2_1m_Evol_Uncensored) dataset.
327
+ - [ehartford](https://huggingface.co/ehartford) generated and published the [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) and the [ehartford/oa_leet10k](https://huggingface.co/datasets/ehartford/oa_leet10k) datasets.
328
+ - [Argilla](https://huggingface.co/argilla) curated and published the [argilla/databricks-dolly-15k-curated-multilingual](https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual) dataset.
329
+ - [shahules786](https://github.com/shahules786) de-duped and filtered the Dolphin dataset with a cluster-center approach and generated the orca-best (orca-chat) dataset.
330
+ - [andreaskoepf](https://github.com/andreaskoepf/) prepared & orchestrated the training.
331
+
332
+ We especially want to thank everyone who contributed to the crowd-sourced Open-Assistant dataset creation on https://open-assistant.io/ - without you this project would not have been possible.
333
+
334
+ ## Ethical Considerations and Limitations
335
+
336
+ Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios.
337
+ For these reasons, as with all LLMs, the potential outputs of llama2-70b-oasst-sft-v10 cannot be predicted
338
+ in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses
339
+ to user prompts. Therefore, before deploying any applications of llama2-70b-oasst-sft-v10, developers should
340
+ perform safety testing and tuning tailored to their specific applications of the model.
341
+
342
+ Please see Meta's [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/).
343
+
344
+ ## Note regarding inference with TGI
345
+
346
+ During evaluation we noticed that this 70B model produced extremely poor outputs when it was loaded in 16-bit precision and sharded in [TGI](https://github.com/huggingface/text-generation-inference).
346
+ In contrast, the model could be evaluated without problems using [vLLM](https://github.com/vllm-project/vllm).
347
+ The model also worked decently well when loaded with TGI on a single GPU with nf4 quantization via [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
348
+ We will get in touch with the TGI authors to find out why sharded 16-bit inference doesn't work as expected.
350
+
351
+ ## Configuration Details
352
+
353
+ The "pretokenizer" utility used to tokenize the datamix is part of the Open-Assistant github repository and can be found here: [model/pretokenizer](https://github.com/LAION-AI/Open-Assistant/tree/main/model/pretokenizer).
354
+
355
+
356
+ ### Stage 1 Pretokenizer Configuration
357
+
358
+ Entries of the dataset with assistant replies shorter than 25 tokens were excluded from training.
359
+
360
+ ```
361
+ oasst_pre10_min25:
362
+   datasets:
363
+     - megacode2:
364
+         fraction: 0.5
365
+         val_split: 0.01
366
+         max_val_set: 1000
367
+     - orca-chat:
368
+         val_split: 0.01
369
+         max_val_set: 1000
370
+     - dolly15k_multilingual:
371
+         val_split: 0.05
372
+         max_val_set: 300
373
+     - oa_leet10k:
374
+         val_split: 0.05
375
+         max_val_set: 250
376
+   output_dir: "output/oasst_pre10_min25"
377
+   filename_prefix: "oasst_pre10"
378
+   min_assistant_tokens: 25
379
+ ```
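The effect of `min_assistant_tokens: 25` can be illustrated with a small filter. This is only a sketch of the idea, not the pretokenizer's actual code: the sample layout (`user`/`assistant` keys) is hypothetical, and a whitespace split stands in for the real Llama2 tokenizer, so the counts will differ from the real ones.

```python
from typing import Callable, Dict, List


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


def drop_short_replies(
    samples: List[Dict[str, str]],
    min_assistant_tokens: int = 25,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[Dict[str, str]]:
    """Keep only samples whose assistant reply has at least the minimum token count."""
    return [
        s for s in samples
        if count_tokens(s["assistant"]) >= min_assistant_tokens
    ]


samples = [
    {"user": "Say hi", "assistant": "Hi!"},
    {"user": "Explain recursion", "assistant": " ".join(["word"] * 30)},
]
kept = drop_short_replies(samples)
print(f"kept {len(kept)} of {len(samples)} samples")
```

Passing the token counter as a parameter makes it straightforward to swap in a real tokenizer's length function.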
380
+
381
+ Stage 1 dataset statistics:
382
+ ```
383
+ # Stats for output/oasst_pre10_min25_llama2
384
+
385
+ ## Stats for 'Subset of InstructionDataset (megacode2)' (466364 samples (50.0%))
386
+ -----------------
387
+ Accepted: 398223/466364 (85.4%)
388
+ Accepted tokens: 167676873
389
+ Skipped: 68141 (14.6%)
390
+ Min tokens per sample: 36
391
+ Max tokens per sample: 11810
392
+ Avg tokens per sample: 421.063
393
+ -----------------
394
+
395
+ ## Stats for 'Subset of OrcaChat (orca-chat)' (325616 samples (100.0%))
396
+ -----------------
397
+ Accepted: 325616/325616 (100.0%)
398
+ Accepted tokens: 178307574
399
+ Skipped: 0 (0.0%)
400
+ Min tokens per sample: 105
401
+ Max tokens per sample: 10408
402
+ Avg tokens per sample: 547.601
403
+ -----------------
404
+
405
+ ## Stats for 'Subset of Dolly15kMultilingual' (57020 samples (100.0%))
406
+ -----------------
407
+ Accepted: 47494/57020 (83.3%)
408
+ Accepted tokens: 13883177
409
+ Skipped: 9526 (16.7%)
410
+ Min tokens per sample: 34
411
+ Max tokens per sample: 9172
412
+ Avg tokens per sample: 292.314
413
+ -----------------
414
+
415
+ ## Stats for 'Subset of InstructionDataset (oa_leet10k)' (22236 samples (100.0%))
416
+ -----------------
417
+ Accepted: 22236/22236 (100.0%)
418
+ Accepted tokens: 15905296
419
+ Skipped: 0 (0.0%)
420
+ Min tokens per sample: 168
421
+ Max tokens per sample: 10588
422
+ Avg tokens per sample: 715.295
423
+ -----------------
424
+
425
+ ## Stats for 'total' (871236 samples (100.0%))
426
+ -----------------
427
+ Accepted: 793569/871236 (91.1%)
428
+ Accepted tokens: 375772920
429
+ Skipped: 77667 (8.9%)
430
+ Min tokens per sample: 34
431
+ Max tokens per sample: 11810
432
+ Avg tokens per sample: 473.523
433
+ -----------------
434
+ ```
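The 'total' row can be cross-checked against the per-subset figures. The sketch below only re-adds the numbers printed in the statistics above; the dictionary layout is an illustrative choice.

```python
# Per-subset figures copied from the Stage 1 statistics above.
subsets = {
    "megacode2": {"samples": 466364, "accepted": 398223, "tokens": 167676873},
    "orca-chat": {"samples": 325616, "accepted": 325616, "tokens": 178307574},
    "dolly15k_multilingual": {"samples": 57020, "accepted": 47494, "tokens": 13883177},
    "oa_leet10k": {"samples": 22236, "accepted": 22236, "tokens": 15905296},
}

total_samples = sum(s["samples"] for s in subsets.values())
total_accepted = sum(s["accepted"] for s in subsets.values())
total_tokens = sum(s["tokens"] for s in subsets.values())

accept_rate = 100 * total_accepted / total_samples
avg_tokens = total_tokens / total_accepted

print(f"Accepted: {total_accepted}/{total_samples} ({accept_rate:.1f}%)")
print(f"Accepted tokens: {total_tokens}")
print(f"Skipped: {total_samples - total_accepted}")
print(f"Avg tokens per sample: {avg_tokens:.3f}")
```

Note that the average is taken over accepted samples, not over all samples.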
435
+
436
+
437
+ ### Stage 2 Pretokenizer Configuration
438
+
439
+ ```
440
+ oasst_top1:
441
+   datasets:
442
+     - oasst_export:
443
+         lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
444
+         input_file_path: 2023-07-23_oasst_ready.tar.gz
445
+         top_k: 1
446
+         val_split: 0.05
447
+   output_dir: "output/oasst_top1_2023-07-23"
448
+   filename_prefix: "oasst_top1"
449
+ ```
450
+
451
+ Stage 2 dataset statistics:
452
+
453
+ ```
454
+ # Stats for output/oasst_top1_2023-07-23_llama2
455
+
456
+ ## Stats for 'ListDataset' (11441 samples (100.0%))
457
+ -----------------
458
+ Accepted: 11441/11441 (100.0%)
459
+ Accepted tokens: 5315368
460
+ Skipped: 0 (0.0%)
461
+ Min tokens per sample: 20
462
+ Max tokens per sample: 5407
463
+ Avg tokens per sample: 464.58945896337735
464
+ -----------------
465
+
466
+ ## Stats for 'total' (11441 samples (100.0%))
467
+ -----------------
468
+ Accepted: 11441/11441 (100.0%)
469
+ Accepted tokens: 5315368
470
+ Skipped: 0 (0.0%)
471
+ Min tokens per sample: 20
472
+ Max tokens per sample: 5407
473
+ Avg tokens per sample: 464.58945896337735
474
+ -----------------
475
+ ```
476
+
477
+
478
+ ### Megatron Fine-Tuning Arguments for Stage 1 (Instruction Tuning):
479
+ ```
480
+ --tensor_model_parallel_size 8
481
+ --pipeline_model_parallel_size 4
482
+ --load ./checkpoints/llama2-70b-tp8-pp4
483
+ --save ./checkpoints/llama2-70b-tp8-pp4-oasst_pre10
484
+ --tensorboard_dir ./checkpoints/llama2-70b-tp8-pp4-oasst_pre10/logging
485
+ --data_path ./data/oasst_pre10_min25_llama2/oasst_sft10-train
486
+ --model_name llama2
487
+ --tokenizer_type SentencePieceTokenizer
488
+ --bf16
489
+ --global_batch_size 64
490
+ --micro_batch_size 2
491
+ --vocab_file=./llama2/Llama-2-7b/tokenizer.model
492
+ --use_rms_norm
493
+ --glu_activation swiglu
494
+ --no_tie_embed_logits
495
+ --vocab_extra_ids_list "\"<|im_start|>,<|im_end|>\""
496
+ --layernorm_epsilon 1e-5
497
+ --use_flash_attn
498
+ --no_bias_gelu_fusion
499
+ --seq_length 4096
500
+ --max_position_embeddings 4096
501
+ --log_interval 1
502
+ --save_interval 500
503
+ --eval_interval 50
504
+ --eval_iters 10
505
+ --hidden_dropout 0.0
506
+ --position_embedding_type rotary
507
+ --no_bias_dropout_fusion
508
+ --use_checkpoint_args
509
+ --train_iters 12000
510
+ --attention_dropout 0.0
511
+ --adam_beta1 0.9
512
+ --adam_beta2 0.95
513
+ --adam_eps 1e-12
514
+ --lr_decay_style cosine
515
+ --lr_warmup_iters 100
516
+ --lr 1e-5
517
+ --min_lr 1e-6
518
+ --weight_decay 0.000001
519
+ --sequence_parallel
520
+ --recompute_granularity selective
521
+ --log_timers_to_tensorboard
522
+ --rope_scaling_factor 1.0
523
+ --wandb_logger
524
+ ```
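The learning-rate arguments above (`--lr`, `--min_lr`, `--lr_warmup_iters`, `--lr_decay_style cosine`) describe a warmup followed by a cosine decay. The sketch below shows the common form of such a schedule; it assumes linear warmup from zero and a decay horizon equal to `--train_iters`, neither of which is stated in the arguments, so the trainer's exact curve may differ.

```python
import math


def lr_at(
    step: int,
    max_lr: float = 1e-5,
    min_lr: float = 1e-6,
    warmup_iters: int = 100,
    decay_iters: int = 12000,  # assumption: decay spans all training iterations
) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_iters:
        return max_lr * step / warmup_iters
    if step >= decay_iters:
        return min_lr
    progress = (step - warmup_iters) / (decay_iters - warmup_iters)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + cosine * (max_lr - min_lr)


for step in (0, 50, 100, 6050, 12000):
    print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

With these values the rate peaks at `1e-5` once warmup ends and settles at `1e-6` by the last iteration.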
525
+
526
+ ### Megatron Fine-Tuning Arguments for Stage 2 (OASST Polishing, LIMA Dropout):
527
+ ```
528
+ --tensor_model_parallel_size 8
529
+ --pipeline_model_parallel_size 4
530
+ --load ./checkpoints/llama2-70b-tp8-pp4-oasst_pre10
531
+ --save ./checkpoints/llama2-70b-tp8-pp4-oasst_sft10
532
+ --tensorboard_dir ./checkpoints/llama2-70b-tp8-pp4-oasst_sft10/logging
533
+ --data_path ./data/oasst_top1_2023-07-23_llama2/oasst_top1-train
534
+ --model_name llama2
535
+ --tokenizer_type SentencePieceTokenizer
536
+ --bf16
537
+ --global_batch_size 64
538
+ --micro_batch_size 2
539
+ --vocab_file=./llama2/Llama-2-7b/tokenizer.model
540
+ --use_rms_norm
541
+ --glu_activation swiglu
542
+ --no_tie_embed_logits
543
+ --vocab_extra_ids_list "\"<|im_start|>,<|im_end|>\""
544
+ --layernorm_epsilon 1e-5
545
+ --use_flash_attn
546
+ --no_bias_gelu_fusion
547
+ --seq_length 4096
548
+ --max_position_embeddings 4096
549
+ --log_interval 1
550
+ --save_interval 346
551
+ --eval_interval 50
552
+ --eval_iters 10
553
+ --hidden_dropout 0.25
554
+ --lima_dropout
555
+ --position_embedding_type rotary
556
+ --no_bias_dropout_fusion
557
+ --use_checkpoint_args
558
+ --train_iters 519
559
+ --attention_dropout 0.0
560
+ --adam_beta1 0.9
561
+ --adam_beta2 0.95
562
+ --adam_eps 1e-12
563
+ --lr_decay_style cosine
564
+ --lr_warmup_iters 100
565
+ --lr 1e-5
566
+ --min_lr 1e-6
567
+ --weight_decay 0.000001
568
+ --sequence_parallel
569
+ --recompute_granularity selective
570
+ --log_timers_to_tensorboard
571
+ --rope_scaling_factor 1.0
572
+ --finetune
573
+ --wandb_logger
574
+ ```
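The parallelism and batch-size arguments in both stages relate to each other by simple arithmetic. The sketch below works through it; the total GPU count (`WORLD_SIZE = 64`) is purely hypothetical, since the arguments do not state how many GPUs were used, and the helper name `parallel_layout` is illustrative.

```python
def parallel_layout(tp: int, pp: int, micro_batch: int, global_batch: int,
                    train_iters: int, world_size: int) -> dict:
    """Derive data-parallel size, gradient-accumulation steps and samples seen."""
    gpus_per_replica = tp * pp  # one model copy spans TP x PP GPUs
    if world_size % gpus_per_replica != 0:
        raise ValueError("world_size must be a multiple of tp * pp")
    data_parallel = world_size // gpus_per_replica
    samples_per_pass = micro_batch * data_parallel
    if global_batch % samples_per_pass != 0:
        raise ValueError("global_batch must be a multiple of micro_batch * data_parallel")
    return {
        "gpus_per_replica": gpus_per_replica,
        "data_parallel": data_parallel,
        "grad_accum_steps": global_batch // samples_per_pass,
        "samples_seen": global_batch * train_iters,
    }


WORLD_SIZE = 64  # hypothetical cluster size, not taken from the model card

stage1 = parallel_layout(tp=8, pp=4, micro_batch=2, global_batch=64,
                         train_iters=12000, world_size=WORLD_SIZE)
stage2 = parallel_layout(tp=8, pp=4, micro_batch=2, global_batch=64,
                         train_iters=519, world_size=WORLD_SIZE)
print("stage 1:", stage1)
print("stage 2:", stage2)
```

Only `gpus_per_replica` and `samples_seen` follow from the listed arguments alone; the other two values change with the assumed cluster size.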