Text Generation
Transformers
PyTorch
English
llama
sft
Inference Endpoints
text-generation-inference
andreaskoepf committed
Commit 7d15677
1 Parent(s): ac7f5bb

Update README.md

Files changed (1)
  1. README.md +192 -0
README.md CHANGED
---
license: llama2
language:
- en
datasets:
- OpenAssistant/oasst1
---
# Open-Assistant Llama2 70B SFT v10

This model is an Open-Assistant fine-tuning of Meta's [Llama2 70B](https://huggingface.co/meta-llama/Llama-2-70b) LLM.


## Model Details

- **Finetuned from:** [meta-llama/Llama-2-70b](https://huggingface.co/meta-llama/Llama-2-70b) via [epfLLM/old-Megatron-LM](https://github.com/epfLLM/old-Megatron-LM)
- **Model type:** Causal decoder-only transformer language model
- **Language:** English, German, Spanish, French (and limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish)
- **Weights & Biases:** [Stage 1](https://wandb.ai/open-assistant/public-sft/runs/run45_oasst_pre10_llama2_70b) (1 epoch pretrain-mix, 12k steps), [Stage 2](https://wandb.ai/open-assistant/public-sft/runs/run46_oasst_sft10_llama2_70b) (3 epochs oasst top-1, 519 steps)
- **Demo:** [Continuations for 250 random prompts (TGI, 4bit nf4 quantization)](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-08-22_OpenAssistant_llama2-70b-oasst-sft-v10_sampling_noprefix2_nf4.json%0A) (a loading sketch with the same 4-bit nf4 settings follows this list)
- **Evaluation:** [FastEval-OpenAssistant Overview](https://tju01.github.io/FastEval-OpenAssistant/) (using [FastEval](https://github.com/FastEval/FastEval) & [vLLM](https://github.com/vllm-project/vllm))
- **License:** [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt)
- **Contact:** [Open-Assistant Discord](https://ykilcher.com/open-assistant-discord)
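
The sampling report linked under **Demo** was produced with the weights quantized to 4-bit nf4. As a rough orientation, the snippet below is a minimal loading sketch using `transformers` and `bitsandbytes`; the repository id, compute dtype, and device-map settings are illustrative assumptions, not part of the official release.

```python
# Minimal sketch: load the model in 4-bit nf4 with transformers + bitsandbytes.
# The repository id below is assumed from the model name used elsewhere in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "OpenAssistant/llama2-70b-oasst-sft-v10"  # assumed repository id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # nf4 quantization, as in the linked demo
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",                      # shard the 70B weights across available GPUs
)
```

Even at 4 bits, the 70B parameters alone occupy roughly 35 GB, so plan for a single large-memory GPU or several smaller ones.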

## Prompting / Prompt Template

The model was trained with OpenAI's [chatml](https://github.com/openai/openai-python/blob/main/chatml.md) prompt format:
"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{user prompt}<|im_end|>\n<|im_start|>assistant\n{Assistant answer}<|im_end|>\n"

Multi-line:

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user prompt}<|im_end|>
<|im_start|>assistant
{Assistant answer}<|im_end|>
```
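
For programmatic use, the template above can be assembled with a small helper. The function below is an illustrative sketch rather than code from the Open-Assistant repositories; it assumes messages are passed as `{"role": ..., "content": ...}` dictionaries with the roles `system`, `user`, and `assistant`.

```python
# Illustrative helper (not from the Open-Assistant codebase): build a chatml
# prompt string from a list of {"role", "content"} messages and leave the final
# assistant turn open so the model continues from "<|im_start|>assistant\n".
def build_chatml_prompt(messages: list[dict]) -> str:
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation starts here
    return "".join(parts)
```

For example, `build_chatml_prompt([{"role": "user", "content": "Hello!"}])` returns `"<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"`.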
41
+
42
+ The model was partly trained with orca system messages. For inference we can recommend the official [llama2 system prompt](https://github.com/facebookresearch/llama/blob/ea9f33d6d3ea8ed7d560d270986407fd6c2e52b7/example_chat_completion.py#L57-L61):
43
+ ```
44
+ <|im_start|>system
45
+ You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
46
+ If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
47
+ <|im_end|>
48
+ ```
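
Putting the pieces together, a generation call could look like the following sketch. It reuses the assumed `model`, `tokenizer`, and `build_chatml_prompt` from the sketches above; the sampling parameters and the use of `<|im_end|>` as the stop token are illustrative assumptions rather than official recommendations.

```python
# Sketch: generate a reply using the chatml template and a llama2-style system prompt.
# Reuses `model`, `tokenizer`, and `build_chatml_prompt` from the sketches above.
system_prompt = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)

prompt = build_chatml_prompt([
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain the difference between SFT and RLHF in two sentences."},
])

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# <|im_start|>/<|im_end|> were added as extra vocab tokens during training
# (see --vocab_extra_ids_list below), so stopping on <|im_end|> is assumed to work.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    eos_token_id=im_end_id,
)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```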

## Configuration Details

### Stage 1 Pretokenizer Configuration

```
oasst_pre10_min25:
  datasets:
    - megacode2:
        fraction: 0.5
        val_split: 0.01
        max_val_set: 1000
    - orca-chat:
        val_split: 0.01
        max_val_set: 1000
    - dolly15k_multilingual:
        val_split: 0.05
        max_val_set: 300
    - oa_leet10k:
        val_split: 0.05
        max_val_set: 250
  output_dir: "output/oasst_pre10_min25"
  filename_prefix: "oasst_pre10"
  min_assistant_tokens: 25
```

### Stage 2 Pretokenizer Configuration

```
oasst_top1:
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
        input_file_path: 2023-07-23_oasst_ready.tar.gz
        top_k: 1
        val_split: 0.05
  output_dir: "output/oasst_top1_2023-07-23"
  filename_prefix: "oasst_top1"
```

### Megatron Fine-Tuning Arguments for Stage 1 (Instruction Tuning):
```
--tensor_model_parallel_size 8
--pipeline_model_parallel_size 4
--load ./akoepf/checkpoints/llama2-70b-tp8-pp4
--save ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_pre10
--tensorboard_dir ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_pre10/logging
--data_path ./akoepf/data/oasst_pre10_min25_llama2/oasst_sft10-train
--model_name llama2
--tokenizer_type SentencePieceTokenizer
--bf16
--global_batch_size 64
--micro_batch_size 2
--vocab_file=./akoepf/llama2/Llama-2-7b/tokenizer.model
--use_rms_norm
--glu_activation swiglu
--no_tie_embed_logits
--vocab_extra_ids_list "\"<|im_start|>,<|im_end|>\""
--layernorm_epsilon 1e-5
--use_flash_attn
--no_bias_gelu_fusion
--seq_length 4096
--max_position_embeddings 4096
--log_interval 1
--save_interval 500
--eval_interval 50
--eval_iters 10
--hidden_dropout 0.0
--position_embedding_type rotary
--no_bias_dropout_fusion
--use_checkpoint_args
--train_iters 12000
--attention_dropout 0.0
--adam_beta1 0.9
--adam_beta2 0.95
--adam_eps 1e-12
--lr_decay_style cosine
--lr_warmup_iters 100
--lr 1e-5
--min_lr 1e-6
--weight_decay 0.000001
--sequence_parallel
--recompute_granularity selective
--log_timers_to_tensorboard
--rope_scaling_factor 1.0
--wandb_logger
```

### Megatron Fine-Tuning Arguments for Stage 2 (OASST Polishing, LIMA Dropout):
```
--tensor_model_parallel_size 8
--pipeline_model_parallel_size 4
--load ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_pre10
--save ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_sft10
--tensorboard_dir ./akoepf/checkpoints/llama2-70b-tp8-pp4-oasst_sft10/logging
--data_path ./akoepf/data/oasst_top1_2023-07-23_llama2/oasst_top1-train
--model_name llama2
--tokenizer_type SentencePieceTokenizer
--bf16
--global_batch_size 64
--micro_batch_size 2
--vocab_file=./akoepf/llama2/Llama-2-7b/tokenizer.model
--use_rms_norm
--glu_activation swiglu
--no_tie_embed_logits
--vocab_extra_ids_list "\"<|im_start|>,<|im_end|>\""
--layernorm_epsilon 1e-5
--use_flash_attn
--no_bias_gelu_fusion
--seq_length 4096
--max_position_embeddings 4096
--log_interval 1
--save_interval 346
--eval_interval 50
--eval_iters 10
--hidden_dropout 0.25
--lima_dropout
--position_embedding_type rotary
--no_bias_dropout_fusion
--use_checkpoint_args
--train_iters 519
--attention_dropout 0.0
--adam_beta1 0.9
--adam_beta2 0.95
--adam_eps 1e-12
--lr_decay_style cosine
--lr_warmup_iters 100
--lr 1e-5
--min_lr 1e-6
--weight_decay 0.000001
--sequence_parallel
--recompute_granularity selective
--log_timers_to_tensorboard
--rope_scaling_factor 1.0
--finetune
--wandb_logger
```


## Ethical Considerations and Limitations

Testing conducted to date has been in English and has not covered, nor could it cover, all scenarios.
For these reasons, as with all LLMs, the potential outputs of llama2-70b-oasst-sft-v10 cannot be predicted
in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses
to user prompts. Therefore, before deploying any applications of llama2-70b-oasst-sft-v10, developers should
perform safety testing and tuning tailored to their specific applications of the model.