Smokeweaver committed
Commit 6767891
1 Parent(s): 6aa30cf

Update model card

Files changed (1)
  1. README.md +95 -1
README.md CHANGED
@@ -127,7 +127,7 @@ model-index:
  name: Open LLM Leaderboard
  library_name: transformers
  model_creator: mlabonne
- model_name: Darewin-7B
+ model_name: NeuralHermes-2.5-Mistral-7B-laser
  model_type: mistral
  pipeline_tag: text-generation
  inference: false
@@ -145,3 +145,97 @@ prompt_template: '<|im_start|>system
  quantized_by: Suparious
  ---
  # mlabonne/NeuralHermes-2.5-Mistral-7B-laser AWQ
+
+ - Model creator: [mlabonne](https://huggingface.co/mlabonne)
+ - Original model: [NeuralHermes-2.5-Mistral-7B-laser](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-laser)
+
+ <center><img src="https://i.imgur.com/gUlEJuU.jpeg"></center>
+
+ ## Model Summary
+
+ This is an experimental LASER version of NeuralHermes, produced with [laserRMT](https://github.com/cognitivecomputations/laserRMT) and based on [this paper](https://arxiv.org/pdf/2312.13558.pdf).
+
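+ At its core, LASER replaces selected weight matrices with truncated-SVD approximations of themselves. The sketch below is a minimal PyTorch illustration of that operation, not the actual laserRMT implementation; the shapes and the retained-rank fraction are made-up values for demonstration.
+
+ ```python
+ import torch
+
+ def rank_reduce(weight: torch.Tensor, keep_fraction: float) -> torch.Tensor:
+     """Approximate `weight` keeping only the top singular values."""
+     U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
+     k = max(1, int(keep_fraction * S.numel()))  # number of singular values kept
+     return (U[:, :k] * S[:k]) @ Vh[:k, :]
+
+ # Hypothetical usage with small stand-in shapes: keep 10% of the rank.
+ W = torch.randn(256, 1024)
+ W_low_rank = rank_reduce(W, keep_fraction=0.10)
+ print(torch.linalg.matrix_rank(W_low_rank))  # ~25 instead of 256
+ ```
+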
+ | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+ |------------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+ |[NeuralHermes-2.5-Mistral-7B-laser](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-laser)| 43.54| 73.44| 55.26| 42.24| 53.62|
+ |[NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | 43.67| 73.24| 55.37| 41.76| 53.51|
+
+ Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.
+
+ NeuralHermes is a [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on several benchmarks (see the results above).
+
+ It is directly inspired by the RLHF process the authors of [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) used to improve performance. I used the same dataset, reformatted to apply the ChatML template.
+
+ The code used to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). Training required an A100 GPU for about an hour.
+
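+ For orientation, the DPO stage looks roughly like the following when written with the [TRL](https://github.com/huggingface/trl) library. This is a condensed sketch rather than the exact notebook code: the hyperparameters are illustrative, and the `DPOTrainer` signature varies between TRL versions.
+
+ ```python
+ from datasets import load_dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
+ from trl import DPOTrainer
+
+ base = "teknium/OpenHermes-2.5-Mistral-7B"
+ model = AutoModelForCausalLM.from_pretrained(base)
+ tokenizer = AutoTokenizer.from_pretrained(base)
+
+ # ChatML-formatted preference pairs (prompt / chosen / rejected)
+ dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")
+
+ trainer = DPOTrainer(
+     model,
+     ref_model=None,  # TRL keeps a frozen copy of the model as the reference
+     args=TrainingArguments(output_dir="dpo-out",
+                            per_device_train_batch_size=1,
+                            learning_rate=5e-5,  # illustrative value
+                            max_steps=200),      # illustrative value
+     beta=0.1,  # strength of the KL penalty toward the reference model
+     train_dataset=dataset,
+     tokenizer=tokenizer,
+ )
+ trainer.train()
+ ```
+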
+ ## How to use
+
+ ### Install the necessary packages
+
+ ```bash
+ pip install --upgrade autoawq autoawq-kernels
+ ```
+
+ ### Example Python code
+
+ ```python
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer, TextStreamer
+
+ model_path = "solidrust/NeuralHermes-2.5-Mistral-7B-laser-AWQ"
+ system_message = "You are Hermes, incarnated as a powerful AI."
+
+ # Load model
+ model = AutoAWQForCausalLM.from_quantized(model_path,
+                                           fuse_layers=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_path,
+                                           trust_remote_code=True)
+ streamer = TextStreamer(tokenizer,
+                         skip_prompt=True,
+                         skip_special_tokens=True)
+
+ # Convert prompt to tokens
+ prompt_template = """\
+ <|im_start|>system
+ {system_message}<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant"""
+
+ prompt = "You're standing on the surface of the Earth. "\
+          "You walk one mile south, one mile west and one mile north. "\
+          "You end up exactly where you started. Where are you?"
+
+ tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
+                    return_tensors='pt').input_ids.cuda()
+
+ # Generate output
+ generation_output = model.generate(tokens,
+                                    streamer=streamer,
+                                    max_new_tokens=512)
+ ```
+
+ ### About AWQ
+
+ AWQ is an efficient, accurate and fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.
+
+ AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users should use GGUF models instead.
+
+ It is supported by:
+
+ - [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
+ - [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later, with support for all model types
+ - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
+ - [Transformers](https://huggingface.co/docs/transformers) - version 4.35.0 or later, from any code or client that supports Transformers (a minimal loading sketch follows this list)
+ - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
+
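+ As a point of comparison with the AutoAWQ snippet above, here is a minimal sketch of loading this checkpoint through plain Transformers, assuming `transformers>=4.35.0` and `autoawq` are installed; Transformers picks up the AWQ quantization config stored in the repository.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_path = "solidrust/NeuralHermes-2.5-Mistral-7B-laser-AWQ"
+
+ # The AWQ config in the checkpoint is detected automatically
+ model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ inputs = tokenizer("Hello, Hermes.", return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+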
+ ## Prompt template: ChatML
+
+ ```plaintext
+ <|im_start|>system
+ {system_message}<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
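+
+ If the tokenizer in this repository ships a chat template, the same prompt string can be produced without writing the markers by hand; a small sketch, assuming the stored template is the ChatML format shown above:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("solidrust/NeuralHermes-2.5-Mistral-7B-laser-AWQ")
+
+ messages = [
+     {"role": "system", "content": "You are Hermes, incarnated as a powerful AI."},
+     {"role": "user", "content": "Why is the sky blue?"},
+ ]
+
+ # Renders the ChatML prompt, ending with the assistant header
+ prompt = tokenizer.apply_chat_template(messages,
+                                        tokenize=False,
+                                        add_generation_prompt=True)
+ print(prompt)
+ ```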