Suparious committed on
Commit edea00f
1 Parent(s): 3359cb8

Adding usage example to model card

Files changed (1): README.md (+117 -1)
README.md CHANGED
@@ -6,7 +6,19 @@ library_name: transformers
  datasets:
  - NeuralNovel/Neural-Story-v1
  base_model: mistralai/Mistral-7B-Instruct-v0.2
- inference: false
+ tags:
+ - quantized
+ - 4-bit
+ - AWQ
+ - transformers
+ - pytorch
+ - mistral
+ - text-generation
+ - conversational
+ - license:apache-2.0
+ - autotrain_compatible
+ - endpoints_compatible
+ - text-gen
  model-index:
  - name: Mistral-7B-Instruct-v0.2-Neural-Story
    results:
@@ -110,6 +122,110 @@ model-index:
  source:
    url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story
    name: Open LLM Leaderboard
+ model_creator: NeuralNovel
+ model_name: Mistral-7B-Instruct-v0.2-Neural-Story
+ model_type: mistral
+ pipeline_tag: text-generation
+ inference: false
+ prompt_template: '<|im_start|>system
+
+   {system_message}<|im_end|>
+
+   <|im_start|>user
+
+   {prompt}<|im_end|>
+
+   <|im_start|>assistant
+
+   '
+ quantized_by: Suparious
  ---
  # NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story AWQ

+ - Model creator: [NeuralNovel](https://huggingface.co/NeuralNovel)
+ - Original model: [Mistral-7B-Instruct-v0.2-Neural-Story](https://huggingface.co/NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story)
+
+ ![Neural-Story](https://i.ibb.co/JFRYk6g/OIG-27.jpg)
+
+ ## Model Summary
+
+ The **Mistral-7B-Instruct-v0.2-Neural-Story** model, developed by NeuralNovel and funded by Techmind, is a language model fine-tuned from Mistral-7B-Instruct-v0.2.
+
+ It is designed to generate instructive and narrative text, with a specific focus on storytelling.
+ This fine-tune has been tailored to provide detailed, creative responses in a narrative context and is optimised for short-story writing.
+
+ Based on Mistral AI's Mistral-7B-Instruct-v0.2 and released under the Apache-2.0 license, the model is suitable for commercial or non-commercial use.
+
+ [Join NeuralNovel Discord!](https://discord.gg/rJXGjmxqzS)
+
+ ## How to use
+
+ ### Install the necessary packages
+
+ ```bash
+ pip install --upgrade autoawq autoawq-kernels
+ ```
+
+ ### Example Python code
+
+ ```python
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer, TextStreamer
+
+ model_path = "solidrust/Mistral-7B-Instruct-v0.2-Neural-Story-AWQ"
+ system_message = "You are Mistral, incarnated as a powerful AI."
+
+ # Load model
+ model = AutoAWQForCausalLM.from_quantized(model_path,
+                                           fuse_layers=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_path,
+                                           trust_remote_code=True)
+ streamer = TextStreamer(tokenizer,
+                         skip_prompt=True,
+                         skip_special_tokens=True)
+
+ # Convert prompt to tokens
+ prompt_template = """\
+ <|im_start|>system
+ {system_message}<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant"""
+
+ prompt = "You're standing on the surface of the Earth. "\
+          "You walk one mile south, one mile west and one mile north. "\
+          "You end up exactly where you started. Where are you?"
+
+ tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
+                    return_tensors='pt').input_ids.cuda()
+
+ # Generate output
+ generation_output = model.generate(tokens,
+                                    streamer=streamer,
+                                    max_new_tokens=512)
+ ```
+
+ ### About AWQ
+
+ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings.
+
+ AWQ models are currently supported on Linux and Windows with NVIDIA GPUs only; macOS users should use GGUF models instead.
+
+ It is supported by:
+
+ - [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
+ - [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later, with support for all model types
+ - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
+ - [Transformers](https://huggingface.co/docs/transformers) - version 4.35.0 and later, from any code or client that supports Transformers (see the sketch below)
+ - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
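+
+ As a quick illustration of the plain-Transformers route, here is a minimal sketch (not part of the original card), assuming `transformers>=4.35.0` and `autoawq` are installed, a CUDA GPU is available, and the AWQ repo's `config.json` carries a quantization config; the prompt is illustrative only:
+
+ ```python
+ # Hedged sketch: load the AWQ checkpoint directly with Transformers.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "solidrust/Mistral-7B-Instruct-v0.2-Neural-Story-AWQ"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # Assumes the repo's config.json tells Transformers to use the AWQ backend.
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ # Illustrative prompt, not from the original card.
+ inputs = tokenizer("Tell me a short story about a lighthouse keeper.",
+                    return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```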
+
+ ## Prompt template: ChatML
+
+ ```plaintext
+ <|im_start|>system
+ {system_message}<|im_end|>
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
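+
+ As a hedged illustration (not part of the original card), the same ChatML prompt can be produced with `tokenizer.apply_chat_template`, assuming the tokenizer shipped in the AWQ repo defines a ChatML `chat_template`; if it does not, fall back to the manual template above:
+
+ ```python
+ # Hedged sketch: build the ChatML prompt via the tokenizer's chat template.
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     "solidrust/Mistral-7B-Instruct-v0.2-Neural-Story-AWQ")
+
+ messages = [
+     {"role": "system", "content": "You are Mistral, incarnated as a powerful AI."},
+     {"role": "user", "content": "Write a two-sentence story about a storm at sea."},
+ ]
+
+ # With a ChatML template, add_generation_prompt=True appends the final
+ # <|im_start|>assistant turn so the model continues as the assistant.
+ prompt = tokenizer.apply_chat_template(messages,
+                                        tokenize=False,
+                                        add_generation_prompt=True)
+ print(prompt)
+ ```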