Update README.md
## How to easily download and use this model in text-generation-webui

Please make sure you're using the latest version of text-generation-webui.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `TheBloke/Nous-Hermes-13B-GPTQ`.
3. Click **Download** (a scripted alternative to the built-in downloader is sketched after these steps).
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. In the **Model** dropdown, choose the model you just downloaded: `Nous-Hermes-13B-GPTQ`.
7. The model will automatically load, and is now ready for use!
8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
   * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
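
If you prefer to fetch the files outside the UI, a minimal sketch using `huggingface_hub` is shown below. This is not part of the original instructions, and the target directory is an assumption about where your text-generation-webui `models` folder lives.

```python
# Hypothetical scripted alternative to the webui downloader.
# The local_dir path is an assumption: adjust it to your own
# text-generation-webui models folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Nous-Hermes-13B-GPTQ",
    local_dir="text-generation-webui/models/TheBloke_Nous-Hermes-13B-GPTQ",
)
```

Once the download finishes, the model should appear in the **Model** dropdown after you click the refresh icon, just as in step 5 above.
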
## How to use this GPTQ model from Python code

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

`pip install auto-gptq`

Then try the following example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# quantize_config=None: the GPTQ parameters are read from the repo's quantize_config.json
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
prompt_template = f'''### Human: {prompt}
### Assistant:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
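
The example above is single-turn. Continuing from it, here is a hedged sketch of how a multi-turn exchange could be assembled by appending alternating `### Human:` / `### Assistant:` blocks; the multi-turn convention is an assumption, not something this card specifies.

```python
# Hypothetical multi-turn usage, reusing `pipe` and the prompt format from the
# example above. Appending alternating "### Human:" / "### Assistant:" turns is
# an assumption about the intended conversation format.
history = ""

def chat(user_message):
    global history
    prompt = f"{history}### Human: {user_message}\n### Assistant:"
    # The pipeline returns the prompt plus the completion, so strip the prompt.
    reply = pipe(prompt)[0]['generated_text'][len(prompt):].strip()
    history = f"{prompt} {reply}\n"
    return reply

print(chat("Tell me about AI"))
print(chat("Summarise that in one sentence."))
```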

## Provided files