Update README.md
README.md CHANGED
```diff
@@ -85,10 +85,11 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         quantize_config=None)
 
 prompt = "Tell me about AI"
-prompt_template=f'''
+prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.
 
-
-
+### Instruction: {prompt}
+
+### Response:'''
 
 print("\n\n*** Generate:")
 
@@ -128,6 +129,7 @@ It was created without group_size to lower VRAM requirements, and with --act-ord
 * `chronoboros-33b-GPTQ-4bit--1g.act.order.safetensors`
 * Works with [ExLlama](https://github.com/turboderp/exllama), providing the best performance and lowest VRAM usage. Recommended.
 * Works with AutoGPTQ in CUDA or Triton modes.
+* Works with [Occ4m's GPTQ-for-LLaMa fork](https://github.com/0cc4m/GPTQ-for-LLaMa).
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
 * Parameters: Groupsize = -1. Act Order / desc_act = True.
```
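The template added in this commit is the standard Alpaca instruction format. As a minimal sketch of what the change produces (the `print` call is illustrative; in the README the string is tokenized and passed to the loaded AutoGPTQ model), the formatted prompt can be built standalone like this:

```python
# Sketch: build the Alpaca-style prompt added in this commit.
# The template text comes from the diff above; printing it here is
# only for illustration and is not part of the README's example.
prompt = "Tell me about AI"

prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: {prompt}

### Response:'''

print(prompt_template)
```

The model's reply is expected to continue directly after the trailing `### Response:` marker, which is why the template ends there with no newline.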