TheBloke committed
Commit
590a500
1 Parent(s): 13b7979

Update README.md

Files changed (1):
  1. README.md +5 -3
README.md CHANGED
@@ -85,10 +85,11 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
  quantize_config=None)

  prompt = "Tell me about AI"
- prompt_template=f'''A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
-
- USER: {prompt}
- ASSISTANT:'''
+ prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.
+
+ ### Instruction: {prompt}
+
+ ### Response:'''

  print("\n\n*** Generate:")

@@ -128,6 +129,7 @@ It was created without group_size to lower VRAM requirements, and with --act-ord
  * `chronoboros-33b-GPTQ-4bit--1g.act.order.safetensors`
  * Works with [ExLlama](https://github.com/turboderp/exllama), providing the best performance and lowest VRAM usage. Recommended.
  * Works with AutoGPTQ in CUDA or Triton modes.
+ * Works with [Occ4m's GPTQ-for-LLaMa fork](https://github.com/0cc4m/GPTQ-for-LLaMa).
  * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
  * Works with text-generation-webui, including one-click-installers.
  * Parameters: Groupsize = -1. Act Order / desc_act = True.
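
For context, the new prompt-template hunk slots into the README's AutoGPTQ example roughly as follows. This is a minimal sketch, not part of the commit: the repo id `TheBloke/Chronoboros-33B-GPTQ` and the generation settings (`temperature`, `max_new_tokens`) are assumptions based on the surrounding README excerpt.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Assumed repo id; the commit itself only shows the README diff.
model_name_or_path = "TheBloke/Chronoboros-33B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           device="cuda:0",
                                           quantize_config=None)

prompt = "Tell me about AI"
# Alpaca-style template introduced by this commit (it replaces the
# earlier Vicuna-style USER/ASSISTANT template):
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: {prompt}

### Response:'''

print("\n\n*** Generate:")

# Sampling settings below are illustrative, not prescribed by the commit.
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, do_sample=True,
                        temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```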