TheBloke committed on
Commit 2676a52
1 Parent(s): 5b8b77a

Update README.md

Files changed (1)
  1. README.md +13 -3
README.md CHANGED
@@ -29,6 +29,15 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
  * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/vicuna-7B-v1.3-GGML)
  * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/lmsys/vicuna-7b-v1.3)

+ ## Prompt template
+
+ ```
+ A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+
+ USER: prompt
+ ASSISTANT:
+ ```
+
  ## How to easily download and use this model in text-generation-webui

  Please make sure you're using the latest version of text-generation-webui
 
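For reference, the template added above can be assembled programmatically before being sent to the model. A minimal Python sketch, assuming you want the system line included; the `system` variable and `build_prompt` helper are illustrative names, not part of the README:

```python
# Illustrative helper (not from the README): wrap a user message in the
# Vicuna v1.3 prompt template shown in the diff above.
system = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(user_message: str) -> str:
    # The model's reply is generated as a continuation of the trailing "ASSISTANT:".
    return f"{system}\n\nUSER: {user_message}\nASSISTANT:"

print(build_prompt("Tell me about AI"))
```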
@@ -74,8 +83,8 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,

  # Note: check the prompt template is correct for this model.
  prompt = "Tell me about AI"
- prompt_template=f'''### Human: {prompt}
- ### Assistant:'''
+ prompt_template=f'''USER: {prompt}
+ ASSISTANT:'''

  print("\n\n*** Generate:")

 
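The prompt string built in this hunk feeds the generation step of the README's AutoGPTQ example. A minimal sketch of that step, assuming `tokenizer` (a transformers `AutoTokenizer`) and `model` (the `AutoGPTQForCausalLM.from_quantized` result) already exist as in the surrounding example; the sampling settings are placeholders:

```python
# Sketch of the generation step, assuming `tokenizer`, `model`, and
# `prompt_template` were created as in the README's Python example.
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()

# Placeholder sampling settings; adjust as needed.
output = model.generate(
    inputs=input_ids,
    do_sample=True,
    temperature=0.7,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0]))
```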
@@ -106,12 +115,13 @@ print(pipe(prompt_template)[0]['generated_text'])

  **vicuna-7b-v1.3-GPTQ-4bit-128g.no-act.order.safetensors**

- This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
+ This will work with AutoGPTQ, ExLlama, and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

  It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.

  * `vicuna-7b-v1.3-GPTQ-4bit-128g.no-act.order.safetensors`
  * Works with AutoGPTQ in CUDA or Triton modes.
+ * Works with ExLlama.
  * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
  * Works with text-generation-webui, including one-click-installers.
  * Parameters: Groupsize = 128. Act Order / desc_act = False.
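As a rough sketch of loading this particular file with AutoGPTQ: the repo id below is an assumption rather than something stated in this diff, and `model_basename` simply mirrors the filename above with the `.safetensors` extension dropped:

```python
# Sketch: load the 4-bit, group_size 128, no-act-order file with AutoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/vicuna-7B-v1.3-GPTQ"  # assumed repo id
model_basename = "vicuna-7b-v1.3-GPTQ-4bit-128g.no-act.order"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    device="cuda:0",
    use_triton=False,  # CUDA kernels; set True to try Triton mode
    quantize_config=None,
)
```

ExLlama and GPTQ-for-LLaMa read the same `.safetensors` file through their own loaders, so no separate conversion should be needed; the AutoGPTQ call above is just one route.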