elinas committed on
Commit 29d9840
1 Parent(s): 4f6fdcd

Update README.md

Files changed (1): README.md (+5, -2)
README.md CHANGED
@@ -6,9 +6,11 @@ tags:
 ---
 
 # vicuna-13b-4bit
-Converted `vicuna-13b` to GPTQ 4bit using `true-sequential` and `groupsize 128` in `safetensors` for the best possible model performance.
+Converted `vicuna-13b` to GPTQ 4bit using `true-sequential` and `groupsize 128` in `safetensors` for the best possible model performance.
 
-https://github.com/qwopqwop200/GPTQ-for-LLaMa
+Vicuna is a highly coherent model based on LLaMA that is comparable to ChatGPT. Read more here: https://vicuna.lmsys.org/
+
+GPTQ - https://github.com/qwopqwop200/GPTQ-for-LLaMa
 
 # Update 2023-04-03
 Recent GPTQ commits have introduced breaking changes to model loading, so you should use commit `a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773` in the `cuda` branch.
@@ -27,6 +29,7 @@ This creates and switches to a `cuda-stable` branch to continue using the quantized models.
 Since this is instruction-tuned, for best results use the following format for inference (note that the instruction format is different from Alpaca's):
 ```
 ### Human: your-prompt
+### Assistant:
 ```
 
 If you want deterministic results, turn off sampling. You can turn it off in the webui by unchecking `do_sample`.
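For context on the conversion the first hunk describes, a GPTQ-for-LLaMa invocation along these lines produces a 4-bit, group-size-128 `safetensors` file. This is a minimal sketch, not the command from this commit: the calibration dataset (`c4`), the model path, and the output filename are assumptions, and the script's flags vary between revisions of the repo.

```
# Sketch of the 4-bit quantization described above (GPTQ-for-LLaMa).
# The c4 calibration set, model path, and output name are illustrative assumptions.
python llama.py ./vicuna-13b c4 \
  --wbits 4 \
  --true-sequential \
  --groupsize 128 \
  --save_safetensors vicuna-13b-4bit-128g.safetensors
```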
 
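The 2023-04-03 update pins commit `a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773`, and the second hunk's context line mentions creating a `cuda-stable` branch. A minimal sketch of that checkout, assuming a fresh clone, is:

```
# Clone the cuda branch of GPTQ-for-LLaMa and pin it to the known-good commit,
# creating and switching to a local `cuda-stable` branch as the README describes.
git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git checkout a6f363e3f93b9fb5c26064b5ac7ed58d22e3f773 -b cuda-stable
```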
 
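Put together, the prompt format added in the second hunk means a full request to the model looks like the following, where the question is an illustrative placeholder; the model generates its reply after `### Assistant:`:

```
### Human: Summarize what GPTQ 4-bit quantization does.
### Assistant:
```

With `do_sample` unchecked in the webui, repeated runs of the same prompt should return the same completion.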