casperhansen/vicuna-7b-v1.5-awq

casperhansen committed on Sep 4, 2023

Commit 2870996 • 1 Parent(s): bccdca9

Update README.md

Files changed (1)

README.md +40 -0

@@ -1,3 +1,43 @@
 ---
 license: llama2
 ---
+To use this model, you must have [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) installed.
+```
+pip install autoawq
+```
+Example generation with streaming:
+```python
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer, TextStreamer
+quant_path = "casperhansen/vicuna-7b-v1.5-awq"
+quant_file = "awq_model_w4_g128.pt"
+# Load model
+model = AutoAWQForCausalLM.from_quantized(quant_path, quant_file, fuse_layers=True)
+tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
+streamer = TextStreamer(tokenizer, skip_special_tokens=True)
+# Convert prompt to tokens
+prompt_template = """\
+A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
+USER: {prompt}
+ASSISTANT:"""
+tokens = tokenizer(
+    prompt_template.format(prompt="How are you today?"),
+    return_tensors='pt'
+).input_ids.cuda()
+# Generate output
+generation_output = model.generate(
+    tokens,
+    streamer=streamer,
+    max_new_tokens=512
+)
+```
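
The prompt formatting in the README example can be checked on its own, without a GPU or the model weights. The following is a minimal sketch, assuming the single-turn template shown in the diff; `build_prompt`, `SYSTEM`, and `PROMPT_TEMPLATE` are hypothetical names introduced here for illustration and are not part of AutoAWQ or transformers.

```python
# Hypothetical helper (not part of AutoAWQ or transformers): isolates the
# prompt formatting from the README example so it can be tested separately.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

# Same three-line layout as the README's prompt_template.
PROMPT_TEMPLATE = "{system}\nUSER: {prompt}\nASSISTANT:"


def build_prompt(user_message: str, system: str = SYSTEM) -> str:
    """Return a single-turn prompt in the layout used by the README example."""
    return PROMPT_TEMPLATE.format(system=system, prompt=user_message.strip())


if __name__ == "__main__":
    print(build_prompt("How are you today?"))
```

The returned string can be passed to `tokenizer(..., return_tensors='pt')` in place of `prompt_template.format(...)` in the README snippet.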