casperhansen committed
Commit 2870996
1 Parent(s): bccdca9

Update README.md

Files changed (1)
  1. README.md +40 -0
README.md CHANGED

---
license: llama2
---

To use this model, you must have [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) installed.

```
pip install autoawq
```
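
The generation example below moves the input tokens to a CUDA device, so a GPU is required. As a quick sanity check before loading the model, you can confirm that a device is visible; this is a minimal sketch using PyTorch, which is installed as a dependency:

```python
import torch

# The generation example calls .cuda() on the input ids, so a CUDA device must be available.
assert torch.cuda.is_available(), "No CUDA device found"
print(torch.cuda.get_device_name(0))
```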

Example generation with streaming:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "casperhansen/vicuna-7b-v1.5-awq"
quant_file = "awq_model_w4_g128.pt"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, quant_file, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Convert prompt to tokens
prompt_template = """\
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: {prompt}
ASSISTANT:"""

tokens = tokenizer(
    prompt_template.format(prompt="How are you today?"),
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=512
)
```
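
The streamer prints tokens to stdout as they are generated; if you also want the completion as a Python string afterwards, the returned token IDs can be decoded with the tokenizer. A minimal sketch, assuming `generate` returns a batch of token-ID sequences as in standard `transformers`:

```python
# Decode the first (and only) sequence in the batch, dropping special tokens.
output_text = tokenizer.decode(generation_output[0], skip_special_tokens=True)
print(output_text)
```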