---
pipeline_tag: text-generation
inference: true
widget:
- text: 'Hello!'
  example_title: Hello world
  group: Python
library_name: transformers
---

# yujiepan/falcon-40b-awq-w4g128

This repository contains [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) quantized with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ): 4-bit weights, `group_size=128`, `zero_point=True`.
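
For reference, a checkpoint with these settings is typically produced with the standard AutoAWQ recipe sketched below. The `version="GEMM"` kernel choice and the default calibration data are assumptions; they are not documented in this repo.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "tiiuae/falcon-40b"
quant_path = "falcon-40b-awq-w4g128"

# 4-bit weights, per-group quantization with group size 128, asymmetric (zero point)
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Calibrate and quantize (uses AutoAWQ's default calibration set unless overridden)
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```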


## Accuracy

| Task                                        | tiiuae/falcon-40b (FP16) | This repo |
|---------------------------------------------|--------------------------|-----------|
| WikiText perplexity (lm-evaluation-harness) | 8.410                    | 8.497     |
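
The harness version and exact flags behind these numbers are not recorded in this card. Assuming a v0.4-style lm-evaluation-harness, the measurement can be reproduced roughly as follows (the metric key below follows the v0.4 naming and is an assumption):

```python
import lm_eval

# Evaluate WikiText perplexity; model_args uses the HF loader argument format
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/falcon-40b-awq-w4g128",
    tasks=["wikitext"],
)
# Metric key names vary across harness versions; inspect the dict if this differs
print(results["results"]["wikitext"]["word_perplexity,none"])
```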



## Usage

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "yujiepan/falcon-40b-awq-w4g128"

# Load the quantized model and its tokenizer; fuse_layers=False loads the
# standard (unfused) modules, while True enables fused kernels for faster
# inference on supported architectures
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=False, trust_remote_code=False)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Tell me about AI"

# Tokenize the prompt and move the input ids to the GPU
tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

# Generate
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=10,
)

print("Output: ", tokenizer.decode(generation_output[0]))
```
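
Alternatively, with `autoawq` installed and a recent `transformers` (v4.35+ added AWQ support), the checkpoint should also load through the standard `transformers` API, assuming the repo's `config.json` carries the AWQ `quantization_config` (a sketch, not verified here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yujiepan/falcon-40b-awq-w4g128"

# transformers detects the AWQ quantization_config and uses the AWQ kernels
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Tell me about AI", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```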