---
pipeline_tag: text-generation
inference: true
widget:
- text: 'Hello!'
  example_title: Hello world
  group: Python
library_name: transformers
---

# yujiepan/falcon-40b-awq-w4g128
|
|
|
This model is a 4-bit AWQ quantization of [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b), produced with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) using w_bit=4, group_size=128, zero_point=True.
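For reference, checkpoints with these settings are typically produced with AutoAWQ's `quantize` API. The exact script behind this repo is not included in the card, so the sketch below is an assumption: the output path and the `version` (kernel layout) field are illustrative, not recorded settings.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "tiiuae/falcon-40b"
quant_path = "falcon-40b-awq-w4g128"  # hypothetical local output directory

# AWQ settings matching this card: 4-bit weights, group size 128, zero point on.
# "version" selects the kernel layout; GEMM is an assumption, not recorded here.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the fp16 model, run AWQ calibration + quantization, then save the result
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```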
|
|
|
|
|
## Accuracy
|
|
|
| task | tiiuae/falcon-40b (fp16) | this repo (4-bit AWQ) |
|---------------------------------------------|-------|-------|
| wikitext perplexity (lm-evaluation-harness) | 8.410 | 8.497 |
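The perplexity numbers above could be reproduced along the lines of the sketch below, using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (`pip install lm-eval`). The harness version, batch size, and other settings behind the card's numbers are not recorded, so treat this as an approximation rather than the exact evaluation command:

```python
import lm_eval  # lm-evaluation-harness v0.4+ Python API

# Evaluate this repo's checkpoint on the wikitext perplexity task
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/falcon-40b-awq-w4g128",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])
```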
|
|
|
|
|
|
|
## Usage
|
|
|
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "yujiepan/falcon-40b-awq-w4g128"

# Load the quantized model and its tokenizer
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=False, trust_remote_code=False)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# Tokenize the prompt and move the input ids to the GPU
prompt = "Tell me about AI"
tokens = tokenizer(prompt, return_tensors='pt').input_ids.cuda()

# Generate with sampling
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=10,
)

print("Output: ", tokenizer.decode(generation_output[0]))
```
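Recent transformers releases can also load AWQ checkpoints directly through `AutoModelForCausalLM`, without the AutoAWQ loader. This path is an assumption for this particular repo (it relies on transformers' built-in AWQ integration, roughly transformers>=4.35 with autoawq installed):

```python
# Alternative loading path via transformers' native AWQ integration.
# Assumes transformers>=4.35 and autoawq installed; not verified for this repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yujiepan/falcon-40b-awq-w4g128"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```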