# yujiepan/falcon-40b-awq-w4g128

This model applies AutoAWQ quantization to tiiuae/falcon-40b: 4-bit weights, `group_size=128`, `zero_point=True`.
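A checkpoint like this is typically produced with a script along the following lines. This is a minimal sketch: the calibration data and the `version` kernel choice are AutoAWQ defaults assumed here, not settings stated by this repo.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "tiiuae/falcon-40b"
quant_path = "falcon-40b-awq-w4g128"

# Settings matching this repo's name: 4-bit weights, group size 128, zero point.
# "version" selects the packed kernel format; GEMM is the AutoAWQ default (assumed).
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

# Load the fp16 model, run AWQ calibration, and save the quantized weights.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```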

## Accuracy

| task | tiiuae/falcon-40b (fp16) | this repo |
| --- | --- | --- |
| wikitext perplexity (lm-evaluation-harness) | 8.410 | 8.497 |
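The quantized number should be reproducible with something like the snippet below. It is a sketch assuming a recent lm-evaluation-harness (`lm_eval`) release; the exact harness version used for the table is not recorded here.

```python
import lm_eval

# Score wikitext perplexity via the Hugging Face model loader.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/falcon-40b-awq-w4g128",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])
```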

## Usage

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "yujiepan/falcon-40b-awq-w4g128"

# Load model
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=False, trust_remote_code=False)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Tell me about AI"
tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

# Generate
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=10,
)

print("Output: ", tokenizer.decode(generation_output[0]))
```