
yujiepan/falcon-40b-awq-w4g128

This model is a 4-bit AWQ quantization of tiiuae/falcon-40b, produced with AutoAWQ using group_size=128 and zero_point=True.
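To illustrate what the 4-bit, group_size=128, zero_point=True settings mean for the stored weights, here is a minimal pure-Python sketch of group-wise asymmetric quantization. It is not AutoAWQ's implementation: AWQ additionally rescales channels based on activation statistics before quantizing, and that step is omitted. The function names (`quantize_group`, `dequantize_group`, `quantize_row`) and the random test weights are hypothetical, chosen only for illustration.

```python
import random


def quantize_group(weights, n_bits=4):
    """Asymmetric (zero-point) quantization of one group of weights.

    Illustrative only; not AutoAWQ's actual kernel or packing format.
    """
    q_max = 2 ** n_bits - 1  # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / q_max
    if scale == 0.0:
        scale = 1.0  # constant group: any non-zero scale works
    # The zero point is the integer code that represents the real value 0.0.
    zero_point = min(max(round(-w_min / scale), 0), q_max)
    codes = [min(max(round(w / scale) + zero_point, 0), q_max) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]


def quantize_row(row, group_size=128, n_bits=4):
    """Quantize a row in consecutive groups, each with its own scale and zero point."""
    return [
        quantize_group(row[start:start + group_size], n_bits)
        for start in range(0, len(row), group_size)
    ]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]  # stand-in for one weight row
groups = quantize_row(row)
restored = [
    w
    for codes, scale, zero_point in groups
    for w in dequantize_group(codes, scale, zero_point)
]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max abs error: {max_error:.6f}")
```

Each group of 128 weights shares one scale and one zero point, so a smaller group size tracks the local weight range more closely at the cost of storing more scales and zero points.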

Accuracy

| task | tiiuae/falcon-40b (fp16) | this repo |
|---|---|---|
| wikitext ppl by lm_harness | 8.410 | 8.497 |
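To put the two perplexity figures in proportion, the short calculation below computes the absolute and relative increase of the quantized model over the fp16 baseline. It uses only the two numbers reported above; the variable names are illustrative.

```python
# Perplexity values reported in the accuracy table above.
fp16_ppl = 8.410
awq_ppl = 8.497

absolute_increase = awq_ppl - fp16_ppl
relative_increase_pct = 100 * absolute_increase / fp16_ppl

print(f"{absolute_increase:.3f} ppl, {relative_increase_pct:.2f}%")
# prints "0.087 ppl, 1.03%"
```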

Usage

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "yujiepan/falcon-40b-awq-w4g128"

# Load model
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=False, trust_remote_code=False)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Tell me about AI"
tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

# Generate
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=10,
)

print("Output: ", tokenizer.decode(generation_output[0]))