---
library_name: transformers
tags: []
---

# yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym

This model applies AutoGPTQ quantization to [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).

- 8-bit asymmetric weight-only quantization
- group_size=-1
- calibration set: c4-new

## Accuracy

| model | precision | wikitext ppl (↓) |
|-|-|-|
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 10.842 |
| yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym | int8-asym | 10.854 |

Note:
- Evaluated with the lm-evaluation-harness "wikitext" task.
- Wikitext perplexity does not guarantee downstream task accuracy, but it helps gauge the distortion introduced by quantization.

## Usage

Since the checkpoint was produced with the transformers GPTQ integration, it can be loaded directly with `AutoModelForCausalLM` (with `auto-gptq` and `optimum` installed):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym",
    device_map="auto",
)
```

## Code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit asymmetric, per-channel (group_size=-1) GPTQ with the c4-new calibration set
quantization_config = GPTQConfig(
    bits=8,
    group_size=-1,
    dataset="c4-new",
    sym=False,
    tokenizer=tokenizer,
    use_cuda_fp16=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)
```
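After quantization, the model can be saved and sanity-checked with a short generation. The sketch below continues from the snippet above (reusing `model`, `tokenizer`, and `AutoModelForCausalLM`); the output directory name and the prompt are illustrative, not part of the original workflow.

```python
# Save the quantized model and tokenizer (directory name is illustrative).
save_dir = "Meta-Llama-3-8B-Instruct-gptq-w8asym"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Reload the quantized checkpoint and run a quick generation as a sanity check.
model = AutoModelForCausalLM.from_pretrained(save_dir, device_map="auto")
messages = [{"role": "user", "content": "Briefly explain weight quantization."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```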