OxxoCodes/Meta-Llama-3-70B-Instruct-GPTQ

Built with Meta Llama 3

Model Description

This is a 4-bit GPTQ-quantized version of meta-llama/Meta-Llama-3-70B-Instruct.

The model was quantized with the following configuration:

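from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,            # 4-bit weights
    group_size=128,    # one set of quantization parameters per 128 weights
    desc_act=False,    # no activation-order reordering
    damp_percent=0.1,  # dampening applied to the Hessian diagonal
)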
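
To give a rough sense of what these settings mean for storage, the sketch below estimates the effective bits per weight and the resulting size of the quantized weights. It is an approximation under stated assumptions, not a measurement of this repository: it assumes each group of weights shares one fp16 scale and one zero-point packed at the same bit width as the weights, and it uses a rounded figure of 70 billion quantized parameters. The function names are hypothetical and are not part of AutoGPTQ.

```python
def gptq_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Approximate storage cost per weight, including per-group metadata."""
    # Assumption: every group of `group_size` weights shares one scale
    # (stored in `scale_bits` bits) and one zero-point (packed at `bits` bits).
    overhead = (scale_bits + bits) / group_size
    return bits + overhead


def estimate_size_gb(num_params: float, bits: int, group_size: int) -> float:
    """Approximate size of the quantized weights in gigabytes (10^9 bytes)."""
    total_bits = num_params * gptq_bits_per_weight(bits, group_size)
    return total_bits / 8 / 1e9


# Values taken from the quantization config above.
bpw = gptq_bits_per_weight(bits=4, group_size=128)

# Assumption: about 70e9 quantized parameters (a rounded figure).
size_gb = estimate_size_gb(70e9, bits=4, group_size=128)

print(f"effective bits per weight: {bpw:.3f}")
print(f"approximate weight size:   {size_gb:.1f} GB")
```

Under these assumptions the per-group metadata adds a little under 0.16 bits per weight, so the quantized weights come to roughly 36 GB. Layers that are left unquantized, such as the embeddings, and runtime buffers such as the KV cache are not included in this estimate.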

To use this model, you need to install AutoGPTQ. For detailed installation instructions, please refer to the AutoGPTQ GitHub repository.

Example Usage

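from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# The tokenizer is loaded from the original (unquantized) model repository.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")
# Load the 4-bit GPTQ weights.
model = AutoGPTQForCausalLM.from_quantized("OxxoCodes/Meta-Llama-3-70B-Instruct-GPTQ")

# Tokenize the prompt, move it to the model's device, and generate a completion.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output = model.generate(**inputs)[0]
print(tokenizer.decode(output))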