```
GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, sym=False, desc_act=True, group_size=32)
```

Run inference with the quantized model:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the GPTQ-quantized checkpoint and place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained("gptq-llama-3.2-1B-Instruct", device_map="auto")

prompt = "write me a 100-word essay on the topic of the history of the United States"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Cap the generated continuation at 512 new tokens (rather than truncating the
# prompt with max_length in the tokenizer call).
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=512)[0]))
```
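The inference snippet above assumes a `gptq-llama-3.2-1B-Instruct` checkpoint already exists on disk. For context, here is a minimal sketch (not from the original) of how such a checkpoint could be produced with the `GPTQConfig` shown at the top of this section; the output directory name simply matches the one loaded above:

```
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Config from above: 4-bit asymmetric quantization (sym=False) with activation
# reordering (desc_act=True) and a group size of 32, calibrated on the c4 dataset.
gptq_config = GPTQConfig(
    bits=4, dataset="c4", tokenizer=tokenizer, sym=False, desc_act=True, group_size=32
)

# Passing quantization_config triggers calibration and quantization during
# loading, so a GPU is recommended for this step.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)

# Save the quantized weights so the inference snippet can load them from disk.
model.save_pretrained("gptq-llama-3.2-1B-Instruct")
tokenizer.save_pretrained("gptq-llama-3.2-1B-Instruct")
```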