---
tags:
- gptq
- 4bit
- int4
- gptqmodel
- modelcloud
- instruct
- exaone
---

This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel).

- **bits**: 4
- **group_size**: 32
- **desc_act**: true
- **static_groups**: false
- **sym**: false
- **lm_head**: false
- **damp_percent**: 0.0025
- **damp_auto_increment**: 0.0015
- **true_sequential**: true
- **model_name_or_path**: ""
- **model_file_base_name**: "model"
- **quant_method**: "gptq"
- **checkpoint_format**: "gptq"
- **meta**:
  - **quantizer**: "gptqmodel:0.9.11-dev0"

## Example:

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

model_name = "ModelCloud/EXAONE-3.0-7.8B-Instruct-gptq-4bit"

prompt = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": "I am in Shanghai, preparing to visit the natural history museum. Can you tell me the best way to"}
]

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = GPTQModel.from_quantized(model_name, trust_remote_code=True)

input_tensor = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)

print(result)
```
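
To give a rough sense of what the `bits`, `group_size`, and `sym` settings listed above control, here is a minimal sketch of group-wise asymmetric quantization in plain Python. It is not GPTQModel's implementation: it uses simple round-to-nearest, whereas GPTQ also compensates rounding error during quantization, and the helper names (`quantize_group`, `dequantize_group`, `quantize_row`) are hypothetical names chosen for illustration.

```python
import math

BITS = 4         # mirrors "bits" in the config above
GROUP_SIZE = 32  # mirrors "group_size" in the config above


def quantize_group(weights, bits=BITS):
    """Asymmetric (sym=false) round-to-nearest quantization of one group.

    Hypothetical helper for illustration only; not part of GPTQModel.
    """
    qmax = (1 << bits) - 1                  # 15 for 4 bits
    w_min = min(min(weights), 0.0)          # make sure the range covers 0
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / qmax or 1.0   # fall back to 1.0 for all-zero groups
    zero = round(-w_min / scale)            # integer zero point in [0, qmax]
    q = [max(0, min(qmax, round(w / scale) + zero)) for w in weights]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [(qi - zero) * scale for qi in q]


def quantize_row(row, group_size=GROUP_SIZE):
    """Quantize a weight row in independent groups, each with its own scale and zero point."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


# Toy weight row: 64 values, so two groups of 32.
row = [math.sin(i) for i in range(64)]
groups = quantize_row(row)
restored = [w for q, scale, zero in groups
            for w in dequantize_group(q, scale, zero)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"{len(groups)} groups, max abs error {max_err:.4f}")
```

A smaller `group_size` means more scale and zero-point pairs to store, but each pair fits a narrower range of weights, which generally reduces the rounding error per group.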