---
base_model: facebook/opt-13b
language:
- en
license: other
model_name: opt-13b
pipeline_tag: text-generation
inference: false
model_creator: facebook
model_type: opt
quantized_by: iproskurina
tags:
- gptq
- 4-bit
base_model_relation: quantized
---

# OPT-13B - GPTQ

The model published in this repo was quantized to 4-bit using AutoGPTQ.
## Quantization details

- All quantization parameters were taken from the [GPTQ paper](https://arxiv.org/abs/2210.17323).
- Calibration data consisted of 128 random 2048-token segments from the [C4 dataset](https://huggingface.co/datasets/allenai/c4).
- The group size used for quantization is 128.
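For reference, here is a minimal sketch of how these settings map onto AutoGPTQ's `BaseQuantizeConfig`; the `desc_act` value is an assumption, since the card does not specify it:

```python
from auto_gptq import BaseQuantizeConfig

# The quantization settings described above, expressed as an AutoGPTQ config.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantization
    group_size=128,  # group size of 128
    desc_act=False,  # assumption: act-order is not stated in this card
)
```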
## How to use this GPTQ model from Python code

### Install the necessary packages

```bash
pip install accelerate==0.26.1 datasets==2.16.1 dill==0.3.7 gekko==1.0.6 multiprocess==0.70.15 peft==0.7.1 rouge==1.0.1 sentencepiece==0.1.99
git clone https://github.com/upunaprosk/AutoGPTQ
cd AutoGPTQ
pip install -v .
```
Recommended `transformers` version: 4.35.2.
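As an optional sanity check that the pinned versions were installed (assuming `auto_gptq` exposes `__version__`, which its releases do):

```python
# Optional: confirm the installed versions match the recommendations above.
import transformers
import auto_gptq

print(transformers.__version__)  # expected: 4.35.2
print(auto_gptq.__version__)
```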
You can then use the following code:

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM

pretrained_model_dir = "iproskurina/opt-13b-gptq-4bit"

# Load the tokenizer and the 4-bit quantized model onto the first GPU.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(pretrained_model_dir, device="cuda:0", model_basename="model")

# Wrap the model in a standard text-generation pipeline and generate.
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])
```
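Generation parameters can be passed directly through the pipeline call; the settings below are illustrative defaults, not values taken from this repo:

```python
# Illustrative generation settings; adjust to your use case.
output = pipeline(
    "auto-gptq is",
    max_new_tokens=64,  # cap the length of the continuation
    do_sample=True,     # sample rather than greedy decode
    temperature=0.7,
)
print(output[0]["generated_text"])
```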