Llama 2 7B quantized in 2-bit with GPTQ.

from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer
import torch

w = 2  # target bit width
model_path = "meta-llama/Llama-2-7b-hf"

# Load the FP16 base model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

# Calibrate on C4 with a sequence length of 4096, then quantize
quantizer = GPTQQuantizer(bits=w, dataset="c4", model_seqlen=4096)
quantized_model = quantizer.quantize_model(model, tokenizer)
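
To give a feel for what a 2-bit weight grid looks like, here is a minimal pure-Python sketch. It uses plain round-to-nearest quantization of one group of weights, which is not GPTQ's error-compensating procedure, and it packs sixteen 2-bit codes into each 32-bit word. The function names and the low-bits-first packing order are illustrative assumptions, not the layout used by any particular library.

```python
def quantize_group(weights, bits=2):
    """Round-to-nearest asymmetric quantization of one group (NOT GPTQ itself)."""
    levels = (1 << bits) - 1                      # 3 for 2-bit, so codes are 0..3
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0       # fall back to 1.0 for a constant group
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, w_min):
    """Map integer codes back to approximate float weights."""
    return [c * scale + w_min for c in codes]


def pack_codes(codes, bits=2):
    """Pack codes into 32-bit words, lowest bits first (assumed order)."""
    per_word = 32 // bits
    words = []
    for i in range(0, len(codes), per_word):
        word = 0
        for j, c in enumerate(codes[i:i + per_word]):
            word |= c << (bits * j)
        words.append(word)
    return words


def unpack_codes(words, count, bits=2):
    """Inverse of pack_codes; `count` trims padding in the last word."""
    mask = (1 << bits) - 1
    per_word = 32 // bits
    codes = [(word >> (bits * j)) & mask for word in words for j in range(per_word)]
    return codes[:count]


# Hypothetical group of 32 weights, for illustration only
weights = [(i % 7 - 3) * 0.05 for i in range(32)]
codes, scale, w_min = quantize_group(weights)
words = pack_codes(codes)
restored = dequantize_group(unpack_codes(words, len(codes)), scale, w_min)
```

With only four levels per group, each restored weight can be off by up to half a quantization step, which is why 2-bit models depend on a method such as GPTQ to limit the accuracy loss.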

Safetensors
Model size: 723M params
Tensor types: I32, FP16
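
The reported size counts stored tensor elements, not original weights, so a 7B model can appear as roughly 0.7B "params". The sketch below is a back-of-the-envelope estimate under several assumptions: the standard Llama 2 7B shapes, a group size of 128 (the `GPTQQuantizer` default, since the snippet above does not set it), an AutoGPTQ-style layout with `qweight`, `scales`, `qzeros`, and `g_idx` tensors per linear layer, and embeddings, the output head, and norms left in FP16. The actual checkpoint may differ in detail.

```python
# Llama 2 7B architecture constants
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32
VOCAB = 32000

BITS = 2
GROUP_SIZE = 128          # assumption: library default, not stated in the card
PER_WORD = 32 // BITS     # sixteen 2-bit codes per int32 element

# (in_features, out_features) of the quantized linears in one decoder layer:
# q, k, v, o projections, then gate, up, and down projections of the MLP
LINEARS = (
    [(HIDDEN, HIDDEN)] * 4
    + [(HIDDEN, INTERMEDIATE)] * 2
    + [(INTERMEDIATE, HIDDEN)]
)


def stored_elements(in_f, out_f):
    """Count stored tensor elements for one packed linear layer (assumed layout)."""
    groups = in_f // GROUP_SIZE
    qweight = in_f * out_f // PER_WORD      # packed codes, int32
    scales = groups * out_f                 # one FP16 scale per group and output
    qzeros = groups * out_f // PER_WORD     # packed zero points, int32
    g_idx = in_f                            # group index per input row, int32
    return qweight + scales + qzeros + g_idx


per_layer = sum(stored_elements(i, o) for i, o in LINEARS) + 2 * HIDDEN  # + 2 norms
unquantized = 2 * VOCAB * HIDDEN + HIDDEN   # embeddings, output head, final norm
total = LAYERS * per_layer + unquantized

print(f"estimated stored elements: {total / 1e6:.0f}M")
```

Under these assumptions the estimate lands in the low 700M range, in line with the 723M figure reported for the checkpoint, and it accounts for the two tensor types listed: I32 for the packed data and FP16 for scales and unquantized layers.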

Collection including kaitchup/Llama-2-7b-hf-gptq-2bit

GPTQ

Collection

Llama 2 7B, Llama 2 13B, Llama 3 8B, and Mistral 7B quantized with GPTQ in 2-bit, 3-bit, 4-bit, and 8-bit. • 16 items • Updated May 6, 2024