Compressed LLM Model Zone

The models are prepared by the Visual Informatics Group @ University of Texas at Austin (VITA-group).

License: MIT License

Setup environment

pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install transformers==4.31.0
pip install accelerate
pip install auto-gptq  # for gptq
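
Optionally, verify that the pinned versions are active (a minimal sanity check):

import torch, transformers
print(torch.__version__, torch.cuda.is_available())  # expect 2.0.0+cu117 and True on a CUDA machine
print(transformers.__version__)                       # expect 4.31.0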

How to use pruned models

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick a base model, compression method, and compression degree from the
# table below; the degree selects the checkpoint revision, e.g. 's0.2'.
base_model = 'llama-2-7b'
comp_method = 'magnitude_unstructured'
comp_degree = 0.2
model_path = f'vita-group/{base_model}_{comp_method}'
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    revision=f's{comp_degree}',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
)
# The pruned checkpoints use the original Llama-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.cuda()
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))

How to use quantized models

from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = 'vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g'
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    # inject_fused_attention=False,  # alternative workaround to disable_exllama=True
    disable_exllama=True,
    device_map='auto',
)
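
Generation then works the same way as with the pruned models. A minimal sketch, assuming the quantized checkpoint reuses the standard Llama-2 tokenizer as in the example above:

from transformers import AutoTokenizer

# Assumption: the GPTQ checkpoint uses the original Llama-2 tokenizer,
# as in the pruned-model example above.
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.cuda()
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))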
| Base Model | Model Size | Compression Method | Compression Degree |
|---|---|---|---|
| Llama-2 | 7b | magnitude_unstructured | s0.1 |
| Llama-2 | 7b | magnitude_unstructured | s0.2 |
| Llama-2 | 7b | magnitude_unstructured | s0.3 |
| Llama-2 | 7b | magnitude_unstructured | s0.5 |
| Llama-2 | 7b | magnitude_unstructured | s0.6 |
| Llama-2 | 7b | sparsegpt_unstructured | s0.1 |
| Llama-2 | 7b | sparsegpt_unstructured | s0.2 |
| Llama-2 | 7b | sparsegpt_unstructured | s0.3 |
| Llama-2 | 7b | sparsegpt_unstructured | s0.5 |
| Llama-2 | 7b | sparsegpt_unstructured | s0.6 |
| Llama-2 | 7b | wanda_unstructured | s0.1 |
| Llama-2 | 7b | wanda_unstructured | s0.2 |
| Llama-2 | 7b | wanda_unstructured | s0.3 |
| Llama-2 | 7b | wanda_unstructured | s0.5 |
| Llama-2 | 7b | wanda_unstructured | s0.6 |
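
To sweep over the pruned variants listed above, the repository id and revision can be assembled from the table. A minimal sketch, assuming the same naming scheme as the pruned-model example (vita-group/{base}_{method} at revision s{degree}):

import torch
from transformers import AutoModelForCausalLM

base_model = 'llama-2-7b'
methods = ['magnitude_unstructured', 'sparsegpt_unstructured', 'wanda_unstructured']
degrees = [0.1, 0.2, 0.3, 0.5, 0.6]

for comp_method in methods:
    for comp_degree in degrees:
        model = AutoModelForCausalLM.from_pretrained(
            f'vita-group/{base_model}_{comp_method}',
            revision=f's{comp_degree}',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            device_map='auto',
        )
        # ... run your evaluation here ...
        del model
        torch.cuda.empty_cache()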