How to load and use the model?
#1 · by Q4234 · opened
I tried:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# from https://huggingface.co/facebook/opt-30b
model_name = "mit-han-lab/opt-13b-smoothquant"  # 8-bit quantized model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.int8).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
```
but this doesn't work...
Hi, please refer to https://github.com/mit-han-lab/smoothquant#smoothquant-int8-inference-for-pytorch to see how to use these models. We haven't integrated our INT8 kernels into Hugging Face Transformers, so the checkpoint can't be loaded with `AutoModelForCausalLM`.
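For reference, here is a minimal sketch of the flow that README describes: the INT8 checkpoint is loaded through the repo's own `Int8OPTForCausalLM` class (built on the torch-int CUDA kernels) rather than `AutoModelForCausalLM`. The class name, arguments, and the `generate` usage below are taken from the repo's demo notebook and may differ from the current API, so treat this as an outline and check the link above for the exact steps.

```python
# Minimal sketch based on the smoothquant repo's INT8 inference demo.
# Assumes https://github.com/mit-han-lab/smoothquant and its torch-int
# dependency are installed; see the linked README for installation details.
import torch
from transformers import GPT2Tokenizer
from smoothquant.opt import Int8OPTForCausalLM  # repo's custom INT8 OPT class

model_name = "mit-han-lab/opt-13b-smoothquant"

# Load the INT8-quantized weights with the repo's model class,
# not transformers' AutoModelForCausalLM.
model = Int8OPTForCausalLM.from_pretrained(model_name, device_map="auto")

# OPT models use the GPT-2 tokenizer; the original FP16 checkpoint's
# tokenizer (facebook/opt-13b) matches the quantized model.
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-13b")

# Illustrative generation call; the demo notebook itself runs forward
# passes for evaluation, so generate() support is an assumption here.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```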