How to load and use the model?
#1 · by Q4234 · opened
I tried:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# from https://huggingface.co/facebook/opt-30b
model_name = "mit-han-lab/opt-13b-smoothquant"  # 8-bit quantized model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.int8).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
```
but this doesn't work...
Hi, please refer to https://github.com/mit-han-lab/smoothquant#smoothquant-int8-inference-for-pytorch to see how to use these models. We haven't integrated our INT8 kernels into Hugging Face Transformers, so the checkpoint can't be loaded with `AutoModelForCausalLM`.
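For reference, here is a minimal sketch of the flow that README describes: the INT8 checkpoint is loaded through the repo's own `Int8OPTForCausalLM` class (built on the torch-int CUDA kernels) rather than `AutoModelForCausalLM`. The class name, arguments, and the `generate` usage below are taken from the repo's demo notebook and may differ from the current API, so treat this as an outline and check the link above for the exact steps.

```python
# Minimal sketch based on the smoothquant repo's INT8 inference demo.
# Assumes https://github.com/mit-han-lab/smoothquant and its torch-int
# dependency are installed; see the linked README for installation details.
import torch
from transformers import GPT2Tokenizer
from smoothquant.opt import Int8OPTForCausalLM  # repo's custom INT8 OPT class

model_name = "mit-han-lab/opt-13b-smoothquant"

# Load the INT8-quantized weights with the repo's model class,
# not transformers' AutoModelForCausalLM.
model = Int8OPTForCausalLM.from_pretrained(model_name, device_map="auto")

# OPT models use the GPT-2 tokenizer; the original FP16 checkpoint's
# tokenizer (facebook/opt-13b) matches the quantized model.
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-13b")

# Illustrative generation call; the demo notebook itself runs forward
# passes for evaluation, so generate() support is an assumption here.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```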