inference example

#4
by rrkotik - opened

Hello, can you show how to run inference with this model?

I tried something like this:

```python
import transformers

model = transformers.LlamaForCausalLM.from_pretrained("kuleshov/llama-7b-4bit", load_in_8bit=True, device_map='auto')
```

I get this error:

ValueError: weight is on the meta device, we need a `value` to put in on 0.
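For reference, a minimal inference sketch that loads in plain fp16, assuming the checkpoint's weights go through the standard `transformers` loading path (the 4-bit weights in this repo may instead need a dedicated GPTQ-style loader, which could be why parameters are left on the meta device):

```python
import torch
import transformers

# A minimal sketch, not a confirmed recipe for this repo: assumes the
# checkpoint's state dict matches LlamaForCausalLM's expected keys. If the
# weights are GPTQ-packed (qweight/scales/zeros), this plain from_pretrained
# call would leave parameters on the meta device and raise the error above.
tokenizer = transformers.LlamaTokenizer.from_pretrained("kuleshov/llama-7b-4bit")
model = transformers.LlamaForCausalLM.from_pretrained(
    "kuleshov/llama-7b-4bit",
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```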

The code runs fine in my environment, but I am confused: its GPU memory usage is about 8 GB, the same as the llama-7b int8 model.
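That footprint would be expected if the weights are actually being quantized to int8 at load time by `load_in_8bit=True`, rather than loaded as 4-bit. One way to check what was allocated, using the standard torch CUDA counters (this assumes `model` from the snippet above and a CUDA device):

```python
import torch

# Bytes currently allocated by tensors vs. reserved by the caching allocator.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")

# transformers models also report their parameter footprint directly.
print(f"model footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
```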
