load in 4/8bits extremely slow

#54
by obiwan92 - opened

I try to load the model using AutoModelForCausalLM.from_pretrained. When 8-bit or 4-bit loading is enabled, loading is very slow and the generate function hangs, but with bf16 it works just fine. Any ideas?
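For reference, a minimal sketch of the two loading paths described above, assuming the usual transformers + bitsandbytes setup; the checkpoint name is a placeholder, since the post does not say which model is used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "model-id-here"  # hypothetical placeholder: the post does not name the checkpoint

# bf16 path (reported to work fine)
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 4-bit path (reported to load very slowly and hang in generate)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```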
