I try to load a model with `AutoModelForCausalLM.from_pretrained`. When `load_in_8bit` or `load_in_4bit` is enabled, loading is very slow and the `generate` call hangs. With bf16 it works fine. Any ideas?
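For reference, this is roughly how the quantized load looks; the model id is just a placeholder, and the current `transformers` API expects the flags to go through a `BitsAndBytesConfig` rather than directly into `from_pretrained`:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder model id -- substitute the actual checkpoint being loaded.
model_id = "some-org/some-causal-lm"

# 4-bit quantization config; bf16 compute dtype can noticeably speed up
# generation compared with the default fp32 compute dtype.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # requires accelerate; places weights on the GPU
)
```

If `generate` hangs only in 8-bit/4-bit mode, it is worth checking that the quantized weights actually landed on the GPU (`device_map="auto"`), since bitsandbytes quantized layers running on CPU can appear to hang.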