I try to load a model with `AutoModelForCausalLM.from_pretrained`. When `load_in_8bit` or `load_in_4bit` is enabled, loading is very slow and the `generate` call hangs. With bf16 it works fine. Any ideas?
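For reference, this is roughly how the quantized load looks; the model id is just a placeholder, and the current `transformers` API expects the flags to go through a `BitsAndBytesConfig` rather than directly into `from_pretrained`:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder model id -- substitute the actual checkpoint being loaded.
model_id = "some-org/some-causal-lm"

# 4-bit quantization config; bf16 compute dtype can noticeably speed up
# generation compared with the default fp32 compute dtype.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # requires accelerate; places weights on the GPU
)
```

If `generate` hangs only in 8-bit/4-bit mode, it is worth checking that the quantized weights actually landed on the GPU (`device_map="auto"`), since bitsandbytes quantized layers running on CPU can appear to hang.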