Loading model without fast-attn

#10
by TZ20 - opened

Hi, if I set trust_remote_code=False when loading the model, will it just be the normal LlamaForCausalLM? If so, running at the 32K context length would require too much computational power.

Together org

Hi @TZ20 , thanks for your question! Yes, setting trust_remote_code=False will fall back to the standard LlamaForCausalLM implementation built into the Hugging Face transformers library. Since that implementation does not use flash attention, inference will be slower and the memory footprint higher at long sequence lengths.
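
For reference, a minimal loading sketch. The repository name below is an assumption (substitute the actual model ID this discussion belongs to); the key point is the trust_remote_code flag, which controls whether the repo's custom flash-attention modeling code is used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: placeholder repo ID -- replace with the model this discussion is attached to.
MODEL_ID = "togethercomputer/LLaMA-2-7B-32K"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# trust_remote_code=True loads the repository's custom modeling code
# (flash attention), which is what makes 32K-length inference practical.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# With trust_remote_code=False you get the stock LlamaForCausalLM instead:
# it still works, but is slower and uses more memory on long contexts.
```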
