Loading model without fast-attn

by TZ20 - opened

Hi, if I set trust_remote_code=False when loading the model, will it just be the normal LlamaForCausalLM? If so, then running with 32K length would require too much computational power.

Together org

Hi @TZ20, thanks for your question! Yes, setting trust_remote_code=False will load the standard LlamaForCausalLM implementation built into the Hugging Face transformers library. Since that implementation does not use flash attention, it will be slower and have a higher memory footprint.
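To give a rough sense of the memory difference, here is a back-of-the-envelope sketch of what a naive attention implementation needs if it materializes the full score matrix. The helper name `attention_scores_bytes` is made up for illustration, and the numbers assume 32 attention heads, fp16 (2 bytes per element), and batch size 1, so adjust them to your actual config.

```python
def attention_scores_bytes(seq_len, n_heads, bytes_per_elem=2, batch_size=1):
    """Bytes needed to hold the full (seq_len x seq_len) attention score
    matrix for every head of a single layer (hypothetical helper)."""
    return batch_size * n_heads * seq_len * seq_len * bytes_per_elem


GIB = 1024 ** 3

# Assumed values: 32 heads, fp16 scores, batch size 1.
for seq_len in (4096, 32768):
    size = attention_scores_bytes(seq_len, n_heads=32)
    print(f"{seq_len:>6} tokens: {size / GIB:.1f} GiB per layer")
```

Under those assumptions the score matrix grows from about 1 GiB at 4K tokens to about 64 GiB at 32K tokens for a single layer, because it scales with the square of the sequence length. Flash attention computes the same result in tiles without storing the full matrix, so its attention memory grows roughly linearly with sequence length instead.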

