
flash_attn on GPU

#20
by uglydumpling - opened

Can we run this model on a GPU without using flash_attn?

Mosaic ML, Inc. org

Yes, you can! Just use attn_impl: 'torch'.

You can do this by editing the config.json directly or by following the instructions in the README; both routes are sketched below.
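For the config.json route, the attention implementation is set under the attn_config block (the same field the Python snippet below modifies). A minimal sketch of the relevant portion, with surrounding fields omitted:

{
  "attn_config": {
    "attn_impl": "torch"
  }
}

Alternatively, from Python: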

import torch
import transformers

config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b',
    trust_remote_code=True,
)
config.attn_config.attn_impl = 'torch'  # it should already be 'torch', but set it explicitly for clarity

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.to(device='cuda:0')
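For completeness, a minimal generation sketch using the model configured above. The tokenizer choice follows the MPT-7B model card (MPT-7B uses the EleutherAI/gpt-neox-20b tokenizer); the prompt and generation parameters are illustrative, not from this thread:

tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

inputs = tokenizer('MosaicML is', return_tensors='pt').to('cuda:0')
with torch.no_grad():
    # generates with the pure-PyTorch attention implementation; no flash_attn required
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))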
abhi-mosaic changed discussion status to closed
