@mayank-mishra on Hugging Face: "I have just published my first blog post. While FlashAttention has been…"

mayank-mishra

posted an update Mar 9, 2024

I have just published my first blog post.

While FlashAttention has been readily integrated into HuggingFace transformers, there are much higher gains to be had (at least theoretically) when finetuning models on batches whose examples have variable sequence lengths, by avoiding padding altogether.

For a deeper dive, please go through my blog at https://huggingface.co/blog/mayank-mishra/padding-free-transformer.
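
To give a rough sense of where the gains come from, here is a minimal sketch that compares how many token positions a padded batch processes against a padding-free (concatenated) batch. The sequence lengths and helper names are made up for illustration and are not taken from the blog; the cumulative offsets at the end are the kind of boundary information that variable-length attention kernels generally need in order to keep the concatenated examples separate.

```python
# Hypothetical batch: token counts of four finetuning examples (illustrative only).
seq_lens = [12, 512, 87, 230]


def padded_tokens(lens):
    """Token positions processed when every example is padded to the longest one."""
    return len(lens) * max(lens)


def packed_tokens(lens):
    """Token positions processed when examples are concatenated with no padding."""
    return sum(lens)


def cumulative_seq_lens(lens):
    """Start/end offsets of each example inside the concatenated token stream."""
    bounds = [0]
    for n in lens:
        bounds.append(bounds[-1] + n)
    return bounds


padded = padded_tokens(seq_lens)          # 4 * 512
packed = packed_tokens(seq_lens)          # 12 + 512 + 87 + 230
wasted_fraction = 1 - packed / padded     # share of padded positions that are pure padding
cu_seqlens = cumulative_seq_lens(seq_lens)

print(f"padded: {padded}, packed: {packed}, wasted: {wasted_fraction:.1%}")
print(f"cumulative offsets: {cu_seqlens}")
```

For this made-up batch, more than half of the padded positions carry no real tokens, which is the theoretical headroom in both compute and activation memory. Actual savings depend on the length distribution of the dataset.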

osanseviero

Mar 9, 2024

Interesting! @joaogante and @tomaarsen and @olivierdehaene might be interested in this too!

olivierdehaene

Mar 9, 2024

Nice blog!
@osanseviero we have been doing this in TGI and TEI for a while ;)
Padding-free implementations also make dynamic batching easier to implement and more predictable in memory.

mayank-mishra

Mar 9, 2024

Yeah, it's just that people have not been using this for finetuning, where it can give considerable memory savings. I guess the issue is the core design of HF transformers.

I am planning to release the code for this sometime soon :)