When will gradient checkpointing be implemented?

#68
by rishiraj - opened

Could you give an idea of when we can expect gradient checkpointing to be implemented? Without it, fine-tuning the model is very hard.
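For anyone reading along who hasn't used the technique: gradient checkpointing trades extra compute for lower activation memory, which is why it matters so much for fine-tuning. Below is a minimal, framework-free sketch of the idea, not the Phi-2 or transformers implementation. The toy tanh layers, the function names, and the `segment` parameter are all made up for illustration.

```python
import math


def layer(x, w):
    """One toy 'layer': y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(x, w):
    """Return (dy/dx, dy/dw) for y = tanh(w * x)."""
    s = 1.0 - math.tanh(w * x) ** 2
    return w * s, x * s


def backward_full(x0, weights):
    """Baseline: keep every layer input from the forward pass."""
    inputs, x = [], x0
    for w in weights:
        inputs.append(x)
        x = layer(x, w)
    grad_x, grad_w = 1.0, [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        dx, dw = layer_grad(inputs[i], weights[i])
        grad_w[i] = grad_x * dw
        grad_x = grad_x * dx
    return x, grad_w, len(inputs)


def backward_checkpointed(x0, weights, segment):
    """Keep only the first input of each segment; recompute the rest."""
    checkpoints, x = {}, x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # the only values kept from the forward pass
        x = layer(x, w)
    out = x
    grad_x, grad_w = 1.0, [0.0] * len(weights)
    peak = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(weights))
        # Re-run the forward pass for this segment only.
        inputs, x = [], checkpoints[start]
        for i in range(start, end):
            inputs.append(x)
            x = layer(x, weights[i])
        # Values held at once: all checkpoints plus this segment's inputs.
        peak = max(peak, len(checkpoints) + len(inputs))
        for i in reversed(range(start, end)):
            dx, dw = layer_grad(inputs[i - start], weights[i])
            grad_w[i] = grad_x * dw
            grad_x = grad_x * dx
    return out, grad_w, peak


weights = [0.5 + 0.1 * i for i in range(16)]
out_a, grads_a, held_a = backward_full(0.3, weights)
out_b, grads_b, held_b = backward_checkpointed(0.3, weights, segment=4)
print(held_a, held_b)
```

Both functions produce the same gradients; the checkpointed one holds fewer values at once and pays for it with a second forward pass per segment. In a real model the stored values are large activation tensors, so the saving is what makes fine-tuning fit in memory.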

Also Flash Attention 2!

And I wonder why the code has checkpointing elements while supports_gradient_checkpointing remains False.

Microsoft org
edited Jan 9

It seems to be implemented within Axolotl by Winglian on GitHub; I'm not sure whether it can be reused as is here.

https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/models/phi/modeling_phi.py

I'm using this library for fine-tuning and RL of Phi-2.

Microsoft org

Hello everyone!

We have an ongoing PR at https://github.com/huggingface/transformers/pull/28163 which will solve this issue.

Regards,
Gustavo.

gugarosa changed discussion status to closed
