optimize pipeline between device fwd and host bwd

#1
by zhaifly - opened
No description provided.
Habana AI org

Hi @zhaifly ! Could you describe a bit what you mean?
The title of your PR makes me think of a change in modeling, which should take place in Optimum Habana.

zhaifly changed pull request status to closed

Hi @regisss, I'm currently optimizing the Habana models ViT, Swin, GPT2, GPT-J and Neox. I want to add more Habana-specific command-line args, which should also be added to the gaudi_config of each model.
This PR means: add a mark_step between model.forward and loss.backward for better training performance (pipelining the host BWD with the device FWD). There are lots of code changes on my local side for the 5 models, so I will temporarily close this PR and reopen it once the code is ready.

Habana AI org

@zhaifly Sounds good!

A few recommendations:

  • To add a mark_step between the forward and backward passes, the best approach is to override the training_step method in the GaudiTrainer class (unless it is specific to the models you mentioned).
  • To add more Habana-specific args, we should first modify the GaudiConfig class. Then we can update the gaudi_config.json here.
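To illustrate the first recommendation, here is a minimal, hardware-agnostic sketch of where a mark_step call would sit in an overridden training step. This is an assumption-laden simplification: the real GaudiTrainer.training_step has a different signature, and `htcore.mark_step()` is Habana's synchronization call; the fallback no-op below only exists so the sketch runs off Gaudi hardware.

```python
# Hypothetical sketch, NOT the actual GaudiTrainer code.
# On Gaudi, htcore.mark_step() flushes the accumulated graph for execution;
# placing it between forward and backward lets the device run the FWD graph
# while the host builds the BWD graph. Off Gaudi we fall back to a no-op.
try:
    from habana_frameworks.torch import core as htcore

    def mark_step():
        htcore.mark_step()
except ImportError:
    def mark_step():
        pass  # no-op when habana_frameworks is unavailable

def training_step(model, inputs, backward):
    """Simplified training step: forward, mark_step, then backward."""
    loss = model(inputs)   # device forward
    mark_step()            # trigger device FWD execution before host BWD build
    backward(loss)         # backward pass (e.g. loss.backward() in practice)
    return loss
```

The point is only the placement of the mark_step call; everything else (optimizer step, gradient scaling, etc.) stays as in the existing trainer.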

Do not hesitate to ping me and open PRs on Github when you start working on it :)

@regisss I appreciate your suggestions and will follow your comments to clean up my code.
