Why is get_mup_param_groups needed instead of the default one in Hugging Face?

#18
by sanqiang

I think it comes from the Tensor Programs V paper.
I'm just curious how to train it: using DeepSpeed with a custom optimizer raises an exception.
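For context, here is a minimal sketch of why muP needs its own parameter groups rather than a single shared learning rate. It follows one common formulation for Adam (the idea behind `MuAdam` in Microsoft's `mup` package): "matrix-like" parameters, whose fan-in and fan-out both grow with model width, get the base learning rate divided by the width multiplier, while "vector-like" parameters (biases, embeddings, readout weights, layer-norm gains) keep the base rate. This is not the actual `get_mup_param_groups` from this repo. The function name `mup_adam_param_groups`, the shape dictionaries, and the parameter names are hypothetical. It only covers the optimizer side; muP also changes initialization and the output-layer multiplier, which are not shown. To stay dependency-free it groups parameter *names* instead of tensors.

```python
from collections import defaultdict


def mup_adam_param_groups(shapes, base_shapes, base_lr):
    """Hypothetical sketch: split parameters into muP-style Adam groups.

    shapes:      name -> shape tuple in the wide (target) model
    base_shapes: name -> shape tuple in the small base model
    Returns a list of {"params": [names], "lr": float} dicts.
    """
    matrix_like = defaultdict(list)  # width multiplier -> parameter names
    vector_like = []
    for name, shape in shapes.items():
        base = base_shapes[name]
        if len(shape) != len(base):
            raise ValueError(f"rank mismatch for {name}: {shape} vs {base}")
        # Dimensions that differ from the base model are the ones scaling with width.
        n_inf = sum(1 for d, b in zip(shape, base) if d != b)
        if n_inf == 2:
            # PyTorch weights are (fan_out, fan_in, ...); width multiplier comes from fan_in.
            width_mult = shape[1] / base[1]
            matrix_like[width_mult].append(name)
        elif n_inf > 2:
            raise NotImplementedError(f"{name} has more than two width dimensions")
        else:
            vector_like.append(name)
    # Matrix-like groups: Adam learning rate shrinks as 1 / width multiplier.
    groups = [{"params": names, "lr": base_lr / wm} for wm, names in matrix_like.items()]
    if vector_like:
        # Vector-like parameters keep the base learning rate.
        groups.append({"params": vector_like, "lr": base_lr})
    return groups


# Hypothetical example: base width 256, target width 1024 (width multiplier 4).
shapes = {
    "embed.weight": (50000, 1024),
    "hidden.weight": (1024, 1024),
    "hidden.bias": (1024,),
    "readout.weight": (50000, 1024),
}
base_shapes = {
    "embed.weight": (50000, 256),
    "hidden.weight": (256, 256),
    "hidden.bias": (256,),
    "readout.weight": (50000, 256),
}
groups = mup_adam_param_groups(shapes, base_shapes, base_lr=1e-3)
for g in groups:
    print(g)
```

In real training code each group would hold the `nn.Parameter` objects (for example, filtered from `model.named_parameters()`) instead of names. Because the result is just a list of ordinary param-group dicts, one possible route with DeepSpeed (an assumption, not tested against this setup) is to pass it as `model_parameters` to `deepspeed.initialize` and let DeepSpeed build its own Adam from the config instead of handing it a custom optimizer object. It is worth checking that any configured LR scheduler does not overwrite the per-group rates.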
