When can we have the training code as described in the paper?

#5
by Shamane - opened

Amazing work! Would love to have the training code.

JetMoE org

Please check our modified megablocks: https://github.com/yikangshen/megablocks. To reproduce the training code, you only need to integrate it with the Megatron codebase.

Thanks a lot, @YikangS.

Could you please explain the following statement?

“To reproduce the training code, you only need to integrate it with the Megatron codebase.”

Is there a standard way to integrate the above-mentioned training code with Megatron?

Sorry, I am new to Megatron and MegaBlocks.

JetMoE org

Yes, the original MegaBlocks repo provides an example of integrating MegaBlocks with Megatron.
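At the layer level, the integration essentially amounts to replacing each Transformer block's dense MLP with a MegaBlocks MoE/dMoE layer, while Megatron supplies the surrounding training loop. A minimal sketch, assuming the upstream `megablocks.layers` API; the numeric values are placeholders, not the JetMoE configuration:

```python
# Hedged sketch, not the authors' code: swap a Transformer block's dense MLP
# for a MegaBlocks dMoE layer. Field names follow the upstream megablocks.layers
# API; the values are placeholders, NOT the JetMoE settings.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,        # model width (placeholder)
    ffn_hidden_size=4096,    # per-expert FFN width (placeholder)
    moe_num_experts=8,       # experts per MoE layer (placeholder)
    moe_top_k=2,             # experts activated per token (placeholder)
)
moe = dMoE(args)

x = torch.randn(4, 16, 1024)              # (batch, sequence, hidden)
if torch.cuda.is_available():             # the sparse expert kernels generally need a GPU
    out = moe.cuda()(x.cuda())
    if isinstance(out, tuple):            # layer mirrors Megatron's (output, bias) MLP interface
        out = out[0]
    print(out.shape)                      # same shape as the input
```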

JetMoE org

You can also integrate this Megablocks repo into any pretraining framework you prefer.
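For a framework-agnostic starting point, a MegaBlocks MoE layer can be dropped into an ordinary PyTorch module and trained with a plain loop. Another minimal sketch under the same assumptions; `TinyMoEBlock` and all hyperparameters are illustrative, and the upstream load-balancing-loss helpers are omitted here but would be added to the objective in real pre-training:

```python
# Hedged sketch of using a MegaBlocks MoE layer in a plain PyTorch loop, outside Megatron.
import torch
import torch.nn as nn
from megablocks.layers.arguments import Arguments
from megablocks.layers.moe import MoE

class TinyMoEBlock(nn.Module):
    """One Transformer-style block whose feed-forward sublayer is a MoE layer."""
    def __init__(self, args: Arguments):
        super().__init__()
        self.norm = nn.LayerNorm(args.hidden_size)
        self.moe = MoE(args)

    def forward(self, x):
        out = self.moe(self.norm(x))
        if isinstance(out, tuple):        # mirrors Megatron's (output, bias) MLP interface
            out = out[0]
        return x + out                    # residual connection

args = Arguments(hidden_size=256, ffn_hidden_size=512,
                 moe_num_experts=8, moe_top_k=2)      # placeholders, not JetMoE's settings
device = "cuda"                           # MegaBlocks' expert kernels require a CUDA device
model = TinyMoEBlock(args).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(10):                    # toy loop on random data with a stand-in objective
    x = torch.randn(2, 32, 256, device=device)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```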

Thanks a lot

@YikangS, just to clarify:

This is the example related to Megatron, right?
https://github.com/databricks/megablocks/tree/main/exp/moe

@YikangS I went through the https://github.com/yikangshen/megablocks repository. Could you please explain how we can add the https://github.com/myshell-ai/JetMoE model to MegaBlocks?

Is there a pre-training script in the modified MegaBlocks repository? If so, please share the link.

In the JetMoE technical report, there are several key settings related to model pre-training (Section 4.1). How can we configure these settings in the modified MegaBlocks repository?

Do you think we can use the same method to fine-tune JetMoE as well?
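For ordinary fine-tuning (as opposed to reproducing pre-training), the released checkpoint should presumably be trainable through Hugging Face transformers directly, separate from the Megatron/MegaBlocks stack. A minimal sketch, assuming the `jetmoe/jetmoe-8b` checkpoint loads via `AutoModelForCausalLM` with `trust_remote_code`; this is not the authors' recipe:

```python
# Hedged sketch: standard causal-LM fine-tuning of the released JetMoE checkpoint
# via transformers. The checkpoint id and trust_remote_code flag are assumptions
# about the released artifacts; the batch and optimizer here are toy-sized.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jetmoe/jetmoe-8b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "jetmoe/jetmoe-8b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# One supervised fine-tuning step on a toy batch; labels = input_ids gives the
# usual shifted next-token prediction loss.
batch = tokenizer(["JetMoE is a sparsely activated language model."], return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```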

+1. Thanks for the amazing work! Can you please share the Megatron integration code to either pretrain or finetune JetMoE using MegaBlocks? This would be so helpful.

JetMoE org

Sure, it will take some time to clean the code. I will release the full training code in 1 week.

@YikangS Thanks a lot.

Great job @YikangS! Are there any plans to open-source the training mixture so that people can follow and compare?
