Migrating from Megatron-LM
--------------------------

NeMo Megatron and Megatron-LM share much of the same underlying technology, so you should be able to convert GPT model checkpoints trained with Megatron-LM into NeMo Megatron.

Example conversion script:

.. code-block:: bash

    <NeMo_ROOT_FOLDER>/examples/nlp/language_modeling/megatron_lm_ckpt_to_nemo.py \
        --checkpoint_folder <path_to_Megatron-LM_checkpoints_folder> \
        --checkpoint_name <checkpoint_name> \
        --nemo_file_path <path_to_output_nemo_file> \
        --model_type <megatron_model_type> \
        --tensor_model_parallel_size <tensor_model_parallel_size>

To resume training from a converted Megatron-LM checkpoint, make sure to set ``trainer.max_steps=round(lr-warmup-fraction * lr-decay-iters + lr-decay-iters)``, where ``lr-warmup-fraction`` and ``lr-decay-iters`` are the corresponding arguments from the original Megatron-LM training run, so that the learning rate scheduler follows the same curve.
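
For example (the numbers here are purely illustrative, not taken from any particular run), a Megatron-LM model trained with ``--lr-warmup-fraction 0.01`` and ``--lr-decay-iters 300000`` would resume with ``trainer.max_steps=303000``, passed alongside the rest of your usual training overrides:

.. code-block:: bash

    # Illustrative values from the original Megatron-LM launch:
    #   --lr-warmup-fraction 0.01   --lr-decay-iters 300000
    # trainer.max_steps = round(0.01 * 300000 + 300000) = 303000
    python <NeMo_ROOT_FOLDER>/examples/nlp/language_modeling/megatron_gpt_pretraining.py \
        trainer.max_steps=303000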