Lauler/UL2-nemo-conversion

Checkpoints and scripts for converting Nemo ckpt files to Huggingface format

This repo contains two checkpoints (.ckpt files) for UL2 models we have started pretraining with Nemo. The checkpoints are found in nemo_checkpoints/. The Nemo config files used to train these models can be found in nemo_config/ul2-base-nl36.

megatron_ul2--val_loss=2.54-step=7000-consumed_samples=14557920.0.ckpt was trained with megatron_legacy: False in the config, whereas the other checkpoint was trained with megatron_legacy: True.

Nvidia have created a conversion script that converts T5, T5v1.1 and UL2 models on Huggingface Hub to Nemo format. The script can be found here. It is also included in this repo.

We thought that adapting a T5/UL2 model trained with Nemo to a Huggingface format would simply be a matter of reversing the conversion performed by the script above. Our conversion script does work when it operates directly on the pt state dict weight files produced by running the Nvidia script, i.e. when going Huggingface -> Nemo -> Huggingface. However, it does not work when going Nemo -> Huggingface: a UL2 model that was initialized with Nemo Megatron and pretrained with Nemo does not produce the same output after it is converted to Huggingface format.

Dependencies

We use Nemo docker containers (tag 23.02) via Singularity when running the code in this repo. We have included a definition file to build the container.

To build the container:

sudo singularity build nemo2302.sif nemo_singularity.def

We provide bash scripts to execute with singularity. To make debugging easier, you can also run singularity in interactive mode:

singularity shell --nv nemo2302.sif

Converting Nemo checkpoints to Huggingface

We have included our conversion script in this repo. It can be found in convert_nemo_ul2_checkpoint.py.

We manually created a Huggingface config file for UL2 that to the best of our knowledge matches the settings used when we trained with Nemo (see config_ul2_base_nl36.json).

To replicate our weights conversion, simply run:

singularity exec --nv nemo2302.sif bash convert_nemo_to_hf.sh

The resulting Huggingface model will be saved to ul2-base-nl36-swedish/.

We are aware that Megatron-LM uses different ordering of QKV in the attention layers depending on the version of Megatron-LM used. We are also aware of an existing conversion script that Huggingface have created for converting Megatron-BERT to Huggingface, where they adapt the ordering of QKV in Megatron to match the ordering used in Huggingface. As such we have an optional --fix_qkv parameter in our conversion script that applies the same reordering of QKV as Huggingface does. See the lines that are commented out in convert_nemo_to_hf.sh for an example of how to use this parameter and set the checkpoint_version.
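To make the reordering concrete, here is a minimal sketch in plain Python. It is not the code in convert_nemo_ul2_checkpoint.py, and it operates on row labels rather than real tensors. It assumes the fused QKV weight stores its rows grouped per attention head, as [num_heads, 3, head_dim], and regroups them as [3, num_heads, head_dim] so that all query rows come first, then all key rows, then all value rows. The function name reorder_qkv_rows and its parameters are hypothetical, and whether this layout matches a given checkpoint depends on the Megatron-LM checkpoint version.

```python
def reorder_qkv_rows(rows, num_heads, head_dim, num_splits=3):
    """Regroup fused QKV rows from [num_heads, num_splits, head_dim]
    to [num_splits, num_heads, head_dim].

    `rows` is a flat list with one entry per row of the fused weight.
    Hypothetical helper for illustration only.
    """
    expected = num_heads * num_splits * head_dim
    if len(rows) != expected:
        raise ValueError(f"expected {expected} rows, got {len(rows)}")

    reordered = []
    for split in range(num_splits):        # 0 = query, 1 = key, 2 = value
        for head in range(num_heads):
            # Start of this (head, split) block in the per-head layout.
            start = (head * num_splits + split) * head_dim
            reordered.extend(rows[start:start + head_dim])
    return reordered


# Two heads with two rows each, labelled by projection and head index.
per_head_layout = [
    "q0a", "q0b", "k0a", "k0b", "v0a", "v0b",
    "q1a", "q1b", "k1a", "k1b", "v1a", "v1b",
]
grouped = reorder_qkv_rows(per_head_layout, num_heads=2, head_dim=2)
```

After the call, grouped lists the query rows of both heads, then the key rows, then the value rows. Applying the wrong layout assumption does not raise an error, it silently permutes rows, which is one reason a mismatch here is hard to spot by inspecting shapes alone.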

Unfortunately, none of the above solves the issue we have with the conversion script.

We have a test script that runs prediction with both the original Nemo model and the converted Huggingface model. Unfortunately, the outputs differ, even though both models use the same tokenizer. To run:

singularity exec --nv nemo2302.sif python test_ul2_hf.py

Or explore in interactive mode with singularity shell --nv nemo2302.sif.

Confirming the conversion script can reverse Nvidia's conversion script

To confirm that our conversion script correctly reverses Nvidia's conversion script, we include instructions for converting a UL2 model from Huggingface to Nemo with Nvidia's script, and then back to Huggingface with ours.

Instructions:

Run singularity exec --nv nemo2302.sif bash convert_hf_to_nemo.sh to convert the existing Finnish-NLP/ul2-base-nl36-finnish from Huggingface to Nemo format via Nvidia's conversion script. The resulting model weights will be saved to the folder ul2-base-nl36-finnish/.
To perform the reverse conversion and check whether the re-converted weights are identical to the originals, run python convert_finnish_ul2_model.py, or via singularity: singularity exec --nv nemo2302.sif python convert_finnish_ul2_model.py.

The resulting model, re-converted to Huggingface, will be found in ul2-base-nl36-finnish/hf_t5_ul2.

This conversion produces a model that is identical to the original model.
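A round-trip check of this kind can be sketched as follows. This is not the check implemented in convert_finnish_ul2_model.py; it is a minimal illustration that assumes each state dict has already been turned into a mapping from parameter name to a flat list of floats (for example with tensor.flatten().tolist() in PyTorch). The function name compare_state_dicts and the report keys are hypothetical.

```python
def compare_state_dicts(original, reconverted, abs_tol=0.0):
    """Compare two mappings of parameter name -> flat list of floats.

    Returns a report listing parameter names that are missing from
    `reconverted`, unexpected in `reconverted`, or whose values differ
    by more than `abs_tol`. Hypothetical helper for illustration only.
    """
    report = {"missing": [], "unexpected": [], "mismatched": []}

    for name, values in original.items():
        if name not in reconverted:
            report["missing"].append(name)
            continue
        other = reconverted[name]
        # A different number of elements counts as a mismatch.
        if len(values) != len(other) or any(
            abs(a - b) > abs_tol for a, b in zip(values, other)
        ):
            report["mismatched"].append(name)

    for name in reconverted:
        if name not in original:
            report["unexpected"].append(name)

    return report


def is_identical(report):
    """True when no parameter is missing, unexpected, or mismatched."""
    return not any(report.values())


# Toy example with made-up parameter names and values.
original = {"encoder.w": [1.0, 2.0], "decoder.w": [3.0]}
reconverted = {"encoder.w": [1.0, 2.0], "decoder.w": [3.5], "extra.w": [0.0]}
report = compare_state_dicts(original, reconverted)
```

Reporting names instead of a single boolean makes it easier to see which layers a faulty conversion affects, for instance whether only the attention projections differ.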