Dr. Jorge Abreu Vicente committed on
Commit 6e8f412
Parent: 9e54323

Update README.md

Files changed (1):
  1. README.md +37 -0
README.md CHANGED
@@ -26,6 +26,43 @@ In this implementation we have followed the commands of the [`nvidia/megatron-be
 
 However, the file [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py) needed a modification: the Megatron model in [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m) includes head layers, while the BioMegatron weights we upload to this repository do not contain a head.
 
+ The code below is a modification of the original [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py).
+ 
+ ```python
+ import json
+ import os
+ 
+ import torch
+ 
+ from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print
+ 
+ print_checkpoint_structure = True
+ path_to_checkpoint = "/path/to/BioMegatron345mUncased/"
+ 
+ # Extract the basename of the checkpoint directory.
+ basename = os.path.dirname(path_to_checkpoint).split('/')[-1]
+ 
+ # Load the Megatron checkpoint.
+ input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")
+ 
+ # Convert. The BioMegatron weights contain no head, hence head_model=False.
+ print("Converting")
+ output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)
+ 
+ # Print the structure of the converted state dict.
+ if print_checkpoint_structure:
+     recursive_print(None, output_state_dict)
+ 
+ # Store the config to file.
+ output_config_file = os.path.join(path_to_checkpoint, "config.json")
+ print(f'Saving config to "{output_config_file}"')
+ with open(output_config_file, "w") as f:
+     json.dump(output_config, f)
+ 
+ # Store the state_dict to file.
+ output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
+ print(f'Saving checkpoint to "{output_checkpoint_file}"')
+ torch.save(output_state_dict, output_checkpoint_file)
+ ```
+ 
 We provide an alternative version of the Python script in this repository so that any user can cross-check the validity of the model replicated here (a sketch of such a cross-check follows the diff).
 
 BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example identical to that of [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m).
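
As a minimal sketch of the cross-check mentioned above (file paths are illustrative; it assumes you have both a locally converted checkpoint and the `pytorch_model.bin` downloaded from this repository), the two state dicts can be compared tensor by tensor:

```python
import torch

# Illustrative paths: a checkpoint converted locally with the script above,
# and the weights downloaded from this repository.
local_state_dict = torch.load("local/pytorch_model.bin", map_location="cpu")
remote_state_dict = torch.load("downloaded/pytorch_model.bin", map_location="cpu")

# Both conversions should produce the same parameter names...
assert local_state_dict.keys() == remote_state_dict.keys(), "Parameter names differ"

# ...and an identical tensor for every parameter.
for name, tensor in local_state_dict.items():
    assert torch.equal(tensor, remote_state_dict[name]), f"Mismatch in {name}"

print("All tensors match.")
```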
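
The loading example referenced in the last context line is not part of this hunk. As a hedged sketch of what it looks like (the checkpoint directory path is illustrative, and it assumes `vocab.txt` sits next to the converted `config.json` and `pytorch_model.bin`), the headless checkpoint loads with the base `MegatronBertModel`:

```python
import torch
from transformers import BertTokenizer, MegatronBertModel

# Illustrative directory holding config.json, pytorch_model.bin and vocab.txt.
checkpoint_dir = "/path/to/BioMegatron345mUncased/"

tokenizer = BertTokenizer.from_pretrained(checkpoint_dir)
model = MegatronBertModel.from_pretrained(checkpoint_dir)
model.eval()

# Encode a biomedical sentence and extract contextual embeddings.
inputs = tokenizer("Aspirin inhibits platelet aggregation.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```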