Commit 6e8f412 (parent: 9e54323) by Dr. Jorge Abreu Vicente: Update README.md

README.md
However, the file [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py) required a modification: the Megatron checkpoint published at [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-uncased-345m) includes head layers, while the BioMegatron weights we upload to this repository do not contain a head.
The code below is a modification of the original [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py).

```python
import json
import os

import torch

# `recursive_print` is assumed to be provided by the same conversion script.
from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

print_checkpoint_structure = True
path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

# Extract the basename.
basename = os.path.dirname(path_to_checkpoint).split("/")[-1]

# Load the Megatron checkpoint.
input_state_dict = torch.load(
    os.path.join(path_to_checkpoint, "model_optim_rng.pt"), map_location="cpu"
)

# Convert (the BioMegatron weights contain no head).
print("Converting")
output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

# Print the structure of the converted state dict.
if print_checkpoint_structure:
    recursive_print(None, output_state_dict)

# Store the config to file.
output_config_file = os.path.join(path_to_checkpoint, "config.json")
print(f'Saving config to "{output_config_file}"')
with open(output_config_file, "w") as f:
    json.dump(output_config, f)

# Store the state_dict to file.
output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
print(f'Saving checkpoint to "{output_checkpoint_file}"')
torch.save(output_state_dict, output_checkpoint_file)
```
We provide an alternative version of the Python script in this repository so that any user can cross-check the validity of the model replicated here.
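Cross-checking the two conversion scripts amounts to comparing the state dicts they produce. A minimal sketch of such a comparison (the helper name and the fake tensor class are ours, not part of the repository):

```python
def diff_state_dicts(a, b):
    """Compare two flat state dicts by key and by tensor shape.

    Returns keys present only in `a`, keys present only in `b`, and shared
    keys whose tensor shapes disagree. Leaves must expose a .shape attribute
    (as torch tensors do).
    """
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shape_mismatch = sorted(
        k for k in set(a) & set(b) if tuple(a[k].shape) != tuple(b[k].shape)
    )
    return only_a, only_b, shape_mismatch
```

An empty result in all three lists means the two conversions agree on structure; a value-level check (e.g. `torch.allclose` per key) can then confirm the weights themselves match.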
67 |
|
68 |
BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example identical to that of [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m).
|
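A minimal sketch of such loading code (the repository id below is a placeholder, not the actual model id; since the uploaded weights contain no head, we load the bare encoder):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder: replace with the actual repository id of this model.
model_id = "your-org/BioMegatron345mUncased"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # bare encoder, no head

inputs = tokenizer("The kinase inhibitor was administered daily.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```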