Dr. Jorge Abreu Vicente committed on
Commit 6e8f412
Parent: 9e54323

Update README.md

Files changed (1):
  1. README.md +37 -0
README.md CHANGED
@@ -26,6 +26,43 @@ In this implementation we have followed the commands of the [`nvidia/megatron-be
 
 However, the file [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py) needed a modification: the Megatron model in [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m) includes head layers, while the BioMegatron weights we upload to this repository do not contain a head.
 
+ The code below is a modification of the original [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py).
+ 
+ ```python
+ import json
+ import os
+ 
+ import torch
+ 
+ from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print
+ 
+ print_checkpoint_structure = True
+ path_to_checkpoint = "/path/to/BioMegatron345mUncased/"
+ 
+ # Extract the basename of the checkpoint directory.
+ basename = os.path.dirname(path_to_checkpoint).split('/')[-1]
+ 
+ # Load the Megatron checkpoint.
+ input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")
+ 
+ # Convert. The BioMegatron weights contain no head, hence head_model=False.
+ print("Converting")
+ output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)
+ 
+ # Print the structure of the converted state dict.
+ if print_checkpoint_structure:
+     recursive_print(None, output_state_dict)
+ 
+ # Store the config to file.
+ output_config_file = os.path.join(path_to_checkpoint, "config.json")
+ print(f'Saving config to "{output_config_file}"')
+ with open(output_config_file, "w") as f:
+     json.dump(output_config, f)
+ 
+ # Store the state_dict to file.
+ output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
+ print(f'Saving checkpoint to "{output_checkpoint_file}"')
+ torch.save(output_state_dict, output_checkpoint_file)
+ ```
+ 
 We provide an alternative version of the Python script in this repository so that any user can cross-check the validity of the model replicated here (a sketch of such a cross-check follows the diff).
 
 BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example identical to that of [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m).
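
As a minimal sketch of the cross-check mentioned above (file paths are illustrative; it assumes you have both a locally converted checkpoint and the `pytorch_model.bin` downloaded from this repository), the two state dicts can be compared tensor by tensor:

```python
import torch

# Illustrative paths: a checkpoint converted locally with the script above,
# and the weights downloaded from this repository.
local_state_dict = torch.load("local/pytorch_model.bin", map_location="cpu")
remote_state_dict = torch.load("downloaded/pytorch_model.bin", map_location="cpu")

# Both conversions should produce the same parameter names...
assert local_state_dict.keys() == remote_state_dict.keys(), "Parameter names differ"

# ...and an identical tensor for every parameter.
for name, tensor in local_state_dict.items():
    assert torch.equal(tensor, remote_state_dict[name]), f"Mismatch in {name}"

print("All tensors match.")
```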
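
The loading example referenced in the last context line is not part of this hunk. As a hedged sketch of what it looks like (the checkpoint directory path is illustrative, and it assumes `vocab.txt` sits next to the converted `config.json` and `pytorch_model.bin`), the headless checkpoint loads with the base `MegatronBertModel`:

```python
import torch
from transformers import BertTokenizer, MegatronBertModel

# Illustrative directory holding config.json, pytorch_model.bin and vocab.txt.
checkpoint_dir = "/path/to/BioMegatron345mUncased/"

tokenizer = BertTokenizer.from_pretrained(checkpoint_dir)
model = MegatronBertModel.from_pretrained(checkpoint_dir)
model.eval()

# Encode a biomedical sentence and extract contextual embeddings.
inputs = tokenizer("Aspirin inhibits platelet aggregation.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```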