Dr. Jorge Abreu Vicente committed
Commit 561cc87
1 Parent(s): eb7f17a

Update README.md

We provide in the repository an alternative version of the python script so that any user can cross-check the validity of the model replicated in this repository.

The code below is a modification of the original [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py).

```python
import json
import os

import torch

# `recursive_print` is assumed to be defined in convert_biomegatron_checkpoint,
# as in the original Hugging Face conversion script.
from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

print_checkpoint_structure = True
path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

# Extract the basename.
basename = os.path.dirname(path_to_checkpoint).split('/')[-1]

# Load the model.
input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")

# Convert.
print("Converting")
output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

# Print the structure of the converted state dict.
if print_checkpoint_structure:
    recursive_print(None, output_state_dict)

# Store the config to file.
output_config_file = os.path.join(path_to_checkpoint, "config.json")
print(f'Saving config to "{output_config_file}"')
with open(output_config_file, "w") as f:
    json.dump(output_config, f)

# Store the state_dict to file.
output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
print(f'Saving checkpoint to "{output_checkpoint_file}"')
torch.save(output_state_dict, output_checkpoint_file)
```
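The `config.json` written by the script is plain JSON, so it can be read back and inspected before loading the model. A minimal sketch of that round trip, using made-up example values rather than the actual BioMegatron config:

```python
import json
import os
import tempfile

# Hypothetical example values; the real config is the one produced by
# convert_megatron_checkpoint in the script above.
output_config = {"hidden_size": 1024, "num_hidden_layers": 24, "num_attention_heads": 16}

with tempfile.TemporaryDirectory() as path_to_checkpoint:
    # Write the config exactly as the conversion script does.
    output_config_file = os.path.join(path_to_checkpoint, "config.json")
    with open(output_config_file, "w") as f:
        json.dump(output_config, f)

    # Read it back and confirm the round trip is lossless.
    with open(output_config_file) as f:
        reloaded = json.load(f)

print(reloaded == output_config)
```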

BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example identical to that of [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-uncased-345m).

```python