

This model has been uploaded to HuggingFace by https://huggingface.co/drAbreu

The model is based on the NVIDIA checkpoint located at



BioMegatron is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model was trained on top of the Megatron-LM model, adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows an architecture similar (albeit not identical) to that of BERT and has 345 million parameters:

  • 24 layers
  • 16 attention heads with a hidden size of 1024.
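
As a rough sanity check on these figures, the sketch below derives the per-head dimension and counts the weights in the encoder stack. It assumes a standard BERT-style layer with a feed-forward width of four times the hidden size; that multiplier and the helper names are illustrative assumptions, not values taken from the NVIDIA checkpoint.

```python
# Figures from the model card: 24 layers, 16 attention heads, hidden size 1024.
NUM_LAYERS = 24
NUM_HEADS = 16
HIDDEN_SIZE = 1024


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Dimension of each attention head (the hidden size is split evenly)."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return hidden_size // num_heads


def encoder_layer_params(hidden_size: int, ffn_mult: int = 4) -> int:
    """Weights and biases in one BERT-style encoder layer.

    Assumption: the feed-forward width is ffn_mult * hidden_size.
    """
    # Query, key, value and output projections, each with a bias.
    attention = 4 * (hidden_size * hidden_size + hidden_size)
    ffn_size = ffn_mult * hidden_size
    # Two dense layers: hidden -> ffn and ffn -> hidden, each with a bias.
    feed_forward = (hidden_size * ffn_size + ffn_size) + (ffn_size * hidden_size + hidden_size)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * hidden_size
    return attention + feed_forward + layer_norms


per_layer = encoder_layer_params(HIDDEN_SIZE)
encoder_total = NUM_LAYERS * per_layer
print(head_dim(HIDDEN_SIZE, NUM_HEADS), per_layer, encoder_total)
```

Under these assumptions the encoder layers account for roughly 302 million parameters; the remainder of the 345 million would sit mostly in the embeddings, whose size depends on the vocabulary.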

More information is available in the NVIDIA NGC Catalog.

Running BioMegatron in 🤗 transformers

To make BioMegatron available in 🤗 transformers, we followed the instructions in the nvidia/megatron-bert-uncased-345m repository.

However, the file convert_megatron_bert_checkpoint.py needed a modification. The Megatron model in nvidia/megatron-bert-uncased-345m includes head layers, while the weights of the BioMegatron model uploaded to this repository do not contain a head.

The code below is a modification of the original convert_megatron_bert_checkpoint.py.

import json
import os
import torch
# Assumption: recursive_print is exposed by the same module as the converter,
# as it is in the original convert_megatron_bert_checkpoint.py.
from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

print_checkpoint_structure = True
path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

# Extract the basename.
basename = os.path.dirname(path_to_checkpoint).split('/')[-1]

# Load the model.
input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")

# Convert. head_model=False because the BioMegatron weights contain no head.
output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

# Print the structure of converted state dict.
if print_checkpoint_structure:
    recursive_print(None, output_state_dict)

# Store the config to file.
output_config_file = os.path.join(path_to_checkpoint, "config.json")
print(f'Saving config to "{output_config_file}"')
with open(output_config_file, "w") as f:
    json.dump(output_config, f)

# Store the state_dict to file.
output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
print(f'Saving checkpoint to "{output_checkpoint_file}"')
torch.save(output_state_dict, output_checkpoint_file)

The repository includes an alternative version of the Python script so that any user can cross-check the validity of the model replicated here.
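
One possible way to approach such a cross-check is to compare parameter names and shapes between two converted state dicts. The sketch below is a minimal illustration, not the script shipped in the repository; the helper names and the toy parameter names are hypothetical.

```python
from types import SimpleNamespace


def shape_map(state_dict):
    """Map each parameter name to its shape as a plain tuple."""
    return {name: tuple(tensor.shape) for name, tensor in state_dict.items()}


def compare_shape_maps(reference, candidate):
    """Report names missing from, unexpected in, or differently shaped in candidate."""
    ref_names, cand_names = set(reference), set(candidate)
    return {
        "missing": sorted(ref_names - cand_names),
        "unexpected": sorted(cand_names - ref_names),
        "mismatched": sorted(
            name for name in ref_names & cand_names
            if reference[name] != candidate[name]
        ),
    }


# Toy stand-ins for tensors: any object with a .shape attribute works here.
reference = shape_map({
    "encoder.layer.0.weight": SimpleNamespace(shape=(1024, 1024)),
    "encoder.layer.0.bias": SimpleNamespace(shape=(1024,)),
})
candidate = shape_map({
    "encoder.layer.0.weight": SimpleNamespace(shape=(1024, 4096)),
    "head.weight": SimpleNamespace(shape=(2, 1024)),
})
report = compare_shape_maps(reference, candidate)
print(report)
```

With real checkpoints, each input could be built as `shape_map(torch.load(path, map_location="cpu"))`. Matching names and shapes do not prove that the values agree; for that, the tensors themselves would need to be compared, for example with `torch.equal`.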

BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example identical to that of nvidia/megatron-bert-uncased-345m.

import torch
from transformers import AutoModelForMaskedLM, BertTokenizer

checkpoint = "EMBO/BioMegatron345mCased"
# The tokenizer. Megatron was trained with standard tokenizer(s).
tokenizer = BertTokenizer.from_pretrained(checkpoint)
# Load the model from the checkpoint named above.
model = AutoModelForMaskedLM.from_pretrained(checkpoint)
device = torch.device("cpu")
# Create inputs (from the BERT example page).
inputs = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
label = tokenizer("The capital of France is Paris", return_tensors="pt")["input_ids"].to(device)
# Run the model without tracking gradients.
with torch.no_grad():
    output = model(**inputs, labels=label)
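
To inspect what the model scores highest at the masked position, the logits returned above can be ranked. The helper below is a minimal sketch that works on plain Python lists so it can be read and run without the model; `top_k_at_mask` is a hypothetical name, and the toy ids and scores are made up for illustration.

```python
def top_k_at_mask(input_ids, logits, mask_token_id, k=5):
    """Return the k highest-scoring vocabulary ids at the first masked position.

    input_ids: list of token ids for one sentence.
    logits: one list of vocabulary scores per token position.
    """
    # list.index raises ValueError if the sentence contains no mask token.
    position = input_ids.index(mask_token_id)
    scores = logits[position]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy example: four positions, a vocabulary of five ids, mask id 103 at position 2.
toy_ids = [101, 7, 103, 102]
toy_logits = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.1, 0.2, 0.0, 0.4],
    [0.1, 2.5, -1.0, 0.7, 1.9],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
best = top_k_at_mask(toy_ids, toy_logits, mask_token_id=103, k=2)
print(best)
```

With the real model, the arguments could be the masked sentence's input ids converted with `.tolist()`, `output.logits[0].tolist()`, and `tokenizer.mask_token_id`, and the returned ids could be turned into tokens with `tokenizer.convert_ids_to_tokens`. Because the uploaded weights contain no head, these predictions should not be expected to be meaningful until the model has been trained.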


This implementation has not been fine-tuned on any task. It contains only the weights of the official NVIDIA checkpoint and needs to be trained before it can perform any downstream task.

Original code

The original code for Megatron can be found here: https://github.com/NVIDIA/Megatron-LM.
