
license: cc-by-4.0

<!---

drAbreu

The model is based on the NVIDIA checkpoint located at https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345mcased

-->

BioMegatron is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model was trained on top of the Megatron-LM model by adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows an architecture similar (albeit not identical) to BERT's and has 345 million parameters:

- 24 layers
- 16 attention heads
- a hidden size of 1024
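As a rough sanity check on the 345-million figure, the size of the encoder stack can be estimated from the dimensions listed above. The sketch assumes a standard BERT-style layer with a feed-forward (intermediate) width of 4 × 1024 = 4096, biases on every linear layer, and two LayerNorms per layer; these are assumptions, not details stated in this card. Embedding parameters are left out because they depend on the vocabulary size.

```python
# Rough parameter estimate for a BERT-style encoder stack.
# Assumptions (not stated in the model card): intermediate size = 4 * hidden,
# biases on all linear layers, two LayerNorms (weight + bias) per layer.

NUM_LAYERS = 24
NUM_HEADS = 16
HIDDEN = 1024
INTERMEDIATE = 4 * HIDDEN  # assumed feed-forward width


def params_per_layer(h: int, ff: int) -> int:
    attention = 3 * (h * h + h) + (h * h + h)  # Q, K, V projections + output projection
    feed_forward = (h * ff + ff) + (ff * h + h)  # up- and down-projection
    layer_norms = 2 * (2 * h)  # two LayerNorms, each with weight and bias
    return attention + feed_forward + layer_norms


head_dim = HIDDEN // NUM_HEADS
per_layer = params_per_layer(HIDDEN, INTERMEDIATE)
encoder_total = NUM_LAYERS * per_layer

print(f"per-head dimension: {head_dim}")  # 64
print(f"parameters per layer: {per_layer:,}")  # 12,596,224
print(f"encoder stack (24 layers): {encoder_total:,}")  # 302,309,376
```

Under these assumptions the 24 layers account for roughly 302 million parameters; the rest of the quoted 345 million would come mainly from the embedding matrices, whose size depends on the vocabulary.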

More information is available in the NVIDIA NGC Catalog.

Running BioMegatron in 🤗 transformers

To make BioMegatron available in 🤗 Transformers, we followed the conversion steps described in the nvidia/megatron-bert-uncased-345m repository.

However, the conversion script convert_megatron_bert_checkpoint.py needed a modification: the Megatron checkpoint used in nvidia/megatron-bert-uncased-345m includes head layers, while the BioMegatron weights uploaded to this repository do not contain a head.

The repository includes this alternative version of the Python script so that any user can cross-check the validity of the model replicated here.
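The essence of the change can be illustrated with a small sketch: the conversion should copy head weights only when the checkpoint actually contains them, instead of assuming they are present. The key names `language_model`, `lm_head` and `binary_head` are assumed from the Megatron-LM BERT checkpoint layout, and `split_encoder_and_heads` is a hypothetical helper; this is not the modified script shipped in this repository.

```python
from typing import Any, Dict, Tuple

# Assumed names of the optional head sub-dictionaries in a Megatron BERT
# checkpoint (masked-LM head and next-sentence "binary" head).
HEAD_KEYS = ("lm_head", "binary_head")


def split_encoder_and_heads(
    model_state: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate encoder weights from optional heads.

    A checkpoint with heads yields both parts; a headless checkpoint
    yields an empty head dict instead of failing with a KeyError.
    """
    encoder = {k: v for k, v in model_state.items() if k not in HEAD_KEYS}
    heads = {k: model_state[k] for k in HEAD_KEYS if k in model_state}
    return encoder, heads


# Toy stand-ins for real checkpoints (values would normally be tensors).
with_heads = {
    "language_model": "encoder weights",
    "lm_head": "LM head",
    "binary_head": "NSP head",
}
headless = {"language_model": "encoder weights"}

_, heads = split_encoder_and_heads(with_heads)
print(sorted(heads))  # ['binary_head', 'lm_head']
_, heads = split_encoder_and_heads(headless)
print(sorted(heads))  # []
```

A converter built this way can then create and fill the head modules only for the keys it finds, which is the behaviour a headless checkpoint requires.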

BioMegatron can be loaded with the standard 🤗 model-loading API. The example below closely follows the one in nvidia/megatron-bert-uncased-345m, with the checkpoint name changed.

import torch
from transformers import BertTokenizer, AutoModelForMaskedLM

checkpoint = "EMBO/BioMegatron345mCased"
# The tokenizer: a standard BERT tokenizer built from the checkpoint's vocabulary.
tokenizer = BertTokenizer.from_pretrained(checkpoint)
# Load the model from the 🤗 Hub (EMBO/BioMegatron345mCased).
model = AutoModelForMaskedLM.from_pretrained(checkpoint)
device = torch.device("cpu")
model.to(device)
# Create inputs (from the BERT example page).
inputs = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
# The labels must have the same length as the inputs, i.e. "Paris" must be a single token.
labels = tokenizer("The capital of France is Paris", return_tensors="pt")["input_ids"].to(device)
# Run the model.
with torch.no_grad():
    output = model(**inputs, labels=labels)
    print(output)
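The example passes the full target sentence as labels, so the loss covers every token. A common variant, sketched below in plain Python on toy token ids (the ids, `MASK_ID` and both helper names are hypothetical, not BioMegatron's vocabulary), keeps the label only at `[MASK]` positions and sets the rest to -100, the label value that 🤗 masked-LM models ignore when computing the loss. The second helper reads the highest-scoring vocabulary id at each masked position from a `[sequence, vocabulary]` grid of scores.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value skipped by the masked-LM loss in 🤗 Transformers


def mask_only_labels(input_ids: List[int], label_ids: List[int], mask_token_id: int) -> List[int]:
    """Keep labels only where the input holds the mask token."""
    if len(input_ids) != len(label_ids):
        raise ValueError("inputs and labels must have the same length")
    return [
        label if token == mask_token_id else IGNORE_INDEX
        for token, label in zip(input_ids, label_ids)
    ]


def predict_masked(input_ids: List[int], logits: List[List[float]], mask_token_id: int) -> Dict[int, int]:
    """Map each masked position to the id of its highest-scoring vocabulary entry."""
    return {
        position: max(range(len(scores)), key=scores.__getitem__)
        for position, (token, scores) in enumerate(zip(input_ids, logits))
        if token == mask_token_id
    }


# Toy example: a vocabulary of 5 ids, where id 4 plays the role of [MASK].
MASK_ID = 4
input_ids = [0, 1, 4, 2]
label_ids = [0, 1, 3, 2]
logits = [
    [0.9, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.8, 0.0, 0.1, 0.0],
    [0.1, 0.0, 0.2, 0.9, 0.3],
    [0.0, 0.1, 0.7, 0.1, 0.1],
]

print(mask_only_labels(input_ids, label_ids, MASK_ID))  # [-100, -100, 3, -100]
print(predict_masked(input_ids, logits, MASK_ID))  # {2: 3}
```

With the real model, the first two arguments would be the masked sentence's `input_ids` and `output.logits[0]`, each converted to a list with `.tolist()`; the mask id would be `tokenizer.mask_token_id`, and `tokenizer.decode` turns a predicted id back into text.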

Limitations

This model has not been fine-tuned on any task. It contains only the weights of the official NVIDIA checkpoint and must be fine-tuned before it can perform any downstream task.

Original code

The original code for Megatron can be found here: https://github.com/NVIDIA/Megatron-LM.