---
license: cc-by-4.0
---

<!---

# ##############################################################################################
#
# This model has been uploaded to HuggingFace by https://huggingface.co/drAbreu
# The model is based on the NVIDIA checkpoint located at
# https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345mcased
#
# ##############################################################################################
-->

[BioMegatron](https://arxiv.org/pdf/2010.06060.pdf) is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular checkpoint was trained on top of the Megatron-LM model, adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows a similar (albeit not identical) architecture to BERT and has 345 million parameters:
* 24 layers
* 16 attention heads with a hidden size of 1024.
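
A quick way to check these hyperparameters is to read them from the model configuration. The sketch below assumes the `EMBO/BioMegatron345mCased` checkpoint hosted in this repository:

```python
from transformers import AutoConfig

# Read the architecture hyperparameters of the checkpoint hosted in this repository.
config = AutoConfig.from_pretrained("EMBO/BioMegatron345mCased")
print(config.num_hidden_layers)    # expected: 24
print(config.num_attention_heads)  # expected: 16
print(config.hidden_size)          # expected: 1024
```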

More information is available at the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345mcased).

# Running BioMegatron in 🤗 transformers

In this implementation we have followed the commands of the [`nvidia/megatron-bert-cased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m) repository to make BioMegatron available in 🤗.

However, the file [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py) needed a modification. The reason is that the Megatron checkpoint released as [`nvidia/megatron-bert-cased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m) includes the head layers, while the BioMegatron weights uploaded to this repository do not contain a head.

We provide an alternative version of the Python script in this repository so that any user can cross-check the validity of the model replicated here.
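
One way to see this difference from the 🤗 side is to load the uploaded weights into a head-bearing architecture and inspect which parameters were not found in the checkpoint. This is a minimal sketch using the standard `output_loading_info` option of `from_pretrained`, not part of the conversion script itself:

```python
from transformers import MegatronBertForMaskedLM

# Load the uploaded weights into a masked-LM architecture and report which
# parameters are missing from the checkpoint. Because the uploaded weights do
# not include a head, the head parameters are expected to show up here.
model, loading_info = MegatronBertForMaskedLM.from_pretrained(
    "EMBO/BioMegatron345mCased", output_loading_info=True
)
print(loading_info["missing_keys"])
```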

BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example that follows the one given for [`nvidia/megatron-bert-cased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m).

```python
import torch
from transformers import BertTokenizer, AutoModelForMaskedLM

checkpoint = "EMBO/BioMegatron345mCased"

# The tokenizer. Megatron was trained with a standard (cased) BERT tokenizer.
tokenizer = BertTokenizer.from_pretrained(checkpoint)

# Load the model weights from this repository.
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

device = torch.device("cpu")
model.to(device)
model.eval()

# Create inputs (from the BERT example page).
inputs = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
labels = tokenizer("The capital of France is Paris", return_tensors="pt")["input_ids"].to(device)

# Run the model.
with torch.no_grad():
    output = model(**inputs, labels=labels)
print(output)
```
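
As a small follow-up to the example above (continuing from the same `inputs`, `output`, and `tokenizer` objects; this is not part of the original snippet), one can decode the prediction at the `[MASK]` position. Because the uploaded weights do not include a trained language-modelling head (see Limitations), the decoded token is not expected to be meaningful until the model is fine-tuned:

```python
# Continues from the example above: decode the most likely token at the [MASK] position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = output.logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```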

# Limitations
This checkpoint has not been fine-tuned on any task. It contains only the weights of the official NVIDIA checkpoint and needs to be trained to perform any downstream task.

# Original code
The original code for Megatron can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).