---
license: cc-by-4.0
---

<!---

# ##############################################################################################
#
# This model has been uploaded to HuggingFace by https://huggingface.co/drAbreu
# The model is based on the NVIDIA checkpoint located at
# https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345mcased
#
# ##############################################################################################
-->

[BioMegatron](https://arxiv.org/pdf/2010.06060.pdf) is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular checkpoint was trained on top of the Megatron-LM model, adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows a similar (albeit not identical) architecture to BERT and has 345 million parameters:
* 24 layers
* 16 attention heads with a hidden size of 1024.
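
A quick way to check these hyperparameters is to read them from the model configuration. The sketch below assumes the `EMBO/BioMegatron345mCased` checkpoint hosted in this repository:

```python
from transformers import AutoConfig

# Read the architecture hyperparameters of the checkpoint hosted in this repository.
config = AutoConfig.from_pretrained("EMBO/BioMegatron345mCased")
print(config.num_hidden_layers)    # expected: 24
print(config.num_attention_heads)  # expected: 16
print(config.hidden_size)          # expected: 1024
```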

More information is available at the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345mcased).

# Running BioMegatron in 🤗 transformers

In this implementation we have followed the commands of the [`nvidia/megatron-bert-cased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m) repository to make BioMegatron available in 🤗.

However, the file [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py) needed a modification. The reason is that the Megatron checkpoint released as [`nvidia/megatron-bert-cased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m) includes the head layers, while the BioMegatron weights uploaded to this repository do not contain a head.

We provide an alternative version of the Python script in this repository so that any user can cross-check the validity of the model replicated here.
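
One way to see this difference from the 🤗 side is to load the uploaded weights into a head-bearing architecture and inspect which parameters were not found in the checkpoint. This is a minimal sketch using the standard `output_loading_info` option of `from_pretrained`, not part of the conversion script itself:

```python
from transformers import MegatronBertForMaskedLM

# Load the uploaded weights into a masked-LM architecture and report which
# parameters are missing from the checkpoint. Because the uploaded weights do
# not include a head, the head parameters are expected to show up here.
model, loading_info = MegatronBertForMaskedLM.from_pretrained(
    "EMBO/BioMegatron345mCased", output_loading_info=True
)
print(loading_info["missing_keys"])
```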

BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example that follows the one given for [`nvidia/megatron-bert-cased-345m`](https://huggingface.co/nvidia/megatron-bert-cased-345m).

```python
import torch
from transformers import BertTokenizer, AutoModelForMaskedLM

checkpoint = "EMBO/BioMegatron345mCased"

# The tokenizer. Megatron was trained with a standard (cased) BERT tokenizer.
tokenizer = BertTokenizer.from_pretrained(checkpoint)

# Load the model weights from this repository.
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

device = torch.device("cpu")
model.to(device)
model.eval()

# Create inputs (from the BERT example page).
inputs = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
labels = tokenizer("The capital of France is Paris", return_tensors="pt")["input_ids"].to(device)

# Run the model.
with torch.no_grad():
    output = model(**inputs, labels=labels)
print(output)
```
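
As a small follow-up to the example above (continuing from the same `inputs`, `output`, and `tokenizer` objects; this is not part of the original snippet), one can decode the prediction at the `[MASK]` position. Because the uploaded weights do not include a trained language-modelling head (see Limitations), the decoded token is not expected to be meaningful until the model is fine-tuned:

```python
# Continues from the example above: decode the most likely token at the [MASK] position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = output.logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```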

# Limitations
This checkpoint has not been fine-tuned on any task. It contains only the weights of the official NVIDIA checkpoint and needs to be trained to perform any downstream task.

# Original code
The original code for Megatron can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).