---
license: cc-by-4.0
---
<!---
# ##############################################################################################
#
# This model has been uploaded to HuggingFace by https://huggingface.co/drAbreu
# The model is based on the NVIDIA checkpoint located at
# https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345muncased
#
# ##############################################################################################
-->

[BioMegatron](https://arxiv.org/pdf/2010.06060.pdf) is a transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular model was trained on top of Megatron-LM, adding a PubMed corpus to the Megatron-LM corpora (Wikipedia, RealNews, OpenWebText, and CC-Stories). BioMegatron follows an architecture similar (albeit not identical) to that of BERT and has 345 million parameters:
* 24 layers
* 16 attention heads with a hidden size of 1024

More information is available at the [NVIDIA NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/models/biomegatron345muncased).
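
As a quick sanity check (this snippet is not part of the original NVIDIA documentation and simply assumes the checkpoint hosted in this repository), the architecture listed above can be confirmed from the model configuration:

```python
from transformers import AutoConfig

# Inspect the configuration of the checkpoint in this repository and
# confirm the architecture described above.
config = AutoConfig.from_pretrained("EMBO/BioMegatron345mUncased")
print(config.num_hidden_layers)    # expected: 24
print(config.num_attention_heads)  # expected: 16
print(config.hidden_size)          # expected: 1024
```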

# Running BioMegatron in 🤗 transformers

In this implementation we have followed the steps of the [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-uncased-345m) repository to make BioMegatron available in 🤗.

However, the file [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py) needed a modification: the Megatron model shown in [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-uncased-345m) includes head layers, while the BioMegatron weights uploaded to this repository do not contain a head.

We provide an alternative version of the Python script in this repository so that any user can cross-check the validity of the model replicated here.
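
As an illustration of such a cross-check (a minimal sketch, not the modified conversion script itself), the loading report returned by 🤗 `transformers` shows which parameters, if any, were newly initialized instead of loaded from the converted weights:

```python
from transformers import MegatronBertForMaskedLM

# Load the converted checkpoint and report the loading information.
# For weights converted without head layers, only head-related
# parameters are expected to appear as missing (randomly initialized).
model, loading_info = MegatronBertForMaskedLM.from_pretrained(
    "EMBO/BioMegatron345mUncased", output_loading_info=True
)
print("Missing keys:", loading_info["missing_keys"])
print("Unexpected keys:", loading_info["unexpected_keys"])
```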

BioMegatron can be run with the standard 🤗 script for loading models. Below we show an example adapted from [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-uncased-345m).
+
32
+ ```
33
+ import os
34
+ import torch
35
+
36
+ from transformers import BertTokenizer, MegatronBertForMaskedLM, AutoModelForMaskedLM
37
+ checkpoint = "EMBO/BioMegatron345mUncased"
38
+
39
+ # The tokenizer. Megatron was trained with standard tokenizer(s).
40
+ tokenizer = BertTokenizer.from_pretrained(checkpoint)
41
+ # Load the model from $MYDIR/nvidia/megatron-bert-uncased-345m.
42
+ model = AutoModelForMaskedLM.from_pretrained(checkpoint)
43
+ device = torch.device("cpu")
44
+ # Create inputs (from the BERT example page).
45
+ input = tokenizer("The capital of France is [MASK]", return_tensors="pt").to(device)
46
+ label = tokenizer("The capital of France is Paris", return_tensors="pt")["input_ids"].to(device)
47
+
48
+ # Run the model.
49
+ with torch.no_grad():
50
+ output = model(**input, labels=label)
51
+ print(output)
52
+ ```
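
Continuing from the example above (this follow-up is not part of the original example), the prediction at the `[MASK]` position can be decoded as follows. Since the uploaded weights do not include a trained head (see Limitations below), the predicted token is not expected to be meaningful before further training:

```python
# Locate the [MASK] position and decode the highest-scoring token there.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = output.logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```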

# Limitations
This implementation has not been fine-tuned on any downstream task. It contains only the weights of the official NVIDIA checkpoint and needs to be trained to perform any downstream task.
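
As an example (a hypothetical sketch, not an official recipe; the task head and `num_labels` are placeholders), the checkpoint can be loaded with a freshly initialized classification head and then fine-tuned with the usual 🤗 training utilities:

```python
from transformers import AutoModelForTokenClassification, BertTokenizer

# Attach a randomly initialized token-classification head to the encoder.
# num_labels=5 is purely illustrative and depends on the downstream dataset.
tokenizer = BertTokenizer.from_pretrained("EMBO/BioMegatron345mUncased")
model = AutoModelForTokenClassification.from_pretrained(
    "EMBO/BioMegatron345mUncased", num_labels=5
)
# The model is now ready to be fine-tuned, e.g. with transformers.Trainer.
```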

# Original code
The original code for Megatron can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).