LysandreJik committed on
Commit
b214118
1 Parent(s): a41f8a7

README & Tokenizer

Files changed (4)
  1. README.md +105 -0
  2. merges.txt +0 -0
  3. tokenizer.json +0 -0
  4. vocab.json +0 -0
README.md ADDED
<!---
# ##############################################################################################
#
# Copyright (c) 2021-, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ##############################################################################################
-->

# How to run Megatron GPT2 using Transformers

## Prerequisites

In this guide, we run all the commands from a folder called `$MYDIR`, defined as (in `bash`):

```bash
export MYDIR=$HOME
```

Feel free to change the location at your convenience.

To run some of the commands below, you will have to clone `Transformers`:

```bash
git clone https://github.com/huggingface/transformers.git $MYDIR/transformers
```

## Get the checkpoints from the NVIDIA GPU Cloud

You must create a directory called `nvidia/megatron-gpt2-345m`:

```bash
mkdir -p $MYDIR/nvidia/megatron-gpt2-345m
```

You can download the checkpoints from the NVIDIA GPU Cloud (NGC). To do so, you
have to [sign up](https://ngc.nvidia.com/signup) for and set up the NVIDIA GPU
Cloud (NGC) Registry CLI. Further documentation on downloading models can be
found in the [NGC
documentation](https://docs.nvidia.com/dgx/ngc-registry-cli-user-guide/index.html#topic_6_4_1).

Alternatively, you can download the checkpoints directly using:

```bash
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_lm_345m/versions/v0.0/zip -O $MYDIR/nvidia/megatron-gpt2-345m/checkpoint.zip
```
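
Before converting, you may want to confirm the archive downloaded completely. A minimal sketch using Python's `zipfile` (the path mirrors the `wget` command above; nothing here is specific to Megatron):

```python
import os
import zipfile

def verify_checkpoint(path):
    """Return the number of members in the zip, raising if any member fails its CRC check."""
    with zipfile.ZipFile(path) as archive:
        bad = archive.testzip()  # None when every member's CRC checks out
        if bad is not None:
            raise RuntimeError(f"corrupt member: {bad}")
        return len(archive.namelist())

checkpoint = os.path.join(os.environ.get('MYDIR', '.'),
                          'nvidia/megatron-gpt2-345m/checkpoint.zip')
if os.path.exists(checkpoint):
    print(verify_checkpoint(checkpoint), 'members OK')
```

A truncated download typically raises `BadZipFile` on open, so this catches the most common failure mode before the (slower) conversion step.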

## Converting the checkpoint

To be loaded into `Transformers`, the checkpoint has to be converted. Run the following command for that purpose;
it will create `config.json` and `pytorch_model.bin` in `$MYDIR/nvidia/megatron-gpt2-345m`.
You can move those files to different directories if needed.

```bash
python3 $MYDIR/transformers/src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py $MYDIR/nvidia/megatron-gpt2-345m/checkpoint.zip
```
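
A quick way to confirm the conversion worked is to open the generated `config.json` and look at the architecture fields. A small sketch, assuming the GPT-2 config field names used by Transformers (`n_embd`, `n_layer`, `n_head`):

```python
import json
import os

def summarize_config(path):
    """Read a converted config.json and return a few GPT-2 architecture fields."""
    with open(path) as f:
        config = json.load(f)
    return {key: config.get(key) for key in ('n_embd', 'n_layer', 'n_head')}

config_path = os.path.join(os.environ.get('MYDIR', '.'),
                           'nvidia/megatron-gpt2-345m/config.json')
if os.path.exists(config_path):
    print(summarize_config(config_path))
```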

## Text generation

The following code shows how to use the Megatron GPT2 checkpoint and the Transformers API to generate text.

```python
import os
import torch

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# The tokenizer. Megatron was trained with the standard GPT-2 tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# The path to the config/checkpoint (see the conversion step above).
directory = os.path.join(os.environ['MYDIR'], 'nvidia/megatron-gpt2-345m')
# Load the model from $MYDIR/nvidia/megatron-gpt2-345m.
model = GPT2LMHeadModel.from_pretrained(directory)

# Copy the model to the GPU and switch to FP16.
assert torch.cuda.is_available()
device = torch.device("cuda")
model.to(device)
model.eval()
model.half()

# Generate a sentence. With input_ids=None, generation starts from the
# beginning-of-sequence token.
output = model.generate(input_ids=None, max_length=32, num_return_sequences=1)

# Decode and print the text.
for sentence in output:
    sentence = sentence.tolist()
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(text)
```
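
For intuition, `generate` with its default (greedy) settings repeatedly appends the highest-scoring next token until `max_length` is reached. A toy sketch of that loop, with a stand-in scoring function rather than the real network:

```python
def greedy_generate(next_token_scores, input_ids, max_length, eos_id=None):
    """Greedy decoding: repeatedly append the highest-scoring next token.

    next_token_scores(ids) -> {token_id: score}; a stand-in for the
    model's forward pass over the current prefix.
    """
    ids = list(input_ids)
    while len(ids) < max_length:
        scores = next_token_scores(ids)
        best = max(scores, key=scores.get)
        ids.append(best)
        if eos_id is not None and best == eos_id:
            break
    return ids

# Toy "model" over a 5-token vocabulary: always prefers (last token + 1) mod 5.
toy = lambda ids: {t: (1.0 if t == (ids[-1] + 1) % 5 else 0.0) for t in range(5)}
print(greedy_generate(toy, [0], max_length=6))  # [0, 1, 2, 3, 4, 0]
```

The real `generate` adds sampling, beam search, and repetition penalties on top of this loop; see the Transformers documentation for those options.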

## Original code

The original Megatron code can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).
merges.txt ADDED
The diff for this file is too large to render. See raw diff
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
vocab.json ADDED
The diff for this file is too large to render. See raw diff