---
language: 
  - en
tags:
  - gpt2
license: apache-2.0
widget:
  - text: It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,
datasets:
  - wikitext
  - openwebtext
  - spacemanidol/cc-stories
model-index:
  - name: megatron-gpt2-345m
    results:
      - task:
          type: text-generation
          name: Text generation
        dataset:
          name: WikiText-103
          type: wikitext
        metrics:
          - type: wikitext
            value: 19.31
            name: Perplexity
      - task:
          type: text-generation
          name: Text generation
        dataset:
          name: WikiText-2
          type: wikitext
        metrics:
          - type: wikitext
            value: 17.151
            name: Perplexity
      - task:
          type: text-generation
          name: Text generation
        dataset:
          name: LAMBADA
          type: lambada
        metrics:
          - type: lambada
            value: 5.509
            name: Perplexity
          - type: lambada
            value: 68.31%
            name: Accuracy
---

<!---
# ##############################################################################################
# 
# Copyright (c) 2021-, NVIDIA CORPORATION.  All rights reserved.
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
#     http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# 
# ##############################################################################################
-->

This is an archive of [nvidia/megatron-gpt2-345m](https://huggingface.co/nvidia/megatron-gpt2-345m) that contains readily available model weights (375M). Its perplexity on WikiText-103 is 19.31.<sup>1</sup> In comparison, the perplexity of GPT-2 XL (1.5B) is 17.48 and that of GPT-2 large (762M) is 22.05.<sup>2</sup>
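The WikiText-103 figures above are perplexities, where lower is better. As a minimal sketch of what that number means, the function below computes perplexity as the exponential of the mean negative log-likelihood per token. The function name and the toy input are illustrative assumptions, and the exact normalization behind the reported figures (for example, per word versus per subword token) is not stated here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from the natural-log probabilities assigned to each target token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input (hypothetical): a model that gives each of four tokens probability 0.25.
example = [math.log(0.25)] * 4
print(perplexity(example))
```

A model that is uniformly unsure among four choices at every step has a perplexity of about 4, so a perplexity of 19.31 can be read loosely as the model being about as uncertain as a uniform choice among roughly 19 options per token.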

### References

1. Shoeybi, Mohammad, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv, 2019, [https://doi.org/10.48550/ARXIV.1909.08053](https://doi.org/10.48550/ARXIV.1909.08053).
2. Alec Radford, et al. Language Models are Unsupervised Multitask Learners. OpenAI, 2019. [https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).

## Description

[Megatron](https://arxiv.org/pdf/1909.08053.pdf) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model is a generative, left-to-right transformer in the style of GPT-2, trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories. It contains 345 million parameters.

Find more information at [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)

# How to run Megatron GPT2 using Transformers

## Text generation

The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text.

```python
import torch

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = (
    "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
)
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    # Allow up to 128 new tokens beyond the prompt; input_ids has shape (1, prompt_length).
    max_length=input_ids.shape[1] + 128,
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025
)

# Output the text
print("Prompt:", prompt)
print("*" * 3)
for i, sentence in enumerate(output):
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
```
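The `temperature`, `top_k`, and `top_p` arguments in the example above shape the distribution each new token is sampled from. The following is a simplified pure-Python sketch of that filtering, not the Transformers implementation, which differs in details. The function name `filter_distribution` and the toy logits are assumptions made for illustration.

```python
import math


def filter_distribution(logits, temperature=0.8, top_k=64, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token indices.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for index, p in zip(ranked, probs):
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the remaining probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {index: p / norm for index, p in kept}


# Toy logits over a four-token vocabulary (hypothetical values).
toy = filter_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
print(toy)
```

With these toy values, top-k drops the lowest-scoring token and top-p then drops the third, so sampling would choose only between the two most likely tokens.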

# Original code

The original Megatron code can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).