Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov) | [Discord](https://discord.gg/pvy7H8DZMG) | [Request more models](https://github.com/RichardErkhov/quant_request)

# megatron-gpt2-345m - GGUF
- Model creator: https://huggingface.co/robowaifudev/
- Original model: https://huggingface.co/robowaifudev/megatron-gpt2-345m/
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [megatron-gpt2-345m.Q2_K.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q2_K.gguf) | Q2_K | 0.17GB |
| [megatron-gpt2-345m.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.IQ3_XS.gguf) | IQ3_XS | 0.18GB |
| [megatron-gpt2-345m.IQ3_S.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.IQ3_S.gguf) | IQ3_S | 0.19GB |
| [megatron-gpt2-345m.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q3_K_S.gguf) | Q3_K_S | 0.19GB |
| [megatron-gpt2-345m.IQ3_M.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.IQ3_M.gguf) | IQ3_M | 0.2GB |
| [megatron-gpt2-345m.Q3_K.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q3_K.gguf) | Q3_K | 0.21GB |
| [megatron-gpt2-345m.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q3_K_M.gguf) | Q3_K_M | 0.21GB |
| [megatron-gpt2-345m.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q3_K_L.gguf) | Q3_K_L | 0.23GB |
| [megatron-gpt2-345m.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.IQ4_XS.gguf) | IQ4_XS | 0.22GB |
| [megatron-gpt2-345m.Q4_0.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q4_0.gguf) | Q4_0 | 0.23GB |
| [megatron-gpt2-345m.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.IQ4_NL.gguf) | IQ4_NL | 0.23GB |
| [megatron-gpt2-345m.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q4_K_S.gguf) | Q4_K_S | 0.23GB |
| [megatron-gpt2-345m.Q4_K.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q4_K.gguf) | Q4_K | 0.25GB |
| [megatron-gpt2-345m.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q4_K_M.gguf) | Q4_K_M | 0.25GB |
| [megatron-gpt2-345m.Q4_1.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q4_1.gguf) | Q4_1 | 0.25GB |
| [megatron-gpt2-345m.Q5_0.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q5_0.gguf) | Q5_0 | 0.27GB |
| [megatron-gpt2-345m.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q5_K_S.gguf) | Q5_K_S | 0.27GB |
| [megatron-gpt2-345m.Q5_K.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q5_K.gguf) | Q5_K | 0.29GB |
| [megatron-gpt2-345m.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q5_K_M.gguf) | Q5_K_M | 0.29GB |
| [megatron-gpt2-345m.Q5_1.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q5_1.gguf) | Q5_1 | 0.29GB |
| [megatron-gpt2-345m.Q6_K.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q6_K.gguf) | Q6_K | 0.32GB |
| [megatron-gpt2-345m.Q8_0.gguf](https://huggingface.co/RichardErkhov/robowaifudev_-_megatron-gpt2-345m-gguf/blob/main/megatron-gpt2-345m.Q8_0.gguf) | Q8_0 | 0.41GB |
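As a rough sanity check on the sizes above, a GGUF file's size is driven by the parameter count times the average bits per weight of the quant type. The sketch below uses approximate llama.cpp bits-per-weight figures (assumptions, not values read from these files); actual files run somewhat larger because embeddings and some tensors stay at higher precision.

```python
# Rough lower-bound GGUF size estimate from parameter count and bits per weight.
# The bits-per-weight figures are approximate llama.cpp averages (assumptions).

PARAMS = 345e6  # parameter count of megatron-gpt2-345m

BITS_PER_WEIGHT = {
    "Q4_0": 4.5,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}

def estimate_gb(quant: str) -> float:
    """Approximate lower-bound file size in GB for a given quant type."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{estimate_gb(q):.2f} GB")
```

For example, Q8_0 estimates to roughly 0.37 GB against the 0.41 GB listed, with the gap coming from tensors kept above 8 bits.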
Original model description:
---
language:
- en
tags:
- gpt2
license: apache-2.0
widget:
- text: It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,
datasets:
- wikitext
- openwebtext
- spacemanidol/cc-stories
model-index:
- name: megatron-gpt2-345m
  results:
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-103
      type: wikitext
    metrics:
    - type: wikitext
      value: 19.31
      name: Perplexity
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-2
      type: wikitext
    metrics:
    - type: wikitext
      value: 17.151
      name: Perplexity
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: LAMBADA
      type: lambada
    metrics:
    - type: lambada
      value: 5.509
      name: Perplexity
    - type: lambada
      value: 68.31%
      name: Accuracy
---
This is an archive of [nvidia/megatron-gpt2-345m](https://huggingface.co/nvidia/megatron-gpt2-345m) that contains readily available model weights (375M). Its perplexity on WikiText-103 is 19.31 [1]. In comparison, the 1.5B-parameter GPT-2 scores 17.48 and the 762M-parameter variant scores 22.05 [2].
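For context on those numbers, perplexity is the exponential of the mean per-token cross-entropy (negative log-likelihood). A tiny illustrative sketch, with made-up loss values rather than anything measured on this model:

```python
import math

# Perplexity = exp(mean negative log-likelihood per token).
# The per-token losses below are invented purely for illustration.
token_nlls = [2.9, 3.1, 2.8, 3.2]  # cross-entropy in nats

perplexity = math.exp(sum(token_nlls) / len(token_nlls))
print(f"Perplexity: {perplexity:.2f}")
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why the 1.5B model's 17.48 beats this model's 19.31.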
### References
1. Shoeybi, Mohammad, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv, 2019, [https://doi.org/10.48550/ARXIV.1909.08053](https://doi.org/10.48550/ARXIV.1909.08053).
2. Alec Radford, et al. Language Models are Unsupervised Multitask Learners. OpenAI, 2019. [https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).
## Description
[Megatron](https://arxiv.org/pdf/1909.08053.pdf) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model was trained from a generative, left-to-right transformer in the style of GPT-2. This model was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories. It contains 345 million parameters.
Find more information at [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
# How to run Megatron GPT2 using Transformers
## Text generation
The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text.
```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

# Use half precision on GPU, full precision on CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = (
    "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
)
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    max_length=input_ids.size(1) + 128,  # prompt length in tokens plus 128 new tokens
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025,
)

# Output the text
print("Prompt:", prompt)
print("*" * 3)
for i, sentence in enumerate(output):
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
```
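The `top_k` and `top_p` arguments above restrict sampling to the most likely tokens. A minimal standalone sketch of the filtering they perform (an illustration of the idea, not the Transformers implementation):

```python
def top_k_top_p_filter(probs, top_k=64, top_p=0.9):
    """Return the (index, prob) pairs surviving top-k then top-p (nucleus)
    filtering, given a list of token probabilities."""
    # Sort token indices by descending probability.
    ranked = sorted(enumerate(probs), key=lambda x: x[1], reverse=True)
    # Top-k: keep at most k candidates.
    ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break
    return kept

# Toy distribution over a 5-token vocabulary.
probs = [0.5, 0.3, 0.1, 0.06, 0.04]
print(top_k_top_p_filter(probs, top_k=4, top_p=0.9))
```

With this toy distribution, only the first three tokens survive, since their cumulative probability already reaches 0.9; sampling then happens among the survivors after renormalization.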
# Original code
The original Megatron code can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).