Quantization made by Richard Erkhov.

megatron-gpt2-345m - GGUF

Model creator: https://huggingface.co/robowaifudev/
Original model: https://huggingface.co/robowaifudev/megatron-gpt2-345m/

Name	Quant method	Size
megatron-gpt2-345m.Q2_K.gguf	Q2_K	0.17GB
megatron-gpt2-345m.IQ3_XS.gguf	IQ3_XS	0.18GB
megatron-gpt2-345m.IQ3_S.gguf	IQ3_S	0.19GB
megatron-gpt2-345m.Q3_K_S.gguf	Q3_K_S	0.19GB
megatron-gpt2-345m.IQ3_M.gguf	IQ3_M	0.2GB
megatron-gpt2-345m.Q3_K.gguf	Q3_K	0.21GB
megatron-gpt2-345m.Q3_K_M.gguf	Q3_K_M	0.21GB
megatron-gpt2-345m.Q3_K_L.gguf	Q3_K_L	0.23GB
megatron-gpt2-345m.IQ4_XS.gguf	IQ4_XS	0.22GB
megatron-gpt2-345m.Q4_0.gguf	Q4_0	0.23GB
megatron-gpt2-345m.IQ4_NL.gguf	IQ4_NL	0.23GB
megatron-gpt2-345m.Q4_K_S.gguf	Q4_K_S	0.23GB
megatron-gpt2-345m.Q4_K.gguf	Q4_K	0.25GB
megatron-gpt2-345m.Q4_K_M.gguf	Q4_K_M	0.25GB
megatron-gpt2-345m.Q4_1.gguf	Q4_1	0.25GB
megatron-gpt2-345m.Q5_0.gguf	Q5_0	0.27GB
megatron-gpt2-345m.Q5_K_S.gguf	Q5_K_S	0.27GB
megatron-gpt2-345m.Q5_K.gguf	Q5_K	0.29GB
megatron-gpt2-345m.Q5_K_M.gguf	Q5_K_M	0.29GB
megatron-gpt2-345m.Q5_1.gguf	Q5_1	0.29GB
megatron-gpt2-345m.Q6_K.gguf	Q6_K	0.32GB
megatron-gpt2-345m.Q8_0.gguf	Q8_0	0.41GB
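
One way to read the table is to pick the largest file that fits a given size budget. The sketch below is only an illustration: the sizes are copied from a subset of the rows above, while the function name and the 0.30 GB budget are hypothetical. File size is a lower bound on memory use, since the runtime also needs room for context buffers.

```python
# (file name, size in GB) copied from a subset of the table rows above.
QUANTS = [
    ("megatron-gpt2-345m.Q2_K.gguf", 0.17),
    ("megatron-gpt2-345m.Q4_K_M.gguf", 0.25),
    ("megatron-gpt2-345m.Q5_K_M.gguf", 0.29),
    ("megatron-gpt2-345m.Q6_K.gguf", 0.32),
    ("megatron-gpt2-345m.Q8_0.gguf", 0.41),
]


def largest_quant_within(budget_gb, quants=QUANTS):
    """Return the name of the largest file whose size fits the budget, or None."""
    fitting = [(size, name) for name, size in quants if size <= budget_gb]
    return max(fitting)[1] if fitting else None


# Hypothetical budget of 0.30 GB.
print(largest_quant_within(0.30))
```

Larger files generally keep more of the original precision, so choosing the biggest one that fits is a reasonable default.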

Original model description:

language:
- en
tags:
- gpt2
license: apache-2.0
widget:
- text: It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,
datasets:
- wikitext
- openwebtext
- spacemanidol/cc-stories
model-index:
- name: megatron-gpt2-345m
  results:
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-103
      type: wikitext
    metrics:
    - type: wikitext
      value: 19.31
      name: Perplexity
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-2
      type: wikitext
    metrics:
    - type: wikitext
      value: 17.151
      name: Perplexity
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: LAMBADA
      type: lambada
    metrics:
    - type: lambada
      value: 5.509
      name: Perplexity
    - type: lambada
      value: 68.31%
      name: Accuracy

This is an archive of nvidia/megatron-gpt2-345m that contains readily available model weights (375M). Its perplexity on WikiText-103 is 19.31 (lower is better).¹ In comparison, the WikiText-103 perplexity of GPT-2 XL (1.5B) is 17.48 and that of GPT-2 Large (762M) is 22.05.²

References

1. Shoeybi, Mohammad, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv, 2019. https://doi.org/10.48550/ARXIV.1909.08053
2. Radford, Alec, et al. Language Models are Unsupervised Multitask Learners. OpenAI, 2019. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
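
For readers unfamiliar with the metric quoted above, perplexity is the exponential of the average negative log-likelihood per token. The sketch below shows only that formula, using made-up token probabilities. It is not the evaluation script behind the reported numbers, and the function name is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical data: a model that assigns probability 0.05 to each of 4 tokens.
example = [math.log(0.05)] * 4
print(perplexity(example))
```

A model that gives every token probability 0.05 has perplexity 1/0.05 = 20, so a WikiText-103 perplexity of 19.31 is roughly as uncertain as choosing uniformly among 19 tokens at each step.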

Description

Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model is a generative, left-to-right transformer in the style of GPT-2, trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories. It contains 345 million parameters.

Find more information at https://github.com/NVIDIA/Megatron-LM

How to run Megatron GPT2 using Transformers

Text generation

The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text.

import torch

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = (
    "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
)
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    # input_ids has shape (batch, seq_len); len(input_ids) would give the batch size (1),
    # so use the sequence length to allow 128 new tokens after the prompt.
    max_length=input_ids.shape[1] + 128,
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025,
)

# Output the text
print("Prompt:", prompt)
print("*" * 3)
for i, sentence in enumerate(output):
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
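
The sampling parameters used above (temperature, top_k, top_p) each narrow the next-token distribution in a different way. The following pure-Python sketch illustrates the idea on a toy list of logits. It is a simplified illustration, not the Transformers implementation; the function name and the example logits are hypothetical, and repetition_penalty is left out.

```python
import math


def filter_distribution(logits, temperature=0.8, top_k=64, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering."""
    # Temperature: divide logits before softmax; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token indices, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i, p in zip(ranked, probs):
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1 before sampling.
    norm = sum(p for _, p in kept)
    return {i: p / norm for i, p in kept}


# Toy example with four hypothetical logits.
dist = filter_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
print(sorted(dist))
```

In this toy case top_k=3 drops the lowest-scoring token, and top_p=0.9 then drops the third, because the first two already cover more than 90% of the remaining probability mass. A token is then sampled from the renormalized distribution.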

Original code

The original Megatron code can be found here: https://github.com/NVIDIA/Megatron-LM.