Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

megatron-gpt2-345m - bnb 8bits
- Model creator: https://huggingface.co/robowaifudev/
- Original model: https://huggingface.co/robowaifudev/megatron-gpt2-345m/

Original model description:

---
language:
- en
tags:
- gpt2
license: apache-2.0
widget:
- text: It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,
datasets:
- wikitext
- openwebtext
- spacemanidol/cc-stories
model-index:
- name: megatron-gpt2-345m
  results:
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-103
      type: wikitext
    metrics:
    - type: wikitext
      value: 19.31
      name: Perplexity
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-2
      type: wikitext
    metrics:
    - type: wikitext
      value: 17.151
      name: Perplexity
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: LAMBADA
      type: lambada
    metrics:
    - type: lambada
      value: 5.509
      name: Perplexity
    - type: lambada
      value: 68.31%
      name: Accuracy
---

This is an archive of [nvidia/megatron-gpt2-345m](https://huggingface.co/nvidia/megatron-gpt2-345m) that contains readily available model weights (375M). Its perplexity on WikiText-103 is 19.31.¹ In comparison, the perplexity of GPT2-XL (1.5B) is 17.48 and that of GPT2-large (762M) is 22.05.²

### References

1. Shoeybi, Mohammad, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv, 2019, [https://doi.org/10.48550/ARXIV.1909.08053](https://doi.org/10.48550/ARXIV.1909.08053).
2. Radford, Alec, et al. Language Models are Unsupervised Multitask Learners. OpenAI, 2019, [https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).
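
### How perplexity is computed

The WikiText and LAMBADA numbers in this card are perplexities, where lower is better. As a rough illustration, perplexity is the exponential of the average negative log-likelihood the model assigns to the target tokens. The sketch below uses made-up probabilities and a hypothetical helper name (`perplexity`); it is not the evaluation script behind the reported figures, and published results may normalize differently (for example per word rather than per subword token).

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical probabilities a model assigned to four target tokens (illustrative only).
example_log_probs = [math.log(0.25), math.log(0.5), math.log(0.125), math.log(0.25)]
ppl = perplexity(example_log_probs)
print(ppl)  # approximately 4.0: on average the model is as unsure as a 4-way guess
```

A model that spread its probability evenly over N candidate tokens at every step would score a perplexity of N under this definition, which gives an intuition for what a value such as 19.31 means.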
## Description

[Megatron](https://arxiv.org/pdf/1909.08053.pdf) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model is a generative, left-to-right transformer in the style of GPT-2. It was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories, and it contains 345 million parameters.

Find more information at [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)

# How to run Megatron GPT2 using Transformers

## Text generation

The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text.

```python
import torch

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# The checkpoint uses the standard GPT-2 tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

# Use half precision on GPU; stay in full precision on CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = (
    "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
)
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    # input_ids has shape (1, prompt_length); len(input_ids) would return the batch size
    max_length=input_ids.shape[1] + 128,
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025
)

# Output the text
print("Prompt:", prompt)
print("*" * 3)
for i, sentence in enumerate(output):
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
```

# Original code

The original Megatron code can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).
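
# Notes on the sampling parameters

The text generation example passes `temperature`, `top_k`, and `top_p` to `model.generate`. To give a feel for what these settings do, here is a minimal pure-Python sketch of the usual idea: scale the logits by the temperature, keep the `top_k` highest-scoring tokens, then keep the smallest set of those whose cumulative probability reaches `top_p`. The function name `filter_distribution` and the toy logits are hypothetical, the sketch assumes `temperature > 0`, and it is not the Transformers implementation (it also leaves out `repetition_penalty`).

```python
import math


def filter_distribution(logits, top_k, top_p, temperature):
    """Return sampling probabilities after temperature, top-k, then top-p filtering."""
    # Temperature: divide logits before the softmax; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the ids of the k highest-scoring tokens, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the surviving tokens (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p (nucleus): keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the kept tokens; every other token gets probability zero.
    mass = sum(p for _, p in kept)
    result = [0.0] * len(logits)
    for token_id, p in kept:
        result[token_id] = p / mass
    return result


toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical scores for a 4-token vocabulary
dist = filter_distribution(toy_logits, top_k=3, top_p=0.9, temperature=1.0)
print(dist)  # only the two most likely tokens keep non-zero probability
```

In this toy case top-k drops the last token, and top-p then drops the third because the first two already cover more than 90% of the remaining probability mass. Lowering `temperature` or `top_p` concentrates sampling further on the most likely tokens.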