---
language: en
tags:
- pytorch
- gpt2
- language-model
pipeline_tag: text-generation
---

# GPT-X Model

This model was trained using the GPT-X framework.

## Model Architecture

- Layers: 12
- Attention Heads: 12
- Hidden Size: 768
- Vocabulary Size: 50257
- Maximum Sequence Length: 1024
- Model Type: base

## Training Details

- Batch Size: 524288
- Learning Rate: 0.0006
- Weight Decay: 0.0
- Mixed Precision: True
- Optimizer: muon
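As a sanity check on the dimensions listed above, here is a back-of-the-envelope parameter count assuming a standard GPT-2-style decoder (tied input/output embeddings, learned positional embeddings, 4x MLP expansion, biases on all projections). This is a sketch of the usual GPT-2 accounting, not a statement about GPT-X internals, which may differ.

```python
# Parameter count implied by the model card, assuming a standard
# GPT-2-style architecture (the GPT-X framework may deviate from this).
n_layer, d_model, vocab, n_ctx = 12, 768, 50257, 1024

embeddings = vocab * d_model + n_ctx * d_model      # token + positional tables
attention  = 4 * d_model**2 + 4 * d_model           # fused qkv + output proj, with biases
mlp        = 8 * d_model**2 + 5 * d_model           # 4x up-projection + down-projection
layernorms = 4 * d_model                            # two LayerNorms per block (scale + shift)
per_layer  = attention + mlp + layernorms

total = embeddings + n_layer * per_layer + 2 * d_model  # + final LayerNorm
print(f"{total:,}")  # 124,439,808 — the familiar GPT-2 "small" budget
```

Note that the 524,288 batch size in Training Details is far larger than any per-device batch; in GPT-2-style training recipes this figure conventionally denotes tokens per optimizer step accumulated across devices.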