---
language: en
tags:
- pytorch
- gpt2
- language-model
pipeline_tag: text-generation
---

# GPT-X Model

This model was trained using the GPT-X framework.

## Model Architecture

- Layers: 12
- Attention Heads: 12
- Hidden Size: 768
- Vocabulary Size: 50257
- Maximum Sequence Length: 1024
- Model Type: base

## Training Details

- Batch Size: 524288
- Learning Rate: 0.0006
- Weight Decay: 0.0
- Mixed Precision: True
- Optimizer: muon
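As a sanity check on the dimensions listed above, here is a back-of-the-envelope parameter count assuming a standard GPT-2-style decoder (tied input/output embeddings, learned positional embeddings, 4x MLP expansion, biases on all projections). This is a sketch of the usual GPT-2 accounting, not a statement about GPT-X internals, which may differ.

```python
# Parameter count implied by the model card, assuming a standard
# GPT-2-style architecture (the GPT-X framework may deviate from this).
n_layer, d_model, vocab, n_ctx = 12, 768, 50257, 1024

embeddings = vocab * d_model + n_ctx * d_model      # token + positional tables
attention  = 4 * d_model**2 + 4 * d_model           # fused qkv + output proj, with biases
mlp        = 8 * d_model**2 + 5 * d_model           # 4x up-projection + down-projection
layernorms = 4 * d_model                            # two LayerNorms per block (scale + shift)
per_layer  = attention + mlp + layernorms

total = embeddings + n_layer * per_layer + 2 * d_model  # + final LayerNorm
print(f"{total:,}")  # 124,439,808 — the familiar GPT-2 "small" budget
```

Note that the 524,288 batch size in Training Details is far larger than any per-device batch; in GPT-2-style training recipes this figure conventionally denotes tokens per optimizer step accumulated across devices.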