# GPT-X Model

This model was trained using the GPT-X framework.

## Model Architecture

- Layers: 12
- Attention Heads: 12
- Hidden Size: 768
- Vocabulary Size: 50257
- Maximum Sequence Length: 1024
- Model Type: base
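
These dimensions match GPT-2 small. Below is a minimal sketch of the implied parameter count, assuming a GPT-2-style decoder with 4x MLP expansion and tied input/output embeddings; neither assumption is stated in this card, so treat the result as an estimate.

```python
# Rough parameter count from the architecture numbers above.
# Assumes GPT-2-style blocks (4x MLP, tied embeddings) -- an assumption,
# not something this card specifies.
n_layer, d_model = 12, 768
vocab_size, max_seq_len = 50257, 1024

embeddings = vocab_size * d_model + max_seq_len * d_model  # token + position tables
attention = 4 * d_model * d_model                          # Q, K, V, and output projections
mlp = 2 * d_model * (4 * d_model)                          # up- and down-projection
per_layer = attention + mlp                                # biases and norms omitted
total = embeddings + n_layer * per_layer

print(f"~{total / 1e6:.0f}M parameters")                   # ~124M, same as GPT-2 small
```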

## Training Details

- Batch Size: 524288
- Learning Rate: 0.0006
- Weight Decay: 0.0
- Mixed Precision: True
- Optimizer: muon
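
The batch size of 524288 (2^19) is most plausibly measured in tokens rather than sequences, following common GPT-style convention. The sketch below shows how such a step might decompose across devices under that assumption; the GPU count and micro-batch size are hypothetical and not taken from this card.

```python
# How a 2**19-token batch might decompose, assuming the batch size is in
# tokens (an assumption -- the card does not state the unit).
tokens_per_step = 524_288                          # = 2**19
seq_len = 1024                                     # maximum sequence length from the card
sequences_per_step = tokens_per_step // seq_len    # 512 full-length sequences per step

n_gpus, micro_batch = 8, 16                        # hypothetical hardware; not from the card
grad_accum_steps = sequences_per_step // (n_gpus * micro_batch)
print(grad_accum_steps)                            # 4 accumulation steps per optimizer update
```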