# GPT-X Model

This model was trained using the GPT-X framework.

## Model Architecture

- Layers: 12
- Attention Heads: 12
- Hidden Size: 768
- Vocabulary Size: 50257
- Maximum Sequence Length: 1024
- Model Type: base
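
These dimensions match GPT-2 small. Below is a minimal sketch of the implied parameter count, assuming a GPT-2-style decoder with 4x MLP expansion and tied input/output embeddings; neither assumption is stated in this card, so treat the result as an estimate.

```python
# Rough parameter count from the architecture numbers above.
# Assumes GPT-2-style blocks (4x MLP, tied embeddings) -- an assumption,
# not something this card specifies.
n_layer, d_model = 12, 768
vocab_size, max_seq_len = 50257, 1024

embeddings = vocab_size * d_model + max_seq_len * d_model  # token + position tables
attention = 4 * d_model * d_model                          # Q, K, V, and output projections
mlp = 2 * d_model * (4 * d_model)                          # up- and down-projection
per_layer = attention + mlp                                # biases and norms omitted
total = embeddings + n_layer * per_layer

print(f"~{total / 1e6:.0f}M parameters")                   # ~124M, same as GPT-2 small
```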

## Training Details

- Batch Size: 524288
- Learning Rate: 0.0006
- Weight Decay: 0.0
- Mixed Precision: True
- Optimizer: muon
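
The batch size of 524288 (2^19) is most plausibly measured in tokens rather than sequences, following common GPT-style convention. The sketch below shows how such a step might decompose across devices under that assumption; the GPU count and micro-batch size are hypothetical and not taken from this card.

```python
# How a 2**19-token batch might decompose, assuming the batch size is in
# tokens (an assumption -- the card does not state the unit).
tokens_per_step = 524_288                          # = 2**19
seq_len = 1024                                     # maximum sequence length from the card
sequences_per_step = tokens_per_step // seq_len    # 512 full-length sequences per step

n_gpus, micro_batch = 8, 16                        # hypothetical hardware; not from the card
grad_accum_steps = sequences_per_step // (n_gpus * micro_batch)
print(grad_accum_steps)                            # 4 accumulation steps per optimizer update
```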