================================================================================
                      MODEL SPECIFICATIONS & STATISTICS                       
================================================================================

📐 ARCHITECTURE
   Model Type          : Altitude
   Total Layers        : 27
   Layer Breakdown:
      • Standard Transformer      : 10
      • Reasoning                 : 8
      • Feedback                  : 6
      • MTP                       : 1
      • MoE                       : 2
   MTP Depth           : 2
   MoE Experts         : 8 (top-2 per token)

📏 DIMENSIONS
   Vocab Size          : 32,000
   Hidden Size         : 2,048
   Intermediate Size   : 5,376
   Attention Heads     : 16 (KV: 16)
   Head Dimension      : 128
   Max Position Emb.   : 2,048

🔢 PARAMETER COUNTS (Formula Estimates)
   Total Parameters    : 2,260,742,144 (2.261 B)
   Active Parameters   : 1,864,380,416 (1.864 B)
   Embedding Params    : 131,072,000 (0.131 B, input embedding + LM head)
   LM Head Params      : 65,536,000 (0.066 B)
   Layer Params Breakdown:
      • Standard        :     498,073,600 (0.498 B)
      • Reasoning       :     662,700,032 (0.663 B)
      • Feedback        :     298,856,448 (0.299 B)
      • MTP             :     108,003,328 (0.108 B)
      • MoE             :     562,036,736 (0.562 B)
   Per-Layer Params:
      • Standard        :      49,807,360 (49.81 M)
      • Reasoning       :      82,837,504 (82.84 M)
      • Feedback        :      49,809,408 (49.81 M)
      • MTP             :      54,001,664 (54.00 M)
      • MoE             :     281,018,368 (281.02 M)
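
The listed counts can be reproduced from the dimensions with a few lines of arithmetic. The sketch below is an assumption, not the generating script: the per-layer decompositions (for example, a second MLP block in Reasoning layers, or an extra hidden-by-hidden projection in MTP layers) were chosen only because they match the listed figures exactly, and the sheet does not describe the layer internals. It also assumes the MTP total is the per-layer count multiplied by the MTP depth of 2, and that the embedding figure covers both the input embedding and the untied LM head.

```python
# Dimensions taken from the spec sheet.
HIDDEN = 2_048
INTERMEDIATE = 5_376
VOCAB = 32_000
NUM_EXPERTS = 8
MTP_DEPTH = 2
LAYER_COUNTS = {"standard": 10, "reasoning": 8, "feedback": 6, "mtp": 1, "moe": 2}

# Building blocks, without biases and without norm weights.
ATTN = 4 * HIDDEN * HIDDEN       # Q, K, V and output projections
MLP = 3 * HIDDEN * INTERMEDIATE  # gate, up and down projections

# ASSUMED decomposition per layer type; it matches the listed per-layer
# counts, but the real layer internals are not stated in the sheet.
PER_LAYER = {
    "standard": ATTN + MLP,
    "reasoning": ATTN + 2 * MLP,
    "feedback": ATTN + MLP + HIDDEN,
    "mtp": ATTN + MLP + HIDDEN * HIDDEN,
    "moe": ATTN + NUM_EXPERTS * MLP,
}


def layer_totals():
    """Return total parameters per layer type (MTP scaled by its depth)."""
    totals = {}
    for name, count in LAYER_COUNTS.items():
        multiplier = MTP_DEPTH if name == "mtp" else 1
        totals[name] = PER_LAYER[name] * count * multiplier
    return totals


# Input embedding plus untied LM head (assumption, see lead-in).
EMBEDDING = 2 * VOCAB * HIDDEN
TOTAL = sum(layer_totals().values()) + EMBEDDING

if __name__ == "__main__":
    for name, value in layer_totals().items():
        print(f"{name:<10} {value:>15,}")
    print(f"{'total':<10} {TOTAL:>15,}")
```

Because the decomposition is fitted to the numbers, it is useful as a sanity check on the totals rather than as a description of the architecture.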

💾 MEMORY ESTIMATES (Model Weights Only)
   FP32                : 8.42 GiB
   FP16/BF16           : 4.21 GiB
   INT8                : 2.11 GiB
   INT4                : 1.05 GiB
   KV Cache (FP16)     : 0.84 GiB
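
The weight figures follow from the total parameter count and the bytes stored per parameter, measured in units of 2^30 bytes. A minimal sketch, assuming no quantization overhead (scales, zero points) and, for the KV cache, that all 27 layers cache keys and values for the full 2,048-token context at batch size 1. The helper names are illustrative. Under those assumptions the usual KV formula gives about 0.42 at 2 bytes per element and 0.84 at 4 bytes per element; the sheet does not state which batch size or convention produced its 0.84 figure.

```python
TOTAL_PARAMS = 2_260_742_144
GIB = 1024 ** 3

# Bytes stored per parameter for each format (INT4 packs two per byte).
BYTES_PER_PARAM = {"FP32": 4.0, "FP16/BF16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gib(num_params, bytes_per_param):
    """Memory for the weights alone, in units of 2**30 bytes."""
    return num_params * bytes_per_param / GIB


def kv_cache_gib(num_layers, kv_heads, head_dim, seq_len, bytes_per_elem, batch=1):
    """KV cache size: one K and one V tensor per layer, per token."""
    elements = 2 * num_layers * kv_heads * head_dim * seq_len * batch
    return elements * bytes_per_elem / GIB


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt:<10} {weight_memory_gib(TOTAL_PARAMS, nbytes):.2f}")
    print(f"KV @2B     {kv_cache_gib(27, 16, 128, 2048, 2):.2f}")
    print(f"KV @4B     {kv_cache_gib(27, 16, 128, 2048, 4):.2f}")
```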

⚡ COMPUTE (FLOPs per Token)
   Total Active FLOPs  : 3,728,760,832 (3.73 GFLOPs)
   Per-Layer FLOPs:
      • Standard        :      99,614,720 (0.10 GFLOPs)
      • Reasoning       :     165,675,008 (0.17 GFLOPs)
      • Feedback        :      99,618,816 (0.10 GFLOPs)
      • MTP             :     108,003,328 (0.11 GFLOPs)
      • MoE             :     165,675,008 (0.17 GFLOPs)
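
The active-parameter and FLOPs figures are consistent with a common rule of thumb: count only the experts a token is routed to, then charge 2 FLOPs (one multiply, one add) per active parameter. A sketch under that assumption follows; it also assumes each expert is a gated MLP with three hidden-by-intermediate matrices and ignores router weights, which is a guess that happens to match the listed numbers.

```python
TOTAL_PARAMS = 2_260_742_144
HIDDEN = 2_048
INTERMEDIATE = 5_376
NUM_EXPERTS = 8
TOP_K = 2        # experts used per token
MOE_LAYERS = 2

# ASSUMED expert size: gate, up and down projections of a gated MLP.
EXPERT_PARAMS = 3 * HIDDEN * INTERMEDIATE


def active_params():
    """Total parameters minus the experts a single token does not use."""
    inactive = (NUM_EXPERTS - TOP_K) * EXPERT_PARAMS * MOE_LAYERS
    return TOTAL_PARAMS - inactive


def flops_per_token():
    """Forward-pass estimate: 2 FLOPs per active parameter."""
    return 2 * active_params()


if __name__ == "__main__":
    print(f"active params : {active_params():,}")
    print(f"FLOPs/token   : {flops_per_token():,} ({flops_per_token() / 1e9:.2f} GFLOPs)")
```

This estimate leaves out the attention score computation, which grows with sequence length, so it is a lower bound for long contexts.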

⚙️  CONFIGURATION
   Hidden Activation   : silu
   RMS Norm Eps        : 1e-06
   RoPE Theta          : 10000.0
   RoPE Scaling        : None
   Attention Bias      : False
   Attention Dropout   : 0.0
   MLP Bias            : False
   Tie Word Embeddings : False
   Quantization        : True (4-bit)
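
For use in code, the dimensions and configuration values can be collected in one place and checked for internal consistency. The sketch below is illustrative only: the class name `AltitudeConfig` and the field names are hypothetical (loosely modeled on common transformer config naming) and are not taken from the model's actual configuration file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AltitudeConfig:
    """Hypothetical container for the values listed in the spec sheet."""

    vocab_size: int = 32_000
    hidden_size: int = 2_048
    intermediate_size: int = 5_376
    num_attention_heads: int = 16
    num_key_value_heads: int = 16
    head_dim: int = 128
    max_position_embeddings: int = 2_048
    hidden_act: str = "silu"
    rms_norm_eps: float = 1e-6
    rope_theta: float = 10_000.0
    rope_scaling: Optional[dict] = None
    attention_bias: bool = False
    attention_dropout: float = 0.0
    mlp_bias: bool = False
    tie_word_embeddings: bool = False

    def validate(self) -> None:
        """Raise ValueError if the attention dimensions do not line up."""
        if self.num_attention_heads * self.head_dim != self.hidden_size:
            raise ValueError("num_attention_heads * head_dim must equal hidden_size")
        if self.num_attention_heads % self.num_key_value_heads != 0:
            raise ValueError("num_attention_heads must be divisible by num_key_value_heads")


if __name__ == "__main__":
    AltitudeConfig().validate()  # the listed values pass both checks
```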

================================================================================