================================================================================
MODEL SPECIFICATIONS & STATISTICS
================================================================================
📐 ARCHITECTURE
Model Type : Altitude
Total Layers : 27
Layer Breakdown:
• Standard Transformer : 10
• Reasoning : 8
• Feedback : 6
• MTP : 1
• MoE : 2
MTP Depth : 2
MoE Experts : 8 (top-2 per token)
📏 DIMENSIONS
Vocab Size : 32,000
Hidden Size : 2,048
Intermediate Size : 5,376
Attention Heads : 16 (KV: 16)
Head Dimension : 128
Max Position Emb. : 2,048
🔢 PARAMETER COUNTS (Formula Estimates)
Total Parameters : 2,260,742,144 (2.261 B)
Active Parameters : 1,864,380,416 (1.864 B)
Embedding Params : 131,072,000 (0.131 B)
LM Head Params : 65,536,000 (0.066 B)
Layer Params Breakdown:
• Standard : 498,073,600 (0.498 B)
• Reasoning : 662,700,032 (0.663 B)
• Feedback : 298,856,448 (0.299 B)
• MTP : 108,003,328 (0.108 B)
• MoE : 562,036,736 (0.562 B)
Per-Layer Params:
• Standard : 49,807,360 (49.81 M)
• Reasoning : 82,837,504 (82.84 M)
• Feedback : 49,809,408 (49.81 M)
• MTP : 54,001,664 (54.00 M)
• MoE : 281,018,368 (281.02 M)
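
The parameter figures above can be cross-checked from the listed dimensions. The sketch below is a reconstruction, not the script that produced this sheet. It assumes bias-free attention with four hidden-by-hidden projections and a gated MLP with three matrices, and it leaves out norm and router weights, because that is what makes the Standard and MoE figures line up. It also assumes that the Embedding Params line already includes the untied LM head and that the MTP total is counted at depth 2. The Reasoning, Feedback and MTP per-layer values are taken from the sheet as given.

```python
# Sketch: cross-check the listed parameter counts (assumptions noted inline).
HIDDEN = 2_048
INTERMEDIATE = 5_376
VOCAB = 32_000
NUM_EXPERTS, TOP_K = 8, 2

# Assumed layout: Q, K, V, O projections with KV heads == attention heads, no bias.
attn = 4 * HIDDEN * HIDDEN
# Assumed layout: gated MLP with gate, up and down projections, no bias.
mlp = 3 * HIDDEN * INTERMEDIATE

standard_layer = attn + mlp               # norm weights not counted
moe_layer = attn + NUM_EXPERTS * mlp      # router weights not counted

# Totals per layer type: layer count times per-layer size.
layer_totals = {
    "standard": 10 * standard_layer,
    "reasoning": 8 * 82_837_504,          # per-layer value copied from the sheet
    "feedback": 6 * 49_809_408,           # per-layer value copied from the sheet
    "mtp": 2 * 54_001_664,                # assumed: counted once per MTP depth (2)
    "moe": 2 * moe_layer,
}

# Assumed: input embedding plus untied LM head, each VOCAB x HIDDEN.
embedding = 2 * VOCAB * HIDDEN

total = sum(layer_totals.values()) + embedding
# Only TOP_K of NUM_EXPERTS expert MLPs run per token in each of the 2 MoE layers.
active = total - 2 * (NUM_EXPERTS - TOP_K) * mlp

print(f"total={total:,} active={active:,}")
```

Under these assumptions the totals agree with the Total Parameters and Active Parameters lines, which suggests the sheet counts only the large projection matrices.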
💾 MEMORY ESTIMATES (Model Weights Only; 1 GB = 2^30 bytes)
FP32 : 8.42 GB
FP16/BF16 : 4.21 GB
INT8 : 2.11 GB
INT4 : 1.05 GB
KV Cache (FP16) : 0.84 GB
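
The four weight-memory figures are consistent with parameter count times bits per parameter, divided by 2^30 bytes, so the "GB" here behaves like GiB. A minimal sketch of that arithmetic follows. The function name is illustrative, and the KV cache line is left out because the sheet does not state the sequence length or batch size behind it.

```python
# Sketch: weight-only memory from the total parameter count.
TOTAL_PARAMS = 2_260_742_144
GIB = 2 ** 30  # the sheet's "GB" matches binary gigabytes

BITS_PER_PARAM = {"FP32": 32, "FP16/BF16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gib(num_params: int, bits: int) -> float:
    """Return the size of the weights in GiB at the given precision."""
    return num_params * bits / 8 / GIB


estimates = {
    name: round(weight_memory_gib(TOTAL_PARAMS, bits), 2)
    for name, bits in BITS_PER_PARAM.items()
}

for name, gib in estimates.items():
    print(f"{name:<10}: {gib:.2f}")
```

In decimal gigabytes (10^9 bytes) the FP32 weights would come to about 9.04 GB, so it is worth checking which convention a given tool reports.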
⚡ COMPUTE (FLOPs per Token)
Total Active FLOPs : 3,728,760,832 (3.73 GFLOPs)
Per-Layer FLOPs:
• Standard : 99,614,720 (0.10 GFLOPs)
• Reasoning : 165,675,008 (0.17 GFLOPs)
• Feedback : 99,618,816 (0.10 GFLOPs)
• MTP : 108,003,328 (0.11 GFLOPs)
• MoE : 165,675,008 (0.17 GFLOPs)
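
The FLOP figures match the common rule of thumb of 2 FLOPs (one multiply, one add) per active parameter per token. The sketch below applies that rule. The rule is an assumption about how the sheet was generated, and it ignores attention-score FLOPs, which grow with sequence length. The MoE entry uses only the attention block plus the 2 routed experts, again assuming a three-matrix gated MLP.

```python
# Sketch: forward-pass FLOPs per token, assuming 2 FLOPs per active parameter.
ACTIVE_PARAMS = 1_864_380_416

HIDDEN, INTERMEDIATE, TOP_K = 2_048, 5_376, 2
attn = 4 * HIDDEN * HIDDEN               # assumed Q, K, V, O projections
mlp = 3 * HIDDEN * INTERMEDIATE          # assumed gate, up, down projections

# Active parameters per layer; all but the MoE entry are copied from the sheet.
active_per_layer = {
    "standard": 49_807_360,
    "reasoning": 82_837_504,
    "feedback": 49_809_408,
    "mtp": 54_001_664,
    "moe": attn + TOP_K * mlp,           # only the top-2 experts run per token
}


def flops_per_token(active_params: int) -> int:
    """Approximate FLOPs per token: one multiply and one add per weight."""
    return 2 * active_params


total_flops = flops_per_token(ACTIVE_PARAMS)
per_layer_flops = {k: flops_per_token(v) for k, v in active_per_layer.items()}

print(f"total: {total_flops:,} ({total_flops / 1e9:.2f} GFLOPs)")
for name, flops in per_layer_flops.items():
    print(f"{name:<10}: {flops:,} ({flops / 1e9:.2f} GFLOPs)")
```

Reporting the total in GFLOPs rather than TFLOPs keeps the value from rounding to zero at two decimal places.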
⚙️ CONFIGURATION
Hidden Activation : silu
RMS Norm Eps : 1e-06
RoPE Theta : 10000.0
RoPE Scaling : None
Attention Bias : False
Attention Dropout : 0.0
MLP Bias : False
Tie Word Embeddings : False
Quantization : True (4-bit)
================================================================================