Orion Atlas 7B

Custom Mamba-2 Hybrid architecture built from scratch by Avery Palermini / Cendrix LLC.

Architecture

  • Type: Mamba-2 Hybrid (SSM + Differential Attention)
  • Parameters: ~7.1B transformer-equivalent (8.77B actual hybrid)
  • Layers: 32 total
    • 25 Mamba-2 (SSD) layers
    • 7 Differential Attention layers at indices [3, 7, 11, 15, 19, 23, 27]
    • SwiGLU FFN after every layer (all 32)
  • Attention heads: 32 Q / 8 KV (GQA 4:1)
  • Attention type: Microsoft Differential Attention (ICLR 2025) -- full causal, no sliding window
  • SSM: Mamba-2 / SSD, expand=2, d_state=128, d_conv=4
  • FFN hidden dim: 14,336 (SwiGLU)
  • Model dim: 4,096
  • Normalization: RMSNorm
  • Position encoding: RoPE (theta=500,000) on attention layers only; Mamba-2 is position-free
  • Context window: 128K tokens (131,072)
  • Tokenizer: Custom SentencePiece (32K vocab)
  • Weight tying: embedding and output head shared
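
A minimal sketch of the layer schedule and head grouping implied by the list above, in plain Python. The class, field, and method names are hypothetical and are not taken from model_7b.py. Two values are assumptions rather than stated facts: head_dim = 4096 / 32 = 128, and the FFN count assumes three bias-free SwiGLU matrices (gate, up, down).

```python
from dataclasses import dataclass


@dataclass
class HybridConfig:
    # Values from the architecture list; field names are hypothetical.
    n_layers: int = 32
    attn_layer_ids: tuple = (3, 7, 11, 15, 19, 23, 27)
    d_model: int = 4096
    ffn_dim: int = 14_336
    n_q_heads: int = 32
    n_kv_heads: int = 8

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = d_model / n_q_heads = 128.
        return self.d_model // self.n_q_heads

    @property
    def ffn_params_per_layer(self) -> int:
        # Assumption: SwiGLU with gate, up and down projections and no biases.
        return 3 * self.d_model * self.ffn_dim

    def layer_types(self) -> list:
        # Every layer = sequence mixer + SwiGLU FFN; only the mixer type differs.
        return ["diff_attn" if i in self.attn_layer_ids else "mamba2"
                for i in range(self.n_layers)]

    def kv_head_for(self, q_head: int) -> int:
        # GQA 4:1: query heads 0-3 read KV head 0, heads 4-7 read KV head 1, etc.
        return q_head // (self.n_q_heads // self.n_kv_heads)


cfg = HybridConfig()
types = cfg.layer_types()
print(types.count("mamba2"), types.count("diff_attn"))  # 25 7
print(cfg.head_dim, cfg.kv_head_for(31))                # 128 7
print(cfg.n_layers * cfg.ffn_params_per_layer)          # 5637144576
```

The attention indices follow i % 4 == 3 but stop at 27, so the final layer (31) is a Mamba-2 layer; an explicit index list keeps that unambiguous. Under the no-bias assumption, the 32 SwiGLU FFNs alone account for about 5.64B of the 8.77B actual parameters.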

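The core of Differential Attention is the difference of two causal softmax attention maps, intended to cancel attention "noise" that both maps share. The sketch below is a single-head, pure-Python illustration under simplifying assumptions: λ is a fixed float instead of the paper's learned per-layer parameter, and the per-head normalization and GQA key/value sharing of a full layer are omitted. The function names are hypothetical; this is not the implementation in model_7b.py.

```python
import math


def causal_softmax_scores(q, k, scale):
    """Row-wise softmax of (q @ k^T) * scale with a causal mask (row t sees keys 0..t)."""
    rows = []
    for t, qt in enumerate(q):
        logits = [scale * sum(a * b for a, b in zip(qt, k[s])) for s in range(t + 1)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        # Future positions are masked: they get probability 0.
        rows.append([e / z for e in exps] + [0.0] * (len(k) - t - 1))
    return rows


def diff_attention(q1, k1, q2, k2, v, lam):
    """Single-head differential attention:
    (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V.
    lam is a fixed float here (assumption); the paper learns it per layer."""
    scale = 1.0 / math.sqrt(len(q1[0]))
    a1 = causal_softmax_scores(q1, k1, scale)
    a2 = causal_softmax_scores(q2, k2, scale)
    out = []
    for t in range(len(v)):
        w = [x - lam * y for x, y in zip(a1[t], a2[t])]  # differential weights
        out.append([sum(w[s] * v[s][j] for s in range(len(v))) for j in range(len(v[0]))])
    return out


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = diff_attention(q, k, q, k, v, lam=0.8)
print(out[0])  # approximately [0.2, 0.4], i.e. (1 - 0.8) * v[0]
```

When both maps are identical, the result is (1 - λ) times ordinary causal attention, which makes a convenient sanity check: the first token can only attend to itself, so its output is (1 - λ) · v[0].
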
Why Hybrid?

The hybrid design builds on prior work from NVIDIA (arXiv:2406.07887), AI21 (Jamba), and Zyphra (Zamba):

  • Mamba-2 SSM layers process a sequence in O(T) time with a fixed-size recurrent state, versus O(T^2) for attention
  • Full-causal attention layers (7/32) provide global recall for long-context tasks
  • Reported result: better perplexity than a pure transformer at the same parameter count
  • 128K context with only 7 of 32 layers paying the quadratic cost of attention
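
A rough sense of what the 7/32 split saves at 128K tokens: the KV cache only grows in attention layers. This back-of-the-envelope calculation assumes bf16 (2 bytes per value), batch size 1, head_dim = 128, and a standard GQA K/V layout with 8 KV heads per attention layer; the exact Differential Attention cache layout in model_7b.py may differ, and the function name is hypothetical.

```python
def kv_cache_bytes(n_attn_layers, seq_len=131_072, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # K and V each store n_kv_heads * head_dim values per token per attention layer.
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


GIB = 1024 ** 3
hybrid = kv_cache_bytes(7)     # 7 Differential Attention layers
all_attn = kv_cache_bytes(32)  # hypothetical all-attention model of the same depth
print(hybrid / GIB, all_attn / GIB)  # 3.5 16.0
```

Under these assumptions each attention layer needs 512 MiB of cache at full context. The 25 Mamba-2 layers carry a recurrent state whose size does not depend on sequence length, although attention compute in the 7 attention layers is still quadratic.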

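The O(T) cost comes from the SSM recurrence: each token updates a fixed-size state. Below is a sketch of the recurrent view of a Mamba-2 (SSD) layer for a single channel, assuming the scalar-decay form h_t = exp(Δ_t·A)·h_{t-1} + Δ_t·B_t·x_t and y_t = C_t·h_t + D·x_t, where the state length N corresponds to d_state (128 in this model). A real layer also has multiple heads and channels (expand=2), the short causal convolution (d_conv=4), gating, and the chunked parallel SSD algorithm used for training; none of that is shown, and `ssd_scan` is a hypothetical name.

```python
import math


def ssd_scan(x, dt, A, B, C, D=0.0):
    """Recurrent form of a single-channel Mamba-2 (SSD) layer.

    x, dt: length-T lists of scalars (input and step size per token)
    A: negative scalar (Mamba-2 uses one scalar decay per head)
    B, C: length-T lists of length-N vectors (input-dependent projections)
    Returns y (length T). The state h has N entries regardless of T.
    """
    h = [0.0] * len(B[0])
    y = []
    for t in range(len(x)):
        decay = math.exp(dt[t] * A)  # in (0, 1) when A < 0 and dt > 0
        h = [decay * h_i + dt[t] * b_i * x[t] for h_i, b_i in zip(h, B[t])]
        y.append(sum(c_i * h_i for c_i, h_i in zip(C[t], h)) + D * x[t])
    return y


# An impulse decays by a factor of 0.5 per step when A = -ln 2 and dt = 1.
y = ssd_scan([1.0, 0.0, 0.0], [1.0] * 3, -math.log(2), [[1.0, 0.0]] * 3, [[1.0, 0.0]] * 3)
print([round(v, 6) for v in y])  # [1.0, 0.5, 0.25]
```

Because the loop touches each token once and never revisits earlier ones, time grows linearly with T and memory stays at N values per channel, which is the property the hybrid relies on for long contexts.
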
Design Goals

Built for agentic tasks: tool calling, structured JSON output, multi-step reasoning. Part of the Orion Atlas model family (1B -> 3B -> 7B -> 14B -> 37B).

Files

  • model_7b.py -- full architecture, pure PyTorch (no mamba-ssm package required)
  • model.py -- original 1B Llama-style transformer (reference)

Training

  • Pre-training: FineWeb-Edu, SlimPajama, StarCoder, OpenWebMath + custom datasets
  • Fine-tuning: OpenClaw tool-calling traces (in progress)
  • Framework: Custom PyTorch training loop

Status

Architecture released; weights are still in training.

License

Apache 2.0
