Orion Atlas 7B

Custom Mamba-2 Hybrid architecture built from scratch by Avery Palermini / Cendrix LLC.

Architecture

Type: Mamba-2 Hybrid (SSM + Differential Attention)
Parameters: ~7.1B transformer-equivalent (8.77B actual hybrid)
Layers: 32 total
- 25 Mamba-2 (SSD) layers
- 7 Differential Attention layers at indices [3, 7, 11, 15, 19, 23, 27]
- SwiGLU FFN after every layer (all 32)
Attention heads: 32 Q / 8 KV (GQA 4:1)
Attention type: Microsoft Differential Attention (ICLR 2025) -- full causal, no sliding window
SSM: Mamba-2 / SSD, expand=2, d_state=128, d_conv=4
FFN hidden dim: 14,336 (SwiGLU)
Model dim: 4,096
Normalization: RMSNorm
Position encoding: RoPE (theta=500,000) on attention layers only; Mamba-2 is position-free
Context window: 128K tokens (131,072)
Tokenizer: Custom SentencePiece (32K vocab)
Weight tying: embedding and output head shared
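
The layer layout above can be written down as a small config object. This is a minimal sketch, not the contents of model_7b.py: the class name `AtlasConfig`, its field names, and the `layer_types` helper are hypothetical, the attention indices are assumed to be 0-based, and "32K vocab" is assumed to mean 32,000 entries (it could equally be 32,768).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtlasConfig:
    """Hypothetical config mirroring the numbers listed in this card."""
    n_layers: int = 32
    d_model: int = 4096
    n_q_heads: int = 32
    n_kv_heads: int = 8
    ffn_hidden: int = 14336
    ssm_expand: int = 2
    ssm_d_state: int = 128
    ssm_d_conv: int = 4
    rope_theta: float = 500_000.0
    max_seq_len: int = 131_072
    vocab_size: int = 32_000  # assumption: "32K" may instead mean 32,768
    # Assumed 0-based; every 4th layer starting at index 3.
    attn_layer_indices: tuple = (3, 7, 11, 15, 19, 23, 27)

    @property
    def head_dim(self) -> int:
        # Per-head width implied by model dim and query head count.
        return self.d_model // self.n_q_heads

    @property
    def gqa_group_size(self) -> int:
        # Number of query heads sharing one KV head.
        return self.n_q_heads // self.n_kv_heads

    @property
    def ssm_d_inner(self) -> int:
        # Mamba-2 inner width: expand * d_model.
        return self.ssm_expand * self.d_model

    def layer_types(self) -> list:
        # Mixer type for each layer; every layer is also followed by a SwiGLU FFN.
        attn = set(self.attn_layer_indices)
        return ["attention" if i in attn else "mamba2" for i in range(self.n_layers)]


cfg = AtlasConfig()
layout = cfg.layer_types()
n_mamba = layout.count("mamba2")
n_attn = layout.count("attention")
print(f"{n_mamba} Mamba-2 + {n_attn} attention layers")  # 25 Mamba-2 + 7 attention layers
print(cfg.head_dim, cfg.gqa_group_size, cfg.ssm_d_inner)  # 128 4 8192
```

The derived values (head dim 128, GQA group size 4, SSM inner width 8,192) follow directly from the listed numbers, assuming the usual definitions of each.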

Why Hybrid?

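The hybrid SSM + attention approach has been validated by NVIDIA (arXiv:2406.07887), AI21 (Jamba), and Zyphra (Zamba):

- Mamba-2 SSM layers handle sequence modeling in O(T) time in sequence length T, versus O(T^2) for self-attention
- Full-causal attention layers (7 of 32) provide global recall for long-context tasks
- Expected result, following the work cited above: better perplexity than a pure transformer at the same parameter count (not yet measured for this model; weights are still in training)
- 128K context without the quadratic cost of an all-attention stack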
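
To make the long-context saving concrete, here is a rough KV-cache estimate at the full 131,072-token window. It is a back-of-the-envelope sketch under stated assumptions: a standard GQA cache layout (one K and one V tensor per KV head), head dim 128 (4,096 / 32), and 2-byte (fp16/bf16) elements. Differential Attention may cache differently in practice, and the fixed-size Mamba-2 state is not counted. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(n_attn_layers, seq_len, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate KV-cache size for one sequence (assumes a plain GQA cache)."""
    # K and V each store n_kv_heads * head_dim elements per token per layer.
    per_token_per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem
    return n_attn_layers * seq_len * per_token_per_layer


GIB = 1024 ** 3
hybrid = kv_cache_bytes(n_attn_layers=7, seq_len=131_072)   # this architecture
full = kv_cache_bytes(n_attn_layers=32, seq_len=131_072)    # all-attention baseline
print(f"{hybrid / GIB:.1f} GiB vs {full / GIB:.1f} GiB")  # 3.5 GiB vs 16.0 GiB
```

Under these assumptions, caching 7 attention layers instead of 32 cuts the per-sequence KV cache from 16 GiB to 3.5 GiB at full context.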

Design Goals

Built for agentic tasks: tool calling, structured JSON output, multi-step reasoning. Part of the Orion Atlas model family (1B -> 3B -> 7B -> 14B -> 37B).

Files

model_7b.py -- full architecture, pure PyTorch (no mamba-ssm package required)
model.py -- original 1B Llama-style transformer (reference)

Training

Pre-training: FineWeb-Edu, SlimPajama, StarCoder, OpenWebMath + custom datasets
Fine-tuning: OpenClaw tool-calling traces (in progress)
Framework: Custom PyTorch training loop

Status

Architecture released. Weights in training.

License

Apache 2.0

Paper

An Empirical Study of Mamba-based Language Models (arXiv:2406.07887, published Jun 12, 2024)