APEX MTP Vision MIT

Ornith-1.0-35B-MTP-APEX

English | πŸ“– δΈ­ζ–‡ζ–‡ζ‘£

Self-improving agentic coding model Β· APEX quantized GGUFs + BF16 + mmproj

🐦 About Ornith

Ornith-1.0-35B is a self-improving agentic coding model from DeepReinforce AI, post-trained on top of Qwen3.5 with RL to jointly optimize scaffold generation and solution rollouts.

It achieves state-of-the-art performance among open-source models of comparable size on Terminal-Bench 2.1, SWE-Bench Verified/Pro/Multilingual, NL2Repo, and OpenClaw.

This GGUF package includes the mmproj-F16.gguf vision projector for multimodal (image + text) capabilities with llama.cpp. MTP layers are sourced from Qwen3.5-35B-A3B (same architecture, compatible weights). License: MIT.

🧠 Model Details
ArchitectureQwen3.5 MoE (Mixture of Experts)
Parameters35B total, 3B active per token
Experts256 routed experts, 8 active per token
Layers40 transformer layers + 1 MTP layer
Context262,144 tokens
MTP1 MTP layer (785 tensors) from Qwen3.5-35B-A3B
LicenseMIT
πŸ“Š BenchLocal Results (APEX-I-Compact, 15.85 GB)
ModeToolCall-15BugFind-15HermesAgent-20MaxEff.
Thinking100938993.575.5
No Thinking100928993.285.2

RTX 5070 Ti Β· No-thinking mode achieves better practical reliability (fewer retries).

πŸš€ Usage

llama.cpp (text only)

hf download SC117/Ornith-1.0-35B-MTP-APEX-GGUF --include "*.gguf" --local-dir ./models ./llama-server -m ./models/Ornith-1.0-35B-MTP-APEX-I-Compact.gguf -ngl 99 -c 131072

llama.cpp (vision + text)

./llama-server -m ./models/Ornith-1.0-35B-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072

πŸŽ›οΈ Recommended Settings
ModeParameters
Generaltemperature=0.6, top_p=0.95, top_k=20
Codingtemperature=0.6, top_p=0.95, top_k=20
πŸ’‘ What is APEX?

These GGUF files are quantized using APEX, an MoE-aware mixed-precision quantization technique. APEX classifies every tensor by its role β€” routed expert, shared expert, or attention β€” and applies a layer-wise precision gradient, giving sensitive edge layers higher precision and compressing redundant middle layers more aggressively.

APEX beats Q8_0 perplexity at half the size β€” and even beats F16.

πŸ“¦ APEX Quantization Tiers
FileSizeProfileBest For
*-APEX-I-Quality.gguf21.90 GBI-QualityHighest quality, best accuracy
*-APEX-I-Balanced.gguf24.18 GBI-BalancedBest all-rounder, recommended
*-APEX-I-Compact.gguf15.85 GBI-CompactBest quality/size ratio

Links

Citation

@misc{ornith-35b,
    title = {{Ornith-1.0-35B}: Agentic Coding, Open to All},
    url = {https://deep-reinforce.com/ornith_1_0.html},
    author = {{DeepReinforce Team}},
    year = {2026}
}
Downloads last month
1,298
GGUF
Model size
0.4B params
Architecture
clip
Hardware compatibility
Log In to add your hardware

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for SC117/Ornith-1.0-35B-MTP-APEX-GGUF

Quantized
(69)
this model