Instructions to use leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx with MLX:

# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx")
config = load_config("leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings
LM Studio

How to use leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx

Run Hermes

hermes

OpenClaw new

How to use leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx with OpenClaw:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx"

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx" \
  --custom-provider-id mlx-lm \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx

This model was converted to MLX format from deepreinforce-ai/Ornith-1.0-35B using BaseQuant_XL 5/8-bit mixed quantization optimized for Apple Silicon. The vision encoder is preserved and quantized at 5-bit, making this a full multimodal model.

BaseQuant_XL keeps the most routing-critical layers in full bf16 precision — the MoE router gate, shared expert gate, shared expert, and lm_head — while applying aggressive quantization to the bulk parameters. This preserves routing accuracy and output quality where it matters most.

Ornith-1.0-35B is a 35B-parameter MoE (Mixture of Experts) model fine-tuned from Qwen3.5-35B-A3B by DeepReinforce AI, using a self-improving RL training framework that jointly optimizes scaffold and solution rollouts for agentic coding tasks. Despite 35B total parameters, only ~3B are activated per token. It features 256 experts (8 active per token + 1 shared expert), hybrid full + linear (Gated DeltaNet) attention, and a vision encoder.

Benchmark Highlights

Benchmark	Ornith-1.0-35B	Qwen3.5-35B	Qwen3.6-35B
Terminal-Bench 2.1 (Terminus-2)	64.2	41.4	52.5
Terminal-Bench 2.1 (Claude Code)	62.8	38.9	49.2
SWE-bench Verified	75.6	70	73.4
SWE-bench Pro	50.4	44.6	49.5
SWE-bench Multilingual	69.3	60.3	67.2
NL2Repo	34.6	20.5	29.4
Claw-eval Avg	69.8	65.4	68.7

Use with mlx

pip install -U mlx-vlm

python -m mlx_vlm.generate --model leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx --max-tokens 256 --temperature 1.0 --top-p 1.0 --prompt "Hello"

BaseQuant_XL Quantization Strategy

Bit Depth	Layers	Rationale
bf16 (unquantized)	`mlp.gate` (router), `shared_expert_gate`, `lm_head`, `shared_expert`	Routing decisions and shared computation path — errors here are qualitatively different from precision loss
8-bit	`embed_tokens`, `self_attn` (full attention), `linear_attn` (DeltaNet)	Every-token layers with moderate sensitivity — 8-bit is near-lossless
5-bit	`vision_tower`, `switch_mlp` (routed experts)	Bulk of parameters, only 8 of 256 experts active per token — natural redundancy tolerates lower precision

Quantization Details

Layer	Bits	Group Size
`mlp.gate` (router)	bf16	—
`shared_expert_gate`	bf16	—
`lm_head`	bf16	—
`shared_expert`	bf16	—
`embed_tokens`	8	64
`self_attn` (full attention)	8	64
`linear_attn` (DeltaNet)	8	64
`vision_tower`	5	64
`switch_mlp` (routed experts)	5	64
Default fallback	8	64

Quantization type: BaseQuant_XL mixed (multimodal, vision preserved)
Bits per weight: 5.881
Group size: 64
Method: Custom quant_predicate via mlx_vlm

Recommended Inference Parameters

Parameter	Value
`temperature`	1.0
`top_p`	1.0
`top_k`	40
`min_p`	0.01
`repeat_penalty`	1.05

Note: Ornith-1.0-35B uses Temp 1.0 and Top_p 1.0 per the model's Terminal-Bench 2.1 benchmark recipe. This is a Qwen3.5-based model — preserve_thinking is not applicable.

Downloads last month: 574

Safetensors

Model size

7B params

Tensor type

BF16

U32

MLX

Hardware compatibility

5-bit

Model tree for leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx

Base model

deepreinforce-ai/Ornith-1.0-35B

Quantized

(115)

this model

Collection including leonsarmiento/Ornith-1.0-35B-5bit-XL-mlx

Ornith-1.0-35B MLX Quantizations

Collection

Ornith-1.0-35B MoE (35B/~3B active). Vanilla + uncensored-heretic. BaseQuant_XL recommended. • 4 items • Updated 2 days ago