Instructions to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SC117/Ornith-1.0-35B-MTP-APEX-GGUF")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("SC117/Ornith-1.0-35B-MTP-APEX-GGUF", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SC117/Ornith-1.0-35B-MTP-APEX-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Ornith-1.0-35B-MTP-APEX-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/SC117/Ornith-1.0-35B-MTP-APEX-GGUF

SGLang

How to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SC117/Ornith-1.0-35B-MTP-APEX-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Ornith-1.0-35B-MTP-APEX-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SC117/Ornith-1.0-35B-MTP-APEX-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Ornith-1.0-35B-MTP-APEX-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with Docker Model Runner:
```
docker model run hf.co/SC117/Ornith-1.0-35B-MTP-APEX-GGUF
```

APEX MTP Vision MIT

Ornith-1.0-35B-MTP-APEX

English | 📖 中文文档

Self-improving agentic coding model · APEX quantized GGUFs + BF16 + mmproj

🐦 About Ornith

Ornith-1.0-35B is a self-improving agentic coding model from DeepReinforce AI, post-trained on top of Qwen3.5 with RL to jointly optimize scaffold generation and solution rollouts.

It achieves state-of-the-art performance among open-source models of comparable size on Terminal-Bench 2.1, SWE-Bench Verified/Pro/Multilingual, NL2Repo, and OpenClaw.

This GGUF package includes the mmproj-F16.gguf vision projector for multimodal (image + text) capabilities with llama.cpp. MTP layers are sourced from Qwen3.5-35B-A3B (same architecture, compatible weights). License: MIT.

🧠 Model Details

Architecture	Qwen3.5 MoE (Mixture of Experts)
Parameters	35B total, 3B active per token
Experts	256 routed experts, 8 active per token
Layers	40 transformer layers + 1 MTP layer
Context	262,144 tokens
MTP	1 MTP layer (785 tensors) from Qwen3.5-35B-A3B
License	MIT

📊 BenchLocal Results (APEX-I-Compact, 15.85 GB)

Mode	ToolCall-15	BugFind-15	HermesAgent-20	Max	Eff.
Thinking	100	93	89	93.5	75.5
No Thinking	100	92	89	93.2	85.2

RTX 5070 Ti · No-thinking mode achieves better practical reliability (fewer retries).

🚀 Usage

llama.cpp (text only)

hf download SC117/Ornith-1.0-35B-MTP-APEX-GGUF --include "*.gguf" --local-dir ./models ./llama-server -m ./models/Ornith-1.0-35B-MTP-APEX-I-Compact.gguf -ngl 99 -c 131072

llama.cpp (vision + text)

./llama-server -m ./models/Ornith-1.0-35B-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072

🎛️ Recommended Settings

Mode	Parameters
General	temperature=0.6, top_p=0.95, top_k=20
Coding	temperature=0.6, top_p=0.95, top_k=20

💡 What is APEX?

These GGUF files are quantized using APEX, an MoE-aware mixed-precision quantization technique. APEX classifies every tensor by its role — routed expert, shared expert, or attention — and applies a layer-wise precision gradient, giving sensitive edge layers higher precision and compressing redundant middle layers more aggressively.

APEX beats Q8_0 perplexity at half the size — and even beats F16.

📦 APEX Quantization Tiers

File	Size	Profile	Best For
`*-APEX-I-Quality.gguf`	21.90 GB	I-Quality	Highest quality, best accuracy
`*-APEX-I-Balanced.gguf`	24.18 GB	I-Balanced	Best all-rounder, recommended
`*-APEX-I-Compact.gguf`	15.85 GB	I-Compact	Best quality/size ratio

Citation

@misc{ornith-35b,
    title = {{Ornith-1.0-35B}: Agentic Coding, Open to All},
    url = {https://deep-reinforce.com/ornith_1_0.html},
    author = {{DeepReinforce Team}},
    year = {2026}
}

Downloads last month: 1,298

GGUF

Model size

0.4B params

Architecture

clip

Hardware compatibility

16-bit

View +3 variants

Model tree for SC117/Ornith-1.0-35B-MTP-APEX-GGUF

Base model

deepreinforce-ai/Ornith-1.0-35B

Quantized

(69)

this model

SC117
/

Ornith-1.0-35B-MTP-APEX-GGUF

Ornith-1.0-35B-MTP-APEX

Links

Citation

Model tree for SC117/Ornith-1.0-35B-MTP-APEX-GGUF