Instructions to use Tribbler/ornith-1.0-apex with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Tribbler/ornith-1.0-apex with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Tribbler/ornith-1.0-apex", filename="ornith-1.0-35b-APEX-Balanced-MTP.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use Tribbler/ornith-1.0-apex with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf Tribbler/ornith-1.0-apex # Run inference directly in the terminal: llama cli -hf Tribbler/ornith-1.0-apex
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf Tribbler/ornith-1.0-apex # Run inference directly in the terminal: llama cli -hf Tribbler/ornith-1.0-apex
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Tribbler/ornith-1.0-apex # Run inference directly in the terminal: ./llama-cli -hf Tribbler/ornith-1.0-apex
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Tribbler/ornith-1.0-apex # Run inference directly in the terminal: ./build/bin/llama-cli -hf Tribbler/ornith-1.0-apex
Use Docker
docker model run hf.co/Tribbler/ornith-1.0-apex
- LM Studio
- Jan
- Ollama
How to use Tribbler/ornith-1.0-apex with Ollama:
ollama run hf.co/Tribbler/ornith-1.0-apex
- Unsloth Studio
How to use Tribbler/ornith-1.0-apex with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Tribbler/ornith-1.0-apex to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Tribbler/ornith-1.0-apex to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Tribbler/ornith-1.0-apex to start chatting
- Pi
How to use Tribbler/ornith-1.0-apex with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf Tribbler/ornith-1.0-apex
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "Tribbler/ornith-1.0-apex" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use Tribbler/ornith-1.0-apex with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf Tribbler/ornith-1.0-apex
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default Tribbler/ornith-1.0-apex
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use Tribbler/ornith-1.0-apex with Docker Model Runner:
docker model run hf.co/Tribbler/ornith-1.0-apex
- Lemonade
How to use Tribbler/ornith-1.0-apex with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Tribbler/ornith-1.0-apex
Run and chat with the model
lemonade run user.ornith-1.0-apex-{{QUANT_TAG}}List all available models
lemonade list
Ornith-1.0-35B โ APEX GGUF
APEX (Adaptive Precision for EXpert Models) quantizations of Ornith-1.0-35B, an open-source coding MoE model by DeepReinforce (MIT license, based on Qwen 3.5 architecture).
These quants were produced using the apex-quant toolchain. APEX is a MoE-aware mixed-precision quantization strategy that classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient โ edge layers get higher precision, middle layers more aggressive compression.
Files
Each profile comes in two variants:
- Base โ the quantized model standalone
- -MTP โ includes the bundled MTP (multi-token prediction) head, quantized to Q8_0 (near-lossless), for self-speculative decoding via
--spec-type draft-mtp. Requires a recent llama.cpp build with MTP support.
I-variants were calibrated with a diverse importance matrix (chat, code, reasoning, tool-calling, multilingual) for improved downstream accuracy.
| File | Profile | Size | Best For |
|---|---|---|---|
ornith-1.0-35b-APEX-I-Mini.gguf |
I-Mini | 14 GB | Smallest viable, fastest inference |
ornith-1.0-35b-APEX-I-Mini-MTP.gguf |
I-Mini + MTP | 14 GB | Smallest viable + self-spec |
ornith-1.0-35b-APEX-Compact.gguf |
Compact | 17 GB | Consumer GPUs, general purpose |
ornith-1.0-35b-APEX-Compact-MTP.gguf |
Compact + MTP | 17 GB | Consumer GPUs + self-spec |
ornith-1.0-35b-APEX-I-Compact.gguf ๐ |
I-Compact | 17 GB | Consumer GPUs, best quality at this size |
ornith-1.0-35b-APEX-I-Compact-MTP.gguf ๐ |
I-Compact + MTP | 17 GB | Consumer GPUs, best quality + self-spec |
ornith-1.0-35b-APEX-Quality.gguf |
Quality | 22 GB | Highest quality standard |
ornith-1.0-35b-APEX-Quality-MTP.gguf |
Quality + MTP | 23 GB | Highest quality + self-spec |
ornith-1.0-35b-APEX-I-Quality.gguf |
I-Quality | 22 GB | Highest quality with imatrix |
ornith-1.0-35b-APEX-I-Quality-MTP.gguf |
I-Quality + MTP | 23 GB | Highest quality + imatrix + self-spec |
ornith-1.0-35b-APEX-Balanced.gguf |
Balanced | 24 GB | General purpose, best trade-off |
ornith-1.0-35b-APEX-Balanced-MTP.gguf |
Balanced + MTP | 25 GB | General purpose + self-spec |
ornith-1.0-35b-APEX-I-Balanced.gguf ๐ |
I-Balanced | 24 GB | Best overall โ lowest KL divergence |
ornith-1.0-35b-APEX-I-Balanced-MTP.gguf ๐ |
I-Balanced + MTP | 25 GB | Best overall + self-spec |
Profile Precision Breakdown
APEX applies a layer-wise precision gradient to MoE expert weights. I-variants additionally use a diverse imatrix (chat, code, reasoning, tool-calling) that improves downstream accuracy and lowers KL divergence.
| Profile | Edge (blk 0-4, 35-39) | Near-Edge (blk 5-9, 30-34) | Middle (blk 10-29) | Shared Expert | Attention |
|---|---|---|---|---|---|
| Quality | Q6_K | Q5_K | IQ4_XS | Q8_0 | Q6_K |
| Balanced | Q6_K | Q5_K | Q5_K | Q8_0 | Q6_K |
| Compact | Q4_K | Q3_K | Q3_K | Q6_K | Q4_K |
| Mini | Q3_K_M | Q3_K_M | IQ2_S | Q4_K | Q3_K_M |
Quality and Mini use a 3-tier gradient. Balanced and Compact use a simpler 2-tier gradient (edge vs. middle) โ their "Near-Edge" and "Middle" columns are the same precision.
MTP Head
The bundled MTP head (blk.40.* including the nextn.* projection + norms) is quantized to Q8_0 (near-lossless) for high draft accuracy. Enable with:
llama-server -m ornith-1.0-35b-APEX-...-MTP.gguf --spec-type draft-mtp
Usage Examples
llama.cpp server (basic)
llama-server \
-m ornith-1.0-35b-APEX-I-Compact.gguf \
-ngl 99 \
-c 32768 \
--flash-attn on \
--temp 0.6 \
--top-p 0.95
With self-speculative decoding (MTP variants)
llama-server \
-m ornith-1.0-35b-APEX-I-Compact-MTP.gguf \
--spec-type draft-mtp \
-ngl 99 \
-c 32768 \
--flash-attn on
llama.cpp server with vision
Ornith has a built-in vision encoder. Vision support in GGUF format is experimental โ if a compatible mmproj becomes available, pass it with --mmproj.
Hardware Notes
| Profile | Minimum VRAM | Recommended VRAM |
|---|---|---|
| I-Mini | 16 GB | 24 GB |
| Compact / I-Compact | 20 GB | 24 GB |
| Quality / I-Quality | 24 GB | 32 GB |
| Balanced / I-Balanced | 24 GB (tight) | 32 GB+ |
Acknowledgements
- Base model: DeepReinforce โ Ornith-1.0-35B (MIT)
- APEX quantization: LocalAI team
- MTP donor tensors: IHaveNoClueAndIMustPost
- Calibration dataset: v3+ultrachat (pile-10k, GSM8K, eaddario imatrix-calibration, HuggingFaceH4/ultrachat_200k)
- Built on llama.cpp
- Downloads last month
- -
We're not able to determine the quantization variants.
Model tree for Tribbler/ornith-1.0-apex
Base model
deepreinforce-ai/Ornith-1.0-35B