Ornith-1.0-9B β Core AI (agentic coder, 48β59 tok/s on M4 Max)
Apple Core AI (.aimodel) conversion of
deepreinforce-ai/Ornith-1.0-9B
(text decoder): DeepReinforce's self-scaffolding agentic-coding model β trained to jointly
solve coding tasks and construct the orchestration scaffold that guides the solution.
Architecturally a Qwen3.5 hybrid decoder (model_type qwen3_5): 32 layers on a 3:1
interleave of GatedDeltaNet linear-attention mixers and gated full attention, untied
248320-vocab head. Runs fully on-device on Apple silicon via Apple's coreai-pipelined
GPU engine.
Part of the community Core AI model zoo: https://github.com/john-rocky/coreai-model-zoo
(full card: zoo/ornith-1.0-9b.md).
Bundles
| bundle | size | M4 Max decode / prefill | quality |
|---|---|---|---|
gpu-pipelined/ornith_1_0_9b_decode_int8hu_block32_sym/ (ship) |
9.8 GB | 48.3 / 48.5 tok/s | teacher-forced eager gate 24/24 exact vs fp32 HF oracle; engine greedy 12/12 token-exact |
gpu-pipelined/ornith_1_0_9b_decode_int4lin/ (speed option) |
7.5 GB | 58.9 / 59.0 tok/s (+22%) | ALSO gates 24/24 exact + engine 12/12 β the first clean int4 PTQ pass in the Qwen3.5 family (verified at short context; int8 carries the quality claim at long context) |
Both are decode-only loop-free S=1 LanguageBundles (max_ctx 8192, tokenizer + chat
template embedded), gated against a margin-validated fp32 HF oracle (min top-2 margin
0.205; the eager fp16 baseline is also 24/24, so int8/int4 add zero flips).
Recipe: linear int8/int4 per-block-32 body + absmax symmetric per-block-32 int8 on the
untied big-vocab head (int8hu). Conversion is the zoo's stock Qwen3.5 script with nothing
but an --hf-id swap β see
conversion/README.md.
Run it (macOS)
Easiest: CoreAIChatMac (the zoo's Mac chat app β
apps/CoreAIChatMac)
downloads this repo in-app: pick Ornith-1.0-9B in the Downloads panel.
CLI (needs the zoo's coreai-models checkout + the pipelined extra-states patch):
COREAI_CHUNK_THRESHOLD=1 llm-benchmark \
--model ornith_1_0_9b_decode_int8hu_block32_sym -p 128 -g 256 -n 3
COREAI_CHUNK_THRESHOLD=1 llm-runner \
--model ornith_1_0_9b_decode_int8hu_block32_sym \
--prompt "Write a rate limiter in Swift." --sampling-strategy greedy \
--warmup exact --warmup-length 1
Notes: COREAI_CHUNK_THRESHOLD=1 before engine creation; never engine.warmup() on an
S=1 bundle; Release builds only. Details + every trap:
knowledge/pipelined-engine.md.
iPhone
Not this one (yet): int8 at 9.8 GB exceeds the entitled jetsam ceiling (~6.4 GB on an iPhone 17 Pro). The arithmetic route is int4 body + int8 head (β6.5 GB) or an in-graph int8 embed table (β5.5 GB) β tracked in the zoo card.
License
MIT β as declared by the upstream model card (deepreinforce-ai/Ornith-1.0-9B; the upstream repo ships no LICENSE file, so none is mirrored here). Conversion scripts and harness: see the zoo repo.
- Downloads last month
- 269
Model tree for mlboydaisuke/Ornith-1.0-9B-CoreAI
Base model
deepreinforce-ai/Ornith-1.0-9B