Kaiju-Coder MLX 1.6
The local model that runs your business, not just your IDE.
by Kiyomi · built by RMDW
Kaiju-Coder MLX 1.6 is a local-first builder model for solo founders and small-business owners. It is tuned for the work that actually moves a one-person business: shipping a website, wiring Stripe checkout, writing invoices and proposals, capturing leads, building CRM/intake flows, and standing up small automations. It runs on your own machine through Ollama, LM Studio, or llama.cpp. No API key, no data leaving your laptop, Apache-2.0.
v1.6 is the image-fix release. Earlier versions built good-looking sites whose pictures often broke; v1.6 fixes that at the weights, so the model now writes image URLs that actually load (see Images that actually load), while keeping the model's concise coding style and base-class coding strength. The image fix is additive, not a tradeoff.
This is a text-only GGUF derived from Qwen3.6-35B-A3B. It is a scoped business-niche model, not a frontier general-purpose coder. See Limitations before you rely on it.
This card features v1.6 as the current release. v1.1 remains the previous version.
Images that actually load
Earlier Kaiju builds wrote nice-looking sites, but the images often 404'd. The model had
learned to emit hardcoded stock-photo IDs like images.unsplash.com/photo-<id>... that do
not exist, because a text model cannot know real photo IDs and invents new ones at inference.
v1.6 fixes this at the weights. The model now constructs image URLs from pattern-based sources that resolve for any value it generates:
- topical photos: https://loremflickr.com/<w>/<h>/<keywords> (keyword matched to the section)
- headshots / avatars: https://i.pravatar.cc/<size>?img=<n>
- generic stable photos: https://picsum.photos/seed/<seed>/<w>/<h>
- logos / icons: inline <svg>
It generalizes. Even for a business vertical it never saw in training, it writes a working, topical image URL (verified on novel verticals: every generated image resolved). No instruction file and no harness are required for images to load.
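The URL patterns above can be sketched as small builder functions. This is an illustration of the patterns only, not the model's internal logic; the function names are hypothetical, and joining several keywords with commas is an assumption about how the keyword segment is written.

```python
from urllib.parse import quote, urlencode


def loremflickr_url(width: int, height: int, keywords: list[str]) -> str:
    """Topical photo, following https://loremflickr.com/<w>/<h>/<keywords>."""
    # Assumption: multiple keywords are comma-separated in the last path segment.
    joined = ",".join(quote(k.strip().lower()) for k in keywords)
    return f"https://loremflickr.com/{width}/{height}/{joined}"


def pravatar_url(size: int, index: int) -> str:
    """Headshot or avatar, following https://i.pravatar.cc/<size>?img=<n>."""
    return f"https://i.pravatar.cc/{size}?{urlencode({'img': index})}"


def picsum_url(seed: str, width: int, height: int) -> str:
    """Generic stable photo, following https://picsum.photos/seed/<seed>/<w>/<h>."""
    # The seed is percent-encoded so it cannot break the path.
    return f"https://picsum.photos/seed/{quote(seed, safe='')}/{width}/{height}"
```

For example, `loremflickr_url(800, 600, ["roofing", "house"])` produces a hero-image URL for a roofing page, and `picsum_url("hero-1", 1200, 630)` returns the same URL for the same seed on every call.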
Quant table
Sizes are the on-disk GGUF size; RAM figures are approximate working-set estimates.
| File | Bits | Size | RAM (approx) | Use |
|---|---|---|---|---|
| kaiju-coder-mlx-1.6-q8_0.gguf | Q8_0 | ~36.9 GB | ~40 GB | Current release. Highest fidelity, the verified v1.6 artifact (available now) |
| kaiju-coder-mlx-1.6-q5_k_m.gguf | Q5_K_M | ~25 GB | ~28 GB | Balanced quality/size (coming soon) |
| kaiju-coder-mlx-1.6-q4_k_m.gguf | Q4_K_M | ~21 GB | ~24 GB | Smallest, runs on more machines (coming soon) |
The v1.6 Q8_0 file is the current release (SHA256 c501eb625c66027f036295374e41b86a007801b8653e1a12eea25ea29fe9a68a). The LoRA adapter is included
under adapter/ for use on top of the base model. Smaller K-quants (Q5_K_M, Q4_K_M) are coming
soon; community re-quants are welcome.
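A minimal sketch for checking a downloaded file against the SHA256 published above. The helper names are hypothetical; the file is read in chunks because the GGUF is far too large to load into memory just to hash it.

```python
import hashlib

# SHA256 of the v1.6 Q8_0 file, as published in this card.
EXPECTED_SHA256 = "c501eb625c66027f036295374e41b86a007801b8653e1a12eea25ea29fe9a68a"


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's digest equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_published_hash("kaiju-coder-mlx-1.6-q8_0.gguf")` after the download finishes should return `True` for an intact copy of the release file.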
This is a 35.9B-total mixture-of-experts model (architecture id qwen3_5_moe) with roughly
3B active parameters per token, so it is lighter to run than its total size suggests, but it
still needs enough memory to hold the full weight set.
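As a rough aid for the memory question, the sketch below picks the highest-fidelity quant whose approximate RAM figure from the table fits a given budget. The figures are the card's own estimates; the `headroom_gb` parameter and the function name are assumptions added for illustration, and only Q8_0 is marked as released.

```python
from typing import NamedTuple, Optional


class Quant(NamedTuple):
    filename: str
    approx_ram_gb: float  # approximate working-set estimate from the quant table
    released: bool        # False for quants listed as "coming soon"


# Ordered from highest fidelity to smallest.
QUANTS = [
    Quant("kaiju-coder-mlx-1.6-q8_0.gguf", 40, True),
    Quant("kaiju-coder-mlx-1.6-q5_k_m.gguf", 28, False),
    Quant("kaiju-coder-mlx-1.6-q4_k_m.gguf", 24, False),
]


def pick_quant(available_ram_gb: float,
               headroom_gb: float = 0.0,
               released_only: bool = True) -> Optional[str]:
    """Return the first (highest-fidelity) quant that fits, or None if none does."""
    for quant in QUANTS:
        if released_only and not quant.released:
            continue
        # headroom_gb is a hypothetical safety margin for the OS and other apps.
        if quant.approx_ram_gb + headroom_gb <= available_ram_gb:
            return quant.filename
    return None
```

With these table values, a 64 GB machine gets the Q8_0 file, while a 32 GB machine gets `None` until a smaller quant is published.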
Quickstart
Kaiju-Coder is a chat/instruct model. Run it with thinking output turned off for
customer-visible work, or you may see empty <think></think> scaffolding.
Ollama
Download the GGUF and the Modelfile into the same folder, then:
ollama create kaiju-coder-mlx:1.6 -f Modelfile
ollama run kaiju-coder-mlx:1.6 --think=false --hidethinking \
"Build a one-page landing site for a Charlotte roofing company with a Request an Inspection CTA and real images."
API clients should pass top-level think: false:
curl http://127.0.0.1:11434/api/chat -d '{
"model": "kaiju-coder-mlx:1.6",
"think": false,
"messages": [{"role": "user", "content": "Write a Stripe Checkout route for a $250 deposit."}]
}'
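The same request can be sent from Python with only the standard library. This is a sketch that mirrors the curl call above; the function names are hypothetical, and `"stream": False` is an addition (not in the curl example) so that the server returns one JSON object instead of a stream of chunks.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://127.0.0.1:11434/api/chat"


def build_chat_payload(prompt: str, model: str = "kaiju-coder-mlx:1.6") -> dict:
    """Build the /api/chat body with thinking turned off at the top level."""
    return {
        "model": model,
        "think": False,   # top-level flag, as in the curl example
        "stream": False,  # assumption: ask for a single JSON reply
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str, url: str = OLLAMA_CHAT_URL, timeout: float = 600.0) -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["message"]["content"]
```

With Ollama running locally, `chat("Write a Stripe Checkout route for a $250 deposit.")` returns the assistant message as a string.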
LM Studio
- Download the GGUF into your LM Studio models folder (or use the in-app Hugging Face search).
- Load the model, keep the system prompt from the GGUF metadata, disable reasoning display.
- Chat normally. For tool-calling agent workflows, use the Ollama or llama.cpp path.
llama.cpp
./llama-server -m kaiju-coder-mlx-1.6-q8_0.gguf --jinja --port 8080
Raw llama-cli may render an empty <think></think> block; use the think:false flag for
clean customer-facing output.
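When a client cannot turn thinking off, one option is to strip the empty block after the fact. The sketch below is a post-processing workaround, not part of the model or of llama.cpp; the function name is hypothetical, and it deliberately leaves non-empty `<think>` blocks untouched.

```python
import re

# Matches a <think> block that contains only whitespace, plus trailing whitespace.
EMPTY_THINK = re.compile(r"<think>\s*</think>\s*")


def strip_empty_think(text: str) -> str:
    """Remove empty <think></think> scaffolding from model output."""
    return EMPTY_THINK.sub("", text)
```

Applied to raw output such as `"<think>\n\n</think>\n\nHello"`, this leaves only the customer-visible text.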
Benchmarks
Coding numbers come from a controlled EvalPlus run: thinking off, greedy decoding, and the same harness and Ollama runtime for every column, so only the weights vary. Tool-calling is confirmed working; the BFCL v3 score has not been run yet and is marked TBD rather than estimated.
| Benchmark | Base (Qwen3.6-35B-A3B) | Kaiju-Coder MLX 1.1 | Kaiju-Coder MLX 1.6 |
|---|---|---|---|
| Images resolve (incl. novel verticals) | n/a | broken (faked stock IDs) | pattern-based, resolve |
| EvalPlus pass@1 (HumanEval base) | 93.3% | 93.3% | 92.1% |
| EvalPlus pass@1 (HumanEval+) | 89.6% | 89.6% | 87.8% |
| EvalPlus pass@1 (MBPP base) | 91.8% | 90.5% | 86.8% |
| EvalPlus pass@1 (MBPP+) | 78.0% | 77.8% | 76.7% |
| BFCL v3 (tool/function calling) | TBD | TBD | TBD (run pending) |
Read honestly: v1.6 fixes images natively while keeping the code concise and close to the base. On EvalPlus it trails the base by 1.2 to 1.8 points on HumanEval, HumanEval+, and MBPP+, and by 5.0 points on MBPP base (see the table). It keeps the base's agentic foundation and adds the business-owner workflows, now including images that do not break. The earlier v1.5 preview traded coding for the image fix; v1.6 corrected that by re-anchoring the concise coding style.
Tool-calling is confirmed working: a direct Ollama probe returns clean write tool_calls
(finish_reason tool_calls). The BFCL v3 number stays TBD until it is run.
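A probe of this kind can be sketched as follows, assuming an OpenAI-style chat completions response (the shape that carries `finish_reason`). The `write` tool schema, the prompt, and both function names are hypothetical illustrations, not the exact probe used for this card.

```python
import json


def build_tool_probe(model: str = "kaiju-coder-mlx:1.6") -> dict:
    """Build a chat request that offers a single hypothetical `write` tool."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": "Create index.html with a Hello heading."}
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "write",  # hypothetical tool name and schema
                "description": "Write text content to a file path.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "content": {"type": "string"},
                    },
                    "required": ["path", "content"],
                },
            },
        }],
    }


def extract_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs when the reply ended with tool_calls."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        return []
    calls = []
    for call in choice["message"].get("tool_calls", []):
        function = call["function"]
        arguments = function["arguments"]
        # Arguments usually arrive as a JSON string; accept a dict as well.
        if isinstance(arguments, str):
            arguments = json.loads(arguments)
        calls.append((function["name"], arguments))
    return calls
```

A non-empty result from `extract_tool_calls` on the server's reply is the "clean tool call" condition described above; an empty list means the model answered in plain text instead.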
Open rubric: the BizAgent-Gold task set and scoring rubric are open in the source repo
(benchmarks/golden-bizagent-tasks.json, benchmarks/niche-config.json); any published judge
score uses an open model, named in the result.
Use it as an agent (opencode)
To get agentic behavior (writing files, editing a project), run the model inside an agent
harness. The recommended harness is opencode. The agentic serving path is the Ollama tag
kaiju-coder-mlx-opencode:1.6 (the tool-call/opencode build, 16k context, end-of-tool-call
token baked in).
ollama create kaiju-coder-mlx-opencode:1.6 -f Modelfile
cd /path/to/your/project
opencode
Select kaiju-coder-mlx-opencode:1.6 in opencode and give it the task in plain language.
Cline and aider work the same way over http://127.0.0.1:11434/v1.
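Before pointing a harness at the endpoint, it can help to confirm that the tag is visible over the OpenAI-compatible API. The sketch below assumes the server exposes a `/models` listing under the `/v1` base URL in the usual OpenAI response shape (`{"data": [{"id": ...}]}`); the function names are hypothetical.

```python
import json
import urllib.request

OPENAI_BASE_URL = "http://127.0.0.1:11434/v1"
AGENT_TAG = "kaiju-coder-mlx-opencode:1.6"


def model_ids(models_response: dict) -> list:
    """Extract model ids from an OpenAI-style model listing."""
    return [entry["id"] for entry in models_response.get("data", [])]


def has_tag(models_response: dict, tag: str = AGENT_TAG) -> bool:
    """True when the given tag appears in the listing."""
    return tag in model_ids(models_response)


def fetch_models(base_url: str = OPENAI_BASE_URL, timeout: float = 10.0) -> dict:
    """GET <base_url>/models from a running local server."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as response:
        return json.load(response)
```

If `has_tag(fetch_models())` returns `False`, the `ollama create` step has probably not been run for the agentic tag yet.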
Limitations
- Business-niche coder, not frontier. v1.6 is tuned for building business artifacts and is strongest on founder workflows. It writes short, direct code (no padded solutions) and stays close to the base on EvalPlus (see Benchmarks), but it is not positioned as a general-purpose competitive coder. v1.1 remains in the repo as the previous version (no native image fix).
- Text-only GGUF. The base is a vision-language model; this GGUF strips the vision pathway. It does not see images and does not advertise vision.
- Images use placeholder services. v1.6 writes image URLs that load (loremflickr / pravatar / picsum / SVG), right for mockups and launch-ready sites. For a real brand, swap in the owner's own photos; the placeholders are there so nothing renders broken out of the box.
- Run with thinking off. Pass think:false for customer-visible output.
- Agentic delivery. Tool-calling is confirmed via Ollama; polished multi-file builds still benefit from a warm model and a verifier/retry harness.
- Human review. Customer-facing deliverables should get a human review pass during early use.
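For the placeholder-image limitation above, a review pass can start by listing every placeholder URL in a generated page so the owner's photos can be swapped in. This is a minimal sketch using a regular expression over the three hosts named in this card; the function name is hypothetical, and a real pipeline might prefer an HTML parser.

```python
import re

# Hosts of the placeholder services named in this card.
PLACEHOLDER_URL = re.compile(
    r"https?://(?:loremflickr\.com|i\.pravatar\.cc|picsum\.photos)/[^\s\"'<>)]*"
)


def find_placeholder_images(html: str) -> list:
    """Return placeholder image URLs in first-seen order, without duplicates."""
    return list(dict.fromkeys(PLACEHOLDER_URL.findall(html)))
```

Running `find_placeholder_images` on the generated HTML gives a checklist of images to replace before the site goes live under a real brand.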
Identity
Kaiju-Coder MLX 1.6 by Kiyomi is a local-first builder for solo founders and small-business owners. It is honest about what it is: it does not pretend to be Claude, GPT, or any other model, and it does not claim vision. Voice: direct, ship-first, no corporate filler.
License and attribution
Licensed under the Apache License, Version 2.0. See LICENSE and NOTICE.
- Base model: Qwen/Qwen3.6-35B-A3B, Copyright 2026 Alibaba Cloud, licensed under Apache-2.0.
- This work is a LoRA fine-tune that modified the base model, packaged as a text-only GGUF.
- Fine-tuned from Qwen3.6-35B-A3B by Richard Echols / RMDW.
- Not endorsed by Alibaba Cloud or the Qwen team.
Training-data policy: the fine-tune uses RMDW/Kiyomi-owned deterministic output only. No closed-model completions were used as supervised training targets. Any open-model judge used for evaluation scoring is named in the result.