Instructions to use RMDWLLC/kaiju-coder-mlx-1.0 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use RMDWLLC/kaiju-coder-mlx-1.0 with llama-cpp-python:

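# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF from the Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
	repo_id="RMDWLLC/kaiju-coder-mlx-1.0",
	filename="kaiju-coder-mlx-1.0-q8_0.gguf",
)

# Send one chat turn; the result is an OpenAI-style completion dict.
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])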

Local Apps

llama.cpp

How to use RMDWLLC/kaiju-coder-mlx-1.0 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0
# Run inference directly in the terminal:
llama-cli -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0
# Run inference directly in the terminal:
llama-cli -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Use Docker

docker model run hf.co/RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

vLLM

How to use RMDWLLC/kaiju-coder-mlx-1.0 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RMDWLLC/kaiju-coder-mlx-1.0"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RMDWLLC/kaiju-coder-mlx-1.0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Ollama
How to use RMDWLLC/kaiju-coder-mlx-1.0 with Ollama:
```
ollama run hf.co/RMDWLLC/kaiju-coder-mlx-1.0:Q8_0
```

Unsloth Studio

How to use RMDWLLC/kaiju-coder-mlx-1.0 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RMDWLLC/kaiju-coder-mlx-1.0 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RMDWLLC/kaiju-coder-mlx-1.0 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RMDWLLC/kaiju-coder-mlx-1.0 to start chatting

Pi

How to use RMDWLLC/kaiju-coder-mlx-1.0 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "RMDWLLC/kaiju-coder-mlx-1.0:Q8_0"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use RMDWLLC/kaiju-coder-mlx-1.0 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Run Hermes

hermes

Docker Model Runner
How to use RMDWLLC/kaiju-coder-mlx-1.0 with Docker Model Runner:
```
docker model run hf.co/RMDWLLC/kaiju-coder-mlx-1.0:Q8_0
```

Lemonade

How to use RMDWLLC/kaiju-coder-mlx-1.0 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull RMDWLLC/kaiju-coder-mlx-1.0:Q8_0

Run and chat with the model

lemonade run user.kaiju-coder-mlx-1.0-Q8_0

List all available models

lemonade list

Kaiju-Coder MLX 1.6

The local model that runs your business, not just your IDE.
by Kiyomi · built by RMDW

Kaiju-Coder MLX 1.6 is a local-first builder model for solo founders and small-business owners. It is tuned for the work that actually moves a one-person business: shipping a website, wiring Stripe checkout, writing invoices and proposals, capturing leads, building CRM/intake flows, and standing up small automations. It runs on your own machine through Ollama, LM Studio, or llama.cpp. No API key, no data leaving your laptop, Apache-2.0.

v1.6 is the image-fix release. Earlier versions built good-looking sites whose pictures often broke; v1.6 fixes that at the weights, so the model now writes image URLs that actually load (see Images that actually load), while keeping the model's concise coding style and base-class coding strength. The image fix is additive, not a tradeoff.

This is a text-only GGUF derived from Qwen3.6-35B-A3B. It is a scoped business-niche model, not a frontier general-purpose coder. See Limitations before you rely on it.

This card features v1.6 as the current release. v1.1 remains the previous version.

Images that actually load

Earlier Kaiju builds wrote nice-looking sites, but the images often 404'd. The model had learned to emit hardcoded stock-photo IDs like images.unsplash.com/photo-<id>... that do not exist, because a text model cannot know real photo IDs and invents new ones at inference.

v1.6 fixes this at the weights. The model now constructs image URLs from pattern-based sources that resolve for any value it generates:

topical photos: https://loremflickr.com/<w>/<h>/<keywords> (keyword matched to the section)
headshots / avatars: https://i.pravatar.cc/<size>?img=<n>
generic stable photos: https://picsum.photos/seed/<seed>/<w>/<h>
logos / icons: inline <svg>

It generalizes. Even for a business vertical it never saw in training, it writes a working, topical image URL (verified on novel verticals: every generated image resolved). No instruction file and no harness are required for images to load.
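The URL patterns above can be sketched as small Python helpers. This is an illustration of the pattern-based idea, not the model's internal logic: the function names are hypothetical, and joining several keywords with commas is an assumption about how `<keywords>` is filled in.

```python
from urllib.parse import quote


def topical_photo(width: int, height: int, keywords: list[str]) -> str:
    """Topical photo URL; keywords are joined with commas (assumed convention)."""
    tags = ",".join(quote(k.strip().lower()) for k in keywords)
    return f"https://loremflickr.com/{width}/{height}/{tags}"


def avatar(size: int, n: int) -> str:
    """Headshot / avatar URL for a square image of the given size."""
    return f"https://i.pravatar.cc/{size}?img={n}"


def stable_photo(seed: str, width: int, height: int) -> str:
    """Generic photo URL keyed by a seed string, following the picsum pattern above."""
    return f"https://picsum.photos/seed/{quote(seed)}/{width}/{height}"


print(topical_photo(800, 600, ["roofing", "house"]))
print(avatar(150, 12))
print(stable_photo("hero", 1200, 600))
```

Because every URL is built from a pattern rather than a memorized ID, any width, height, keyword, or seed the model picks still yields a well-formed address.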

Quant table

Sizes are the on-disk GGUF size; RAM figures are approximate working-set estimates.

| File | Quant | Size | RAM (approx) | Use |
| --- | --- | --- | --- | --- |
| `kaiju-coder-mlx-1.6-q8_0.gguf` | Q8_0 | ~36.9 GB | ~40 GB | Current release. Highest fidelity, the verified v1.6 artifact (available now) |
| `kaiju-coder-mlx-1.6-q5_k_m.gguf` | Q5_K_M | ~25 GB | ~28 GB | Balanced quality/size (coming soon) |
| `kaiju-coder-mlx-1.6-q4_k_m.gguf` | Q4_K_M | ~21 GB | ~24 GB | Smallest, runs on more machines (coming soon) |

The v1.6 Q8_0 file is the current release (SHA256 c501eb625c66027f036295374e41b86a007801b8653e1a12eea25ea29fe9a68a). The LoRA adapter is included under adapter/ for use on top of the base model. Smaller K-quants (Q5_K_M, Q4_K_M) are coming soon; community re-quants are welcome.
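One way to check a download against the published SHA256 is sketched below. The helper names are hypothetical, and the local file path is an assumption (the Q8_0 filename from the table, in the current directory).

```python
import hashlib
from pathlib import Path

# Hash published on this card for the v1.6 Q8_0 file.
EXPECTED_SHA256 = "c501eb625c66027f036295374e41b86a007801b8653e1a12eea25ea29fe9a68a"


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a ~37 GB GGUF never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256) -> bool:
    """Return True only if the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    gguf = Path("kaiju-coder-mlx-1.6-q8_0.gguf")  # assumed local path
    if gguf.exists():
        print("OK" if verify(gguf) else "MISMATCH")
    else:
        print(f"{gguf} not found; download it first")
```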

This is a 35.9B-total mixture-of-experts model (architecture id qwen3_5_moe) with roughly 3B active parameters per token, so it is lighter to run than its total size suggests, but it still needs enough memory to hold the full weight set.
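Since the full weight set has to fit in memory, a rough pre-download check can pick a quant from the approximate RAM figures in the quant table. This is a sketch under stated assumptions: the function name is hypothetical, and the 4 GB headroom for the OS and context cache is a guess, not a measured figure.

```python
# (file, approximate RAM in GB from the quant table, released yet?)
QUANTS = [
    ("kaiju-coder-mlx-1.6-q8_0.gguf", 40, True),
    ("kaiju-coder-mlx-1.6-q5_k_m.gguf", 28, False),
    ("kaiju-coder-mlx-1.6-q4_k_m.gguf", 24, False),
]


def pick_quant(free_ram_gb, headroom_gb=4.0, include_unreleased=False):
    """Return the highest-fidelity quant whose working set fits, or None.

    headroom_gb is an assumed safety margin, not a figure from the card.
    """
    budget = free_ram_gb - headroom_gb
    for name, ram_gb, released in QUANTS:  # ordered from largest to smallest
        if ram_gb <= budget and (released or include_unreleased):
            return name
    return None


print(pick_quant(48))                           # fits the Q8_0 release
print(pick_quant(36))                           # nothing released fits yet
print(pick_quant(36, include_unreleased=True))  # would fit Q5_K_M once published
```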

Quickstart

Kaiju-Coder is a chat/instruct model. Run it with thinking output turned off for customer-visible work, or you may see empty <think></think> scaffolding.

Ollama

Download the GGUF and the Modelfile into the same folder, then:

ollama create kaiju-coder-mlx:1.6 -f Modelfile
ollama run kaiju-coder-mlx:1.6 --think=false --hidethinking \
  "Build a one-page landing site for a Charlotte roofing company with a Request an Inspection CTA and real images."

API clients should pass top-level think: false:

curl http://127.0.0.1:11434/api/chat -d '{
  "model": "kaiju-coder-mlx:1.6",
  "think": false,
  "messages": [{"role": "user", "content": "Write a Stripe Checkout route for a $250 deposit."}]
}'
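The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names are hypothetical, and `"stream": false` is added here as an assumption so the server returns a single JSON object instead of a stream of chunks.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"


def build_payload(prompt, model="kaiju-coder-mlx:1.6"):
    """Chat payload with thinking disabled at the top level, as recommended above."""
    return {
        "model": model,
        "think": False,
        "stream": False,  # assumption: ask for one JSON response, not a stream
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, url=OLLAMA_URL, timeout=600):
    """POST the payload to a running Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]


# Usage (requires `ollama serve` and the model created as shown above):
# print(chat("Write a Stripe Checkout route for a $250 deposit."))
```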

LM Studio

Download the GGUF into your LM Studio models folder (or use the in-app Hugging Face search).
Load the model, keep the system prompt from the GGUF metadata, disable reasoning display.
Chat normally. For tool-calling agent workflows, use the Ollama or llama.cpp path.

llama.cpp

./llama-server -m kaiju-coder-mlx-1.6-q8_0.gguf --jinja --port 8080

Raw llama-cli may render an empty <think></think> block. For clean customer-facing output, run with thinking turned off (think: false on the Ollama path).
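If an empty <think></think> block still reaches your application, it can be removed in post-processing. The helper below is a hypothetical sketch, not part of the model or of llama.cpp; it deliberately leaves non-empty thinking blocks alone so nothing is dropped silently.

```python
import re

# Matches a think block that contains only whitespace, plus trailing whitespace.
_EMPTY_THINK = re.compile(r"<think>\s*</think>\s*")


def strip_empty_think(text: str) -> str:
    """Remove empty <think></think> scaffolding from model output."""
    return _EMPTY_THINK.sub("", text)


print(strip_empty_think("<think>\n\n</think>\n\nHere is your invoice template."))
```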

Benchmarks

Coding numbers come from a controlled EvalPlus run: thinking off, greedy decoding, and the identical harness and Ollama runtime for every column, so only the weights vary. Tool-calling is confirmed working; the BFCL v3 score has not been run yet and is labeled TBD rather than estimated.

| Benchmark | Base (Qwen3.6-35B-A3B) | Kaiju-Coder MLX 1.1 | Kaiju-Coder MLX 1.6 |
| --- | --- | --- | --- |
| Images resolve (incl. novel verticals) | n/a | broken (faked stock IDs) | pattern-based, resolve |
| EvalPlus pass@1 (HumanEval base) | 93.3% | 93.3% | 92.1% |
| EvalPlus pass@1 (HumanEval+) | 89.6% | 89.6% | 87.8% |
| EvalPlus pass@1 (MBPP base) | 91.8% | 90.5% | 86.8% |
| EvalPlus pass@1 (MBPP+) | 78.0% | 77.8% | 76.7% |
| BFCL v3 (tool/function calling) | TBD | TBD | TBD (run pending) |

Read honestly: v1.6 fixes images natively while keeping coding concise and close to the base. Against the base, v1.6 is 1.2 points lower on HumanEval base, 1.8 lower on HumanEval+, 1.3 lower on MBPP+, and 5.0 lower on MBPP base (see the table). It largely holds the base's coding strength and agentic foundation and adds the business-owner workflows, now including images that do not break. The earlier v1.5 preview traded coding for the image fix; v1.6 corrected that by re-anchoring the concise coding style.
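To make "close to the base" concrete, the per-benchmark gaps can be computed directly from the EvalPlus rows in the table. The snippet only re-derives differences from the published numbers; the variable and function names are illustrative.

```python
# EvalPlus pass@1 from the table: (base, v1.1, v1.6), in percent.
SCORES = {
    "HumanEval base": (93.3, 93.3, 92.1),
    "HumanEval+": (89.6, 89.6, 87.8),
    "MBPP base": (91.8, 90.5, 86.8),
    "MBPP+": (78.0, 77.8, 76.7),
}


def deltas_vs_base(scores=SCORES):
    """Map each benchmark to (v1.1 - base, v1.6 - base) in percentage points."""
    return {
        name: (round(v11 - base, 1), round(v16 - base, 1))
        for name, (base, v11, v16) in scores.items()
    }


for name, (d11, d16) in deltas_vs_base().items():
    print(f"{name}: v1.1 {d11:+.1f}, v1.6 {d16:+.1f} points vs base")
```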

Tool-calling is confirmed working: a direct Ollama probe returns clean write tool_calls (finish_reason tool_calls). The BFCL v3 number stays TBD until it is run.
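A probe of this kind might look like the sketch below, written against an OpenAI-style chat completion response. Everything here is illustrative: the `write` tool schema, the helper names, and the sample response are assumptions, not the actual probe used for this card.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
WRITE_TOOL = {
    "type": "function",
    "function": {
        "name": "write",
        "description": "Write text content to a file path.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    },
}


def build_request(prompt, model="kaiju-coder-mlx:1.6"):
    """Request body that offers the model the hypothetical write tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WRITE_TOOL],
    }


def extract_tool_calls(response):
    """Return [(tool_name, arguments_dict)] if the model stopped to call tools."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        return []
    calls = []
    for call in choice["message"].get("tool_calls", []):
        fn = call["function"]
        args = fn["arguments"]
        if isinstance(args, str):  # OpenAI-style responses encode arguments as JSON text
            args = json.loads(args)
        calls.append((fn["name"], args))
    return calls


# Made-up response in the shape described above, for illustration only.
sample = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {"tool_calls": [{"function": {
            "name": "write",
            "arguments": "{\"path\": \"index.html\", \"content\": \"<h1>Hi</h1>\"}",
        }}]},
    }]
}
print(extract_tool_calls(sample))
```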

Open rubric: the BizAgent-Gold task set and scoring rubric are open in the source repo (benchmarks/golden-bizagent-tasks.json, benchmarks/niche-config.json); any published judge score uses an open model, named in the result.

Use it as an agent (opencode)

To get agentic behavior (writing files, editing a project), run the model inside an agent harness. The recommended harness is opencode. The agentic serving path is the Ollama tag kaiju-coder-mlx-opencode:1.6 (the tool-call/opencode build, 16k context, end-of-tool-call token baked in).

ollama create kaiju-coder-mlx-opencode:1.6 -f Modelfile
cd /path/to/your/project
opencode

Select kaiju-coder-mlx-opencode:1.6 in opencode and give it the task in plain language. Cline and aider work the same way over http://127.0.0.1:11434/v1.

Limitations

Business-niche coder, not frontier. v1.6 is a scoped builder model tuned for business artifacts and strongest on founder workflows, and it writes short, direct code (no padded solutions). It keeps the base's coding strength (see Benchmarks), but it is not positioned as a general-purpose competitive coder. v1.1 remains in the repo as the previous version (no native image fix).
Text-only GGUF. The base is a vision-language model; this GGUF strips the vision pathway. It does not see images and does not advertise vision.
Images use placeholder services. v1.6 writes image URLs that load (loremflickr / pravatar / picsum / SVG), right for mockups and launch-ready sites. For a real brand, swap in the owner's own photos; the placeholders are there so nothing renders broken out of the box.
Run with thinking off. Pass think:false for customer-visible output.
Agentic delivery. Tool-calling is confirmed via Ollama; polished multi-file builds still benefit from a warm model and a verifier/retry harness.
Human review. Customer-facing deliverables should get a human review pass during early use.

Identity

Kaiju-Coder MLX 1.6 by Kiyomi is a local-first builder for solo founders and small-business owners. It is honest about what it is: it does not pretend to be Claude, GPT, or any other model, and it does not claim vision. Voice: direct, ship-first, no corporate filler.

License and attribution

Licensed under the Apache License, Version 2.0. See LICENSE and NOTICE.

Base model: Qwen/Qwen3.6-35B-A3B, Copyright 2026 Alibaba Cloud, licensed under Apache-2.0.
This work is a LoRA fine-tune that modified the base model, packaged as a text-only GGUF.
Fine-tuned from Qwen3.6-35B-A3B by Richard Echols / RMDW.
Not endorsed by Alibaba Cloud or the Qwen team.

Training-data policy: the fine-tune uses RMDW/Kiyomi-owned deterministic output only. No closed-model completions were used as supervised training targets. Any open-model judge used for evaluation scoring is named in the result.

Format: GGUF
Model size: 35B params
Architecture: qwen35moe
Quantization: 8-bit
Base model: Qwen/Qwen3.6-35B-A3B