Gemma 4 Claude Coder β€” local model family

A family of custom models built on Gemma 4 (edge variants E2B and E4B), tuned to act as autonomous coding and administration agents. The models speak the Anthropic-compatible API, so they drive Claude Code fully locally β€” your code never leaves your machine and cloud token cost drops to zero.

Each model ships with a system prompt focused on real work inside a codebase: use tools instead of guessing, make minimal and precise code changes, return complete and runnable output, and verify after acting. Sampling follows Google's official Gemma 4 recommendation (temperature 1.0, top_k 64, top_p 0.95), with thinking mode enabled for better planning before a tool call.

The idea

The whole point of this family is to run Claude Code on small, popular, consumer-grade hardware. No datacenter GPU, no cloud bill β€” just an everyday Mac Mini (or similar 16 GB machine) acting as a fully local, agentic coding assistant. These models make that practical: light enough to fit, smart enough to drive real tool-calling agent loops.

In a time of RAM shortages and the big tech giants tightening usage limits and quotas, owning a capable agent that runs entirely on your own modest hardware stops being a hobby and becomes leverage: no rate limits, no surprise pricing, no dependency on someone else's quota.

Models in the family

Model Base Context Purpose
gemma4-e2b-claude-coder Gemma 4 E2B (eff. 2B / 5.1B with embeddings) 64K Fast everyday coding agent β€” edits, autocomplete, short agent loops. Lightest on memory.
gemma4-e4b-claude-coder Gemma 4 E4B (eff. 4B / 8B with embeddings) 64K Stronger coding agent β€” better reasoning and tool use on larger tasks.
gemma4-e4b-claude-coder-admin Gemma 4 E4B 32K Administration and system tasks (scripts, shell, devops). Smaller context fits 100% in GPU for higher, stable throughput.

What it's for

  • Driving Claude Code locally (ollama launch claude --model <name>).
  • Agentic code writing and editing with native function calling / tool use.
  • Administration and devops tasks on a server (the admin variant).
  • Full privacy and offline operation β€” no code sent to the cloud.

Context

  • Coders (E2B / E4B): 64K tokens β€” matching Claude Code's recommendation (64K minimum).
  • Admin (E4B): 32K tokens β€” a deliberate trade-off for 16 GB hardware that keeps the model entirely on the GPU.
  • Base Gemma 4 E2B/E4B natively supports up to 128K, so context can be raised on stronger hardware.

Test hardware

The models were built and tested on:

  • Mac Mini (Apple Silicon, M-series), 16 GB RAM, macOS 15.6
  • Ollama 0.24, GPU (Metal) inference

Measured performance (16 GB RAM)

Model Placement Speed Tool calling
gemma4-e2b-claude-coder 100% GPU ~55 tok/s βœ… valid JSON
gemma4-e4b-claude-coder (64K) 39% GPU / 61% CPU ~27 tok/s (drops under load) βœ…
gemma4-e4b-claude-coder-admin (32K) 100% GPU ~30 tok/s (stable) βœ…

All three passed an end-to-end test through Claude Code: real turns with tool calls and correct responses (HTTP 200 on /v1/messages).

How they were made

These models were designed, built and tested with the help of Claude Opus 4.8 β€” the best coding model in the world. Their system prompts, parameter choices and context configuration draw directly on its knowledge. In other words: the world's best coding model prepared local models that take that work over right on your desk.

License

Apache 2.0 (inherited from the base Gemma 4).

Downloads last month
262
GGUF
Model size
5B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support