gemma-4-12B-coder-fable5-composer2.5-v1

Instructions to use MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1")
# The text-generation pipeline takes a plain-text chat, passed positionally
messages = [
    {"role": "user", "content": "Write a Python function to check if a string is a valid IPv4 address."},
]
pipe(messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1")
model = AutoModelForMultimodalLM.from_pretrained("MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1

SGLang

How to use MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1 with Docker Model Runner:
```
docker model run hf.co/MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1
```

💻 Gemma4-12B-Coder — safetensors master (full precision) ✨

Composer 2.5 × Fable 5 · v1 / code edition

This is the full-precision safetensors master for my Gemma 4 12B coding fine-tune — the same model many of you have been running as GGUF, now in its original weights. 🧠💻 A focused fine-tune of Gemma 4 12B on verifiable Python coding data: it reasons in the open (edge cases, complexity, approach) and then writes a clean, runnable solution.

🎯 What this repo is for

This repo holds the un-quantized master weights (model.safetensors, bf16). Use it to:

🔧 Roll your own quants — make custom GGUF / MLX / AWQ / GPTQ builds from full precision.
🧪 Fine-tune further — it's a clean base for your own LoRA / continued training.
🤗 Run it in transformers (needs a recent build with gemma4_unified support).

🏃 Just want to run it? You don't need this repo. Grab a ready-made quant from the GGUF repo (the smallest is about 4.5 GB and runs in LM Studio, Ollama, llama.cpp, Jan…). This master is for builders. 💚

📌 Announcements

🚀 v2 is almost here! Initial training of v2 is done and it's in benchmarking + final QA. So many of you flagged the agentic behavior — so this round I significantly grew the dataset (especially agentic data). v2 is focused on agentic + coding. Targeting a release this Friday or Saturday (US Pacific). 🎉

📣 Context length is 256K. This master ships with the corrected max_position_embeddings = 262144 (256K) — the well-known upstream Gemma 4 metadata bug (config.json once said 131072) is already fixed here, so anything you quantize/convert from these weights inherits the full 256K. 💚 Thanks to the community member who spotted it!
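
One way to confirm that a converted or re-quantized copy inherited the corrected value is to read it back from that copy's `config.json`. The sketch below is a minimal check, not a description of the actual file: it assumes the field sits either at the top level or under a nested `text_config` block (the nested layout is an assumption), and `context_length` / `has_full_context` are hypothetical helper names.

```python
import json

EXPECTED_CONTEXT = 262144  # 256K, the corrected value stated above


def context_length(config: dict) -> int:
    """Return max_position_embeddings from a parsed config.json.

    Checks the top level first, then "text_config"
    (an assumed layout for nested multimodal configs).
    """
    if "max_position_embeddings" in config:
        return int(config["max_position_embeddings"])
    nested = config.get("text_config", {})
    if "max_position_embeddings" in nested:
        return int(nested["max_position_embeddings"])
    raise KeyError("max_position_embeddings not found in config")


def has_full_context(config_json: str) -> bool:
    """True if the serialized config advertises the full 256K window."""
    return context_length(json.loads(config_json)) == EXPECTED_CONTEXT


# Illustrative configs only, not copied from the real repo files.
fixed = json.dumps({"text_config": {"max_position_embeddings": 262144}})
stale = json.dumps({"max_position_embeddings": 131072})
print(has_full_context(fixed), has_full_context(stale))
```

In practice you would pass the text of the downloaded `config.json` to `has_full_context` before and after conversion.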

🤗 Run it in transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

msgs = [{"role": "user", "content": "Write a Python function to check if a string is a valid IPv4 address."}]
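# return_dict=True yields input_ids plus attention_mask, so generate() receives both
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))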

🧠 Thinking mode: the model reasons in Gemma's native thought channel before answering. Keep enable_thinking=true; the default chat template handles the rest. Recommended sampling: temperature 1.0, top_p 0.95, top_k 64. For coding you can also decode greedily (temperature 0) for more deterministic solutions. Needs a recent transformers release that knows the gemma4_unified architecture.
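
The sampling recommendation above can be written as keyword arguments for `generate`. This is a minimal sketch: `generation_kwargs` is a hypothetical helper, and the greedy case is expressed as `do_sample=False` because transformers treats greedy decoding as sampling switched off rather than as a literal temperature of 0.

```python
def generation_kwargs(deterministic: bool = False, max_new_tokens: int = 1024) -> dict:
    """Build keyword arguments for `model.generate`.

    deterministic=False -> the recommended sampling settings from the card.
    deterministic=True  -> greedy decoding (the "temp 0" option).
    """
    if deterministic:
        # Sampling is off, so temperature/top_p/top_k are deliberately omitted.
        return {"do_sample": False, "max_new_tokens": max_new_tokens}
    return {
        "do_sample": True,
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 64,
        "max_new_tokens": max_new_tokens,
    }


print(generation_kwargs())
print(generation_kwargs(deterministic=True))
```

Pass the returned dict as extra keyword arguments to `model.generate` in place of a bare `max_new_tokens`.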

📦 Ready-made GGUF quants

All from the GGUF repo:

| Quant | Size | Vibe |
|---|---|---|
| 🟢 Q2_K | 4.5 GB | tiniest, runs almost anywhere |
| 🟡 Q3_K_M | 5.7 GB | great for 8 GB VRAM |
| 🔵 Q4_K_M | 6.87 GB | the sweet spot 👌 (recommended) |
| 🟣 Q6_K | 9.11 GB | near-lossless |
| ⚪ Q8_0 | 11.8 GB | basically full quality |

⚠️ GGUF needs a recent llama.cpp — this is the gemma4_unified architecture, older builds won't load it.
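
To pick a quant for a given machine, the sizes in the table can be compared against available memory. The sketch below is a rough heuristic rather than a measured recommendation: the file sizes come from the table, while `pick_quant` and its `headroom_gb` allowance for KV cache and runtime buffers are assumptions, and real usage grows with context length.

```python
from typing import Optional

# File sizes in GB, copied from the quant table above.
QUANT_SIZES_GB = {
    "Q2_K": 4.5,
    "Q3_K_M": 5.7,
    "Q4_K_M": 6.87,
    "Q6_K": 9.11,
    "Q8_0": 11.8,
}


def pick_quant(memory_gb: float, headroom_gb: float = 1.5) -> Optional[str]:
    """Return the largest quant whose file fits in memory_gb minus headroom.

    headroom_gb is an assumed allowance, not a measured figure.
    Returns None when no quant fits.
    """
    budget = memory_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    return max(fitting)[1] if fitting else None


for mem in (5, 8, 12, 24):
    print(mem, "GB ->", pick_quant(mem))
```

With the default headroom, an 8 GB budget lands on Q3_K_M, which agrees with the table's own note for that quant.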

⚡ Optional: free speed with MTP (lossless)

There's a tiny Gemma 4 MTP draft model in my main reasoning repo → MTP/ folder. It's the stock Gemma 4 drafter, so it pairs with any Gemma 4 12B quant — including these coder quants — for lossless speculative decoding (byte-for-byte identical output, just faster). Because it's trained on base Gemma 4, the hit-rate on this fine-tune is a bit lower than on vanilla Gemma 4, but it's free and has no downside. Add three flags (--model-draft, --spec-type draft-mtp, --n-gpu-layers-draft); see the main repo for the full command. 🏎️
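
Why a drafter can be lossless is easiest to see in a toy version. The sketch below is generic greedy draft-and-verify decoding with two stand-in functions, not the MTP drafter or llama.cpp's implementation: `target` and `draft` are hypothetical deterministic "models", and a real runtime verifies all drafted tokens in one batched forward pass instead of calling the target per token.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # maps a token prefix to the next token


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding with a single model."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], float]:
    """Draft k tokens, keep the prefix the target agrees with.

    Every emitted token is the target's own greedy choice, so the
    output matches greedy_decode(target, ...) exactly.
    Returns (new tokens, fraction of drafted tokens accepted).
    """
    seq = list(prompt)
    accepted = proposed = 0
    while len(seq) - len(prompt) < n_new:
        guess: List[int] = []
        for _ in range(k):  # cheap model runs ahead
            guess.append(draft(seq + guess))
        proposed += k
        for tok in guess:  # target checks each drafted token in order
            want = target(seq)
            seq.append(want)  # always emit the target's token
            if want != tok:
                break  # first mismatch: discard the rest of the draft
            accepted += 1
        else:
            seq.append(target(seq))  # all accepted: one extra target token
    return seq[len(prompt):len(prompt) + n_new], accepted / proposed


def target(seq: List[int]) -> int:
    return (3 * seq[-1] + len(seq)) % 11


def draft(seq: List[int]) -> int:
    # Agrees with the target except at every fifth position.
    return target(seq) if len(seq) % 5 else (target(seq) + 1) % 11


out, rate = speculative_decode(target, draft, [1, 2], 20)
print(out == greedy_decode(target, [1, 2], 20), rate)
```

A weaker drafter only lowers the acceptance rate, and with it the speedup. The emitted tokens stay the same, which is the sense in which a lower hit-rate on this fine-tune costs speed but not quality.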

📚 Training data (the interesting part 🍳)

A distillation of two complementary chain-of-thought sources over verifiable Python coding tasks (algorithmic / function-level problems with deterministic tests):

🥇 Main — Composer 2.5 real CoT. Genuine model-authored reasoning traces; each solution was run against the task's tests and only passing ones were kept. The reasoning you learn from leads to code that actually works.
🥈 Aux — Fable 5 redo. The problems where Composer 2.5 got it wrong, handed to Fable 5 to re-derive a fresh, self-consistent CoT and a correct solution — again gated on passing the tests. Recovers the hard cases the main teacher missed. These are synthetic (rationalized) CoT and are tagged separately.

Real CoT for solid coverage + synthetic "second-attempt" CoT to patch the failures — all verified by execution before training. ✅
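
The gating described above can be sketched as a small filter: try the main teacher's solution first, fall back to the aux redo only when it fails, and keep nothing unless the tests pass. This is an illustration under stated assumptions, not the actual pipeline: the record layout (`main`, `aux`, `tests`, `func`), the `cot_type` tag values, and both function names are hypothetical, and a real harness would sandbox execution and enforce timeouts.

```python
def passes_tests(solution_src: str, tests: list, func_name: str) -> bool:
    """Run a candidate solution against (args, expected) pairs.

    WARNING: exec on untrusted code is unsafe; this toy has no sandbox.
    """
    namespace: dict = {}
    try:
        exec(solution_src, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False  # crashes count as failures


def build_dataset(tasks: list) -> list:
    """Keep one verified sample per task, tagged by CoT origin."""
    kept = []
    for task in tasks:
        for source, tag in (("main", "real"), ("aux", "synthetic")):
            cand = task.get(source)
            if cand and passes_tests(cand["code"], task["tests"], task["func"]):
                kept.append({"task": task["name"], "cot": cand["cot"],
                             "code": cand["code"], "cot_type": tag})
                break  # aux is only consulted when main failed
    return kept


tasks = [
    {"name": "add", "func": "add", "tests": [((1, 2), 3), ((0, 0), 0)],
     "main": {"cot": "sum the inputs", "code": "def add(a, b):\n    return a + b\n"}},
    {"name": "is_even", "func": "is_even", "tests": [((2,), True), ((3,), False)],
     "main": {"cot": "check remainder", "code": "def is_even(n):\n    return n % 2 == 1\n"},
     "aux": {"cot": "remainder must be 0", "code": "def is_even(n):\n    return n % 2 == 0\n"}},
    {"name": "neg", "func": "neg", "tests": [((1,), -1)],
     "main": {"cot": "flip the sign", "code": "def neg(n):\n    return n\n"}},
]
dataset = build_dataset(tasks)
print([(row["task"], row["cot_type"]) for row in dataset])
```

Tasks where neither source passes are dropped entirely, so every sample that reaches training has been verified by execution.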

⚠️ Good to know

Reduced refusals: task-focused training with no safety hedging, so it refuses less than the base model. It is not safety-aligned — add your own guardrails for production. Use responsibly. 🙏
Specialized for Python / algorithmic coding; general-knowledge facts/numbers should still be double-checked.
English-centric.

📚 Base & License

License: Apache 2.0. Gemma 4 is released by Google under Apache 2.0 (unlike the older Gemma 1/2/3 terms), so this fine-tune is Apache 2.0 too — free to use, modify, and redistribute. 🎉
Base model: google/gemma-4-12B-it.
Personal/hobby project — shared as-is, no warranty. Have fun, and happy hacking! 🐾✨

Safetensors
Model size: 12B params
Tensor type: F16

Model tree for MyRayy/gemma-4-12B-coder-fable5-composer2.5-v1
Base model: google/gemma-4-12B
Finetuned: google/gemma-4-12B-it
Finetuned (85): this model