Add benchmark comparison table (base 9.4 vs fine-tuned 10.0)

README.md (changed):

- **Function calling**: Native Ollama/OpenAI tool use format
- **Zero API cost**: Runs locally on 20GB+ VRAM

## Benchmark Results

Evaluated on 12 task categories covering agentic coding capabilities. Each category is scored on several criteria (0 or 1 each); the per-category average is scaled to 0-10, so passing 4 of 5 criteria scores 8.0.

| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
|----------|:---:|:---:|:---:|
| ReAct Tool Call | 10.0 | **10.0** | — |
| Function Calling | 8.0 | **10.0** | +2.0 |
| Multi-step ReAct | 8.0 | **10.0** | +2.0 |
| JP Code Gen (API) | 10.0 | **10.0** | — |
| JP Code Gen (Algorithm) | 10.0 | **10.0** | — |
| JP Code Gen (Database) | 9.0 | **10.0** | +1.0 |
| JP Debug (TypeError) | 10.0 | **10.0** | — |
| JP Debug (KeyError) | 10.0 | **10.0** | — |
| JP Code Review | 8.0 | **10.0** | +2.0 |
| JP Git Strategy | 10.0 | **10.0** | — |
| JP Self-correction | 10.0 | **10.0** | — |
| JP Documentation | 10.0 | **10.0** | — |
| **Overall** | **9.4** | **10.0** | **+0.6** |

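To make the scoring concrete: a category score is the mean of its 0/1 criterion results scaled to 10, and the Overall row is the mean of the 12 category scores. The snippet below is purely illustrative; the per-criterion values are made up.

```python
# Illustrative scoring arithmetic only; the criterion results are made up.
criteria = [1, 1, 1, 1, 0]             # e.g. 4 of 5 criteria passed in one category
category_score = 10 * sum(criteria) / len(criteria)
print(category_score)                  # 8.0

# The Overall row is the mean of the 12 category scores (base-model column shown).
base_scores = [10.0, 8.0, 8.0, 10.0, 10.0, 9.0, 10.0, 10.0, 8.0, 10.0, 10.0, 10.0]
print(round(sum(base_scores) / len(base_scores), 1))  # 9.4
```
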
### Key Improvements

- **Function Calling**: Clean `<tool_call>` JSON output (the base model appends extra explanation); see the parsing sketch after this list
- **Multi-step ReAct**: Structured JSON reasoning with a proper Thought/Action/Observation flow
- **Code Review**: Suggests parameterized queries for SQL injection fixes
- **Database CRUD**: Complete Create/Read/Update/Delete coverage

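As a concrete illustration of the `<tool_call>` output mentioned above, the sketch below extracts a tool call from a model response. It assumes the common `<tool_call>{"name": ..., "arguments": {...}}</tool_call>` convention; check your prompt template or Modelfile for the exact schema.

```python
import json
import re

# Hypothetical model response; the payload schema is assumed, not quoted from this card.
response = '<tool_call>{"name": "read_file", "arguments": {"path": "src/main.py"}}</tool_call>'

match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", response, re.DOTALL)
if match:
    call = json.loads(match.group(1))
    print(call["name"], call["arguments"])  # read_file {'path': 'src/main.py'}
```
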
### Inference Test Results (v2 adapter)

| Test | Input | Result |
|------|-------|--------|
| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" (create a health-check endpoint with FastAPI) | Clean Python with `/healthz` endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" (cause of and fix for the TypeError) | Japanese explanation + fix code |
| Function Calling | "Use read_file to read README.md" | Clean `<tool_call>` JSON format |

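For reference, an answer along the following lines would satisfy the JP Code Gen test above; this is an illustrative sketch, not the model's verbatim output.

```python
# Illustrative target output for the health-check prompt, not the model's verbatim answer.
from fastapi import FastAPI

app = FastAPI()

@app.get("/healthz")
async def healthz() -> dict:
    """Simple liveness probe."""
    return {"status": "ok"}
```
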
## Training Details

| Parameter | Value |
|-----------|-------|
| LoRA alpha | 32 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 133M / 31B (0.43%) |
| Training data | 1,546 custom samples (v2) |
| Epochs | 2 (3rd epoch interrupted; checkpoint-388 used) |
| Learning rate | 1.5e-4 (cosine schedule) |
| Final loss | 0.98 |
| Token accuracy | 96.8% |
| Training time | ~1.5 hours |
| Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |

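The adapter configuration above maps roughly onto the following `peft` setup. This is a sketch: the rank and dropout values are placeholders, since only the alpha and target modules are listed in this excerpt.

```python
from peft import LoraConfig

# Sketch of the adapter config from the table above.
# r and lora_dropout are placeholders; only alpha and target modules are published here.
lora_config = LoraConfig(
    r=16,                      # placeholder rank
    lora_alpha=32,
    lora_dropout=0.05,         # placeholder dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
# Training used a 1.5e-4 learning rate on a cosine schedule for 2 epochs (see table).
```
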
## Training Data Categories

## Use with Ollama

```bash
# After GGUF conversion
ollama create gemma4-ja-agent-coder -f Modelfile
ollama run gemma4-ja-agent-coder
```

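Once the model is registered, it can also be queried programmatically through Ollama's local REST API. A minimal sketch (the prompt is just an example):

```python
import json
import urllib.request

# Query the locally served model through Ollama's /api/generate endpoint.
payload = {
    "model": "gemma4-ja-agent-coder",
    "prompt": "FastAPIでヘルスチェックエンドポイントを作成してください。",  # example prompt
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```
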
## Use with helix-agents (Claude Code MCP)

Reduce Claude Code API token consumption by delegating routine tasks to this local model.

```json
{
  "mcpServers": {
    ...
  }
}
```

## Use with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31b-it",
    quantization_config=bnb,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
```
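
A short generation example to go with the loading code above (a sketch: it assumes the tokenizer ships a chat template, and the prompt and decoding settings are illustrative):

```python
# Illustrative generation call; prompt and decoding settings are examples only.
messages = [{"role": "user", "content": "FastAPIでヘルスチェックエンドポイントを作成してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```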

> **Note**: Gemma4 uses `Gemma4ClippableLinear`, which requires a PEFT monkey-patch. See [this gist](https://gist.github.com/) for the workaround.

## License

Apache 2.0 (same as base model)

## Author

[tsunamayo7](https://github.com/tsunamayo7), builder of [helix-agents](https://github.com/tsunamayo7/helix-agents), a local LLM delegation framework for Claude Code.