Model intro + feedback thread

#2
by ArnavKewalram - opened

Introduction

gemma-4-E2B-coder-v1 is the first coding fine-tune of google/gemma-4-E2B-it, Google's 3.9B hybrid Griffin model (alternating attention + linear recurrent layers).

Why this model?

Most small coding models target Llama or Mistral architectures. Gemma 4 E2B is different:

  • Griffin architecture: faster CPU inference than pure-transformer models of the same size
  • Compact size: ~3.2 GB at Q4_K_M, so it fits on a Raspberry Pi 5 with enough swap
  • Apache 2.0: commercial use is permitted (subject to the license's notice and attribution terms)

Training

Trained on 10,000 samples from Magicoder-OSS-Instruct-75K (real GitHub code instruction pairs) using QLoRA on a single RTX 3080, in ~6 hours.

Eval (keyword score, Q4_K_M, 8 tasks): 88.5% average across Python, JS, Go, Rust, SQL, Bash, C++.
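
For anyone wondering what a "keyword score" could look like: the exact scoring script is not included in this post, so the following is only a minimal sketch under the assumption that each task lists expected keywords and the score is the fraction found in the model output, averaged over tasks. The function names and the two sample tasks are hypothetical, not the real eval set.

```python
def keyword_score(output: str, keywords: list[str]) -> float:
    """Fraction of expected keywords found in the output (case-insensitive)."""
    if not keywords:
        return 0.0
    text = output.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def average_score(results: list[tuple[str, list[str]]]) -> float:
    """Mean keyword score over (model_output, expected_keywords) pairs."""
    if not results:
        return 0.0
    return sum(keyword_score(out, kws) for out, kws in results) / len(results)


# Hypothetical tasks: the real eval prompts and keywords are not given here.
sample = [
    ("def add(a, b):\n    return a + b", ["def", "return"]),
    ("SELECT name FROM users;", ["select", "from", "where"]),
]
print(f"{average_score(sample):.1%}")
```

A substring check like this is cheap but coarse: it rewards mentioning the right constructs, not producing code that runs, so scores of this kind are best read as a smoke test rather than a pass rate.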

Try it

  • Live demo: runs on ZeroGPU, no setup needed
  • Ollama: ollama run hf.co/ArnavKewalram/gemma-4-E2B-coder-v1:Q4_K_M
  • Colab notebook: linked in the model card
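
If you would rather call the model from a script than from the Ollama CLI, here is a rough sketch using Ollama's local REST endpoint. It assumes an Ollama server is already running on the default port 11434 and that the model has been pulled (for example via the `ollama run` command above). The helper names `build_request` and `generate` are my own, and the timeout is an arbitrary choice.

```python
import json
import urllib.request

MODEL = "hf.co/ArnavKewalram/gemma-4-E2B-coder-v1:Q4_K_M"
OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Usage would be something like `print(generate("Write a Go function that reverses a slice."))`. Keeping prompts short is probably wise given the training length noted under known limitations.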

Feedback welcome

If you try this model, I'd love to know:

  1. What languages / tasks work well vs. where it struggles?
  2. Does it perform better or worse than other ~4B coders you've tried?

Known limitations: the maximum sequence length during training was 384 tokens, so very long prompts may produce truncated or lower-quality output.
