FogGen (Gemma-3-270m, sibling-distilled): capability-floor R14 endpoint

This is the 270M-parameter capability-floor probe of the FogGen recipe. It is sibling-distilled from the Gemma-3-1b-it buffer to install the FogGen output format, then run through the same 14-round self-evolving chain. The result shows that the recipe pays off at deployment-grade magnitudes from roughly 0.6B upward; below that, the lift is about an order of magnitude smaller, and a sibling-distilled SFT pass is required to install the format at all.

This is a capability-floor diagnostic checkpoint, not a deployment model. The canonical deployment endpoint is issai/foggen at the 0.6B scale.

For background on the system overview, training pipeline, and routing protocol, see the issai/foggen model card.

Why this exists

Native zero-shot routing is infeasible at the 270M scale: no prompting or constrained-decoding setup we tried exceeded 54% format compliance on the FogGen output schema (the model fails to emit the Confidence:/Final answer: pattern reliably enough to extract a routing signal). We therefore probe this scale with a two-stage protocol:

  1. Sibling-distillation SFT pass: one round of SFT on the calibration buffer of the Gemma-3-1b-it sibling, using the larger model's bucket labels as targets. This installs the FogGen format on the 270M backbone.
  2. Standard 14-round chain: from there, the recipe is identical to issai/foggen: 7-domain rotation, LoRA r=16 α=32, bf16, 2 epochs/round, same cloud teacher.

The released checkpoint is R14 of the post-distillation chain.
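
The two-stage protocol above can be written down as a small schedule. This is a minimal sketch, not the training code: the hyperparameter values come from the list above, while the `ChainConfig` and `build_schedule` names, the one-domain-per-round reading of "7-domain rotation", and the rotation order (borrowed from the results table) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Domain names are taken from the results table in this card.
# ASSUMPTION: the rotation visits them in this order, one domain per round.
DOMAINS = ["Finance", "Science", "Coding", "Law", "Math", "Kazakh culture", "Medical"]


@dataclass(frozen=True)
class ChainConfig:
    """Hyperparameters stated in the card (names are hypothetical)."""
    rounds: int = 14
    lora_r: int = 16
    lora_alpha: int = 32
    dtype: str = "bf16"
    epochs_per_round: int = 2


def build_schedule(cfg: ChainConfig = ChainConfig()) -> list:
    """Return (stage, data) pairs: one distillation pass, then the chain rounds."""
    schedule = [("sibling-distillation SFT", "Gemma-3-1b-it calibration buffer")]
    for r in range(1, cfg.rounds + 1):
        # ASSUMPTION: cycle through the seven domains, so each is seen twice.
        schedule.append((f"R{r}", DOMAINS[(r - 1) % len(DOMAINS)]))
    return schedule


schedule = build_schedule()
for stage, data in schedule:
    print(f"{stage:>26}: {data}")
```

Under these assumptions the released checkpoint corresponds to the last entry of the schedule, R14.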

Performance

System accuracy at τ=0.5 on the seven MCQ domains (full test sets, ~16,200 queries). Cloud baseline is Qwen3-30B-A3B-Instruct-2507.

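| Domain         | Cloud only | R14 raw | Random @ τ=0.5 | FogGen @ τ=0.5 | Cloud routed |
|----------------|------------|---------|----------------|----------------|--------------|
| Finance        | 69.5%      | 32.2%   | 58.2%          | 60.2%          | 69.5%        |
| Science        | 72.7%      | 30.4%   | 58.2%          | 59.5%          | 65.6%        |
| Coding         | 74.2%      | 34.3%   | 64.7%          | 65.7%          | 76.3%        |
| Law            | 70.7%      | 31.7%   | 58.5%          | 59.7%          | 68.7%        |
| Math           | 60.1%      | 24.5%   | 58.3%          | 58.5%          | 94.9%        |
| Kazakh culture | 95.8%      | 43.7%   | 60.3%          | 59.3%          | 31.9%        |
| Medical        | 74.0%      | 32.2%   | 59.8%          | 60.8%          | 65.9%        |
| Mean           | 73.9%      | 32.7%   | 59.7%          | 60.5%          | 67.5%        |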

Mean lift over Random at τ=0.5: +0.8 percentage points (positive on six of seven domains; negative on Kazakh culture, the headroom-collapse domain).
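
The lift figures can be recomputed directly from the table. The sketch below copies the reported numbers and derives the per-domain and mean lift. It also checks one plausible reading of the Random column, which this card does not define: routing the same fraction of queries to the cloud as FogGen does, but choosing them uniformly at random. That reading, and the `matched_random` helper name, are assumptions.

```python
# Rows copied from the results table, all in percent:
# (cloud_only, r14_raw, random, foggen, cloud_routed)
TABLE = {
    "Finance":        (69.5, 32.2, 58.2, 60.2, 69.5),
    "Science":        (72.7, 30.4, 58.2, 59.5, 65.6),
    "Coding":         (74.2, 34.3, 64.7, 65.7, 76.3),
    "Law":            (70.7, 31.7, 58.5, 59.7, 68.7),
    "Math":           (60.1, 24.5, 58.3, 58.5, 94.9),
    "Kazakh culture": (95.8, 43.7, 60.3, 59.3, 31.9),
    "Medical":        (74.0, 32.2, 59.8, 60.8, 65.9),
}


def matched_random(cloud_only: float, raw: float, cloud_routed: float) -> float:
    """ASSUMED baseline: send a random cloud_routed% of queries to the cloud."""
    p = cloud_routed / 100.0
    return p * cloud_only + (1.0 - p) * raw


# Lift = FogGen system accuracy minus Random system accuracy, in percentage points.
lifts = {d: round(row[3] - row[2], 1) for d, row in TABLE.items()}
mean_lift = sum(lifts.values()) / len(lifts)
positive = [d for d, v in lifts.items() if v > 0]

for domain, row in TABLE.items():
    est = matched_random(row[0], row[1], row[4])
    print(f"{domain:<15} lift={lifts[domain]:+.1f}  random: reported={row[2]:.1f} est={est:.1f}")
print(f"mean lift = {mean_lift:+.1f} pp, positive on {len(positive)} of {len(lifts)} domains")
```

The mean comes out at +0.8 with six positive domains, matching the figures above, and the assumed random baseline lands within 0.1 points of every reported Random value.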

Compared to issai/foggen (+4.6 at 0.6B) and issai/foggen-gemma3-1b (+5.9 at 1B), the lift here is an order of magnitude smaller. The recipe still produces positive lift, but the magnitude scales sharply with edge capacity below the 0.6B mark.

Quick demo

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("issai/foggen-gemma3-270m", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("issai/foggen-gemma3-270m")

SYSTEM = """You are a self-aware multiple-choice assistant.

Rules:
- First, assess your confidence in solving this question.
- Then give your answer.
- Output format:
  Confidence: <0.0|0.25|0.5|0.75|1.0>
  Final answer: <OPTION_LETTER>"""

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "<your MCQ here>"},
]
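# return_dict=True yields input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))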

The routing decision (route_query helper, threshold τ) is identical to the issai/foggen card.
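
For readers without the issai/foggen card at hand, here is a minimal sketch of how the demo output could be parsed and turned into a routing decision. It is not the card's route_query helper. The names `parse_output`, `format_compliance`, and `route_query_sketch` are hypothetical, and two behaviours are assumptions: a query stays on the edge model when its confidence is at least τ, and unparseable output is sent to the cloud.

```python
import re
from typing import List, Optional, Tuple

# Matches the output format requested in the SYSTEM prompt:
#   Confidence: <0.0|0.25|0.5|0.75|1.0>
#   Final answer: <OPTION_LETTER>
PATTERN = re.compile(
    r"Confidence:\s*(0\.0|0\.25|0\.5|0\.75|1\.0)\s*"
    r"Final answer:\s*([A-Z])\b"
)


def parse_output(text: str) -> Optional[Tuple[float, str]]:
    """Return (confidence, option letter), or None if the format is violated."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return float(m.group(1)), m.group(2)


def format_compliance(outputs: List[str]) -> float:
    """Fraction of generations from which a routing signal can be extracted."""
    if not outputs:
        return 0.0
    return sum(parse_output(o) is not None for o in outputs) / len(outputs)


def route_query_sketch(text: str, tau: float = 0.5) -> str:
    """ASSUMED rule: keep the edge answer when confidence >= tau, else escalate."""
    parsed = parse_output(text)
    if parsed is None:
        return "cloud"  # ASSUMPTION: no usable signal means escalate
    confidence, _answer = parsed
    return "edge" if confidence >= tau else "cloud"


samples = [
    "Confidence: 0.75\nFinal answer: B",
    "Confidence: 0.25\nFinal answer: C",
    "I think the answer is B.",
]
for s in samples:
    print(repr(s), "->", parse_output(s), route_query_sketch(s))
print("format compliance:", format_compliance(samples))
```

The same `format_compliance` measure is the kind of check behind the compliance figure quoted under "Why this exists", although the exact extraction rule used there is not specified in this card.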

License

Inherits the Gemma Terms of Use from google/gemma-3-270m.

Citation

Paper coming soon.
