FogGen (Gemma-3-270m, sibling-distilled): capability-floor R14 endpoint
This is the 270M-parameter capability-floor probe of the FogGen recipe. It is sibling-distilled from the Gemma-3-1b-it calibration buffer to install the FogGen output format, then run through the same 14-round self-evolving chain. It shows that the recipe pays off at deployment-grade magnitudes from roughly 0.6B parameters upward; below that, the lift shrinks by close to an order of magnitude, and a sibling-distilled SFT pass is required to install the format at all.
This is a capability-floor diagnostic checkpoint, not a deployment model. The canonical deployment endpoint is issai/foggen at the 0.6B scale.
For the system overview, training pipeline, and routing protocol, see the issai/foggen model card.
Why this exists
Native zero-shot routing is infeasible at the 270M scale: no prompting or constrained-decoding setup we tried exceeded 54% format compliance on the FogGen output schema (the model fails to emit the Confidence:/Final answer: pattern reliably enough to extract a routing signal). We therefore probe this scale with a two-stage protocol:
- Sibling-distillation SFT pass: one round of SFT on the calibration buffer of the Gemma-3-1b-it sibling, using the larger model's bucket labels as targets. This installs the FogGen format on the 270M backbone.
- Standard 14-round chain: identical recipe to issai/foggen from there: 7-domain rotation, LoRA r=16, α=32, bf16, 2 epochs/round, same cloud teacher.
The released checkpoint is R14 of the post-distillation chain.
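The format-compliance figure quoted above depends on being able to extract a confidence bucket and an option letter from each response. The card does not say how compliance was measured, so the following is only a minimal sketch under stated assumptions: the regular expression, `parse_response`, and `format_compliance` are hypothetical names introduced here, and the allowed buckets are taken from the system prompt in the quick demo.

```python
import re
from typing import Optional, Sequence, Tuple

# Confidence buckets listed in the quick-demo system prompt.
BUCKETS = {0.0, 0.25, 0.5, 0.75, 1.0}

# Assumed pattern: "Confidence: <number>" followed by "Final answer: <letter>".
PATTERN = re.compile(
    r"Confidence:\s*([01](?:\.\d+)?)\s+Final answer:\s*([A-Z])\b"
)


def parse_response(text: str) -> Optional[Tuple[float, str]]:
    """Return (confidence, option_letter), or None if the schema is not met."""
    match = PATTERN.search(text)
    if match is None:
        return None
    confidence = float(match.group(1))
    if confidence not in BUCKETS:
        return None  # a number outside the five buckets counts as non-compliant
    return confidence, match.group(2)


def format_compliance(responses: Sequence[str]) -> float:
    """Fraction of responses that parse into a valid (confidence, answer) pair."""
    if not responses:
        return 0.0
    return sum(parse_response(r) is not None for r in responses) / len(responses)


samples = [
    "Confidence: 0.75\nFinal answer: B",
    "I think the answer is B.",
    "Confidence: 0.9\nFinal answer: C",  # 0.9 is not an allowed bucket
    "Confidence: 1.0\nFinal answer: D",
]
print(format_compliance(samples))
```

A stricter check could require the two lines to be the entire response; the sketch accepts them anywhere in the text.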
Performance
System accuracy at τ=0.5 on the seven MCQ domains (full test sets, ~16,200 queries). Cloud baseline is Qwen3-30B-A3B-Instruct-2507.
| Domain | Cloud only | R14 raw | Random @ τ=0.5 | FogGen @ τ=0.5 | Cloud routed |
|---|---|---|---|---|---|
| Finance | 69.5% | 32.2% | 58.2% | 60.2% | 69.5% |
| Science | 72.7% | 30.4% | 58.2% | 59.5% | 65.6% |
| Coding | 74.2% | 34.3% | 64.7% | 65.7% | 76.3% |
| Law | 70.7% | 31.7% | 58.5% | 59.7% | 68.7% |
| Math | 60.1% | 24.5% | 58.3% | 58.5% | 94.9% |
| Kazakh culture | 95.8% | 43.7% | 60.3% | 59.3% | 31.9% |
| Medical | 74.0% | 32.2% | 59.8% | 60.8% | 65.9% |
| Mean | 73.9% | 32.7% | 59.7% | 60.5% | 67.5% |
Mean lift over Random at τ=0.5: +0.8 percentage points (positive on six of seven domains; negative on Kazakh culture, the headroom-collapse domain).
Compared to issai/foggen (+4.6 at 0.6B) and issai/foggen-gemma3-1b (+5.9 at 1B), the lift here is roughly six to seven times smaller, close to an order of magnitude. The recipe still produces positive lift, but the magnitude falls off sharply with edge capacity below the 0.6B mark.
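The lift figures can be reproduced from the table with a few lines of arithmetic. The sketch below copies the Random and FogGen columns as printed and compares the resulting mean lift with the +4.6 and +5.9 quoted for the larger checkpoints; the variable names are illustrative only.

```python
# Accuracy (%) at tau = 0.5, copied from the table: (Random, FogGen).
table = {
    "Finance": (58.2, 60.2),
    "Science": (58.2, 59.5),
    "Coding": (64.7, 65.7),
    "Law": (58.5, 59.7),
    "Math": (58.3, 58.5),
    "Kazakh culture": (60.3, 59.3),
    "Medical": (59.8, 60.8),
}

# Per-domain lift in percentage points, rounded to the table's precision.
lift = {domain: round(fog - rnd, 1) for domain, (rnd, fog) in table.items()}
mean_lift = sum(lift.values()) / len(lift)
positive = [domain for domain, value in lift.items() if value > 0]

for domain, value in lift.items():
    print(f"{domain:15s} {value:+.1f}")
print(f"mean lift: {mean_lift:+.1f}")
print(f"positive domains: {len(positive)} of {len(lift)}")

# Ratio of the lifts quoted for the larger checkpoints to this mean lift.
for name, other in {"0.6B": 4.6, "1B": 5.9}.items():
    print(f"{name}: {other / mean_lift:.1f}x larger")
```

The mean is computed from per-domain differences, so the ratios can differ slightly from those obtained with the rounded +0.8.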
Quick demo
from transformers import AutoTokenizer, AutoModelForCausalLM

# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained("issai/foggen-gemma3-270m", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("issai/foggen-gemma3-270m")

# System prompt that requests the FogGen output format.
SYSTEM = """You are a self-aware multiple-choice assistant.
Rules:
- First, assess your confidence in solving this question.
- Then give your answer.
- Output format:
Confidence: <0.0|0.25|0.5|0.75|1.0>
Final answer: <OPTION_LETTER>"""

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "<your MCQ here>"},
]

# return_dict=True yields input_ids plus attention_mask, so generate() receives both.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
The routing decision (route_query helper, threshold τ) is identical to the one in the issai/foggen card.
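For readers without the other card at hand, a threshold-based routing rule and the system-level metrics it produces can be sketched as follows. This is not the `route_query` helper itself: `Record`, `route`, and `evaluate` are hypothetical names, and both the direction of the comparison (keep the query on the edge when confidence is at least τ) and the fallback of unparseable responses to the cloud are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Record:
    confidence: Optional[float]  # parsed edge confidence, None if unparseable
    edge_correct: bool           # edge model's answer matches the key
    cloud_correct: bool          # cloud model's answer matches the key


def route(confidence: Optional[float], tau: float = 0.5) -> str:
    """Assumed rule: answer on the edge only when confidence >= tau."""
    if confidence is not None and confidence >= tau:
        return "edge"
    return "cloud"


def evaluate(records: Iterable[Record], tau: float = 0.5) -> Tuple[float, float]:
    """Return (system accuracy, fraction of queries routed to the cloud)."""
    records = list(records)
    if not records:
        raise ValueError("no records to evaluate")
    correct = 0
    cloud = 0
    for record in records:
        if route(record.confidence, tau) == "edge":
            correct += record.edge_correct
        else:
            cloud += 1
            correct += record.cloud_correct
    return correct / len(records), cloud / len(records)


records = [
    Record(1.0, True, True),      # kept on the edge, correct
    Record(0.75, False, True),    # kept on the edge, wrong
    Record(0.25, False, True),    # sent to the cloud, correct
    Record(None, False, False),   # unparseable, sent to the cloud, wrong
]
accuracy, cloud_rate = evaluate(records, tau=0.5)
print(accuracy, cloud_rate)
```

Raising τ sends more queries to the cloud, which trades cloud usage for accuracy whenever the cloud model is the stronger of the two.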
License
Inherits the Gemma Terms of Use from google/gemma-3-270m.
Citation
Paper coming soon.