FogGen (Gemma-3-270m, sibling-distilled): capability-floor R14 endpoint

This is the 270M-parameter capability-floor probe of the FogGen recipe. It is sibling-distilled from the Gemma-3-1b-it buffer to install the FogGen output format, then run through the same 14-round self-evolving chain. The result shows that the recipe pays off at deployment-grade magnitudes from roughly 0.6B parameters upward; below that, the lift becomes an order of magnitude smaller, and a sibling-distilled SFT pass is required to install the format at all.

This is a capability-floor diagnostic checkpoint, not a deployment model. The canonical deployment endpoint is issai/foggen at the 0.6B scale.

For the system overview, training pipeline, and routing protocol, see the issai/foggen model card.

Why this exists

Native zero-shot routing is infeasible at the 270M scale: no prompting or constrained-decoding setup we tried exceeded 54% format compliance on the FogGen output schema (the model fails to emit the Confidence:/Final answer: pattern reliably enough to extract a routing signal). We therefore probe this scale with a two-stage protocol:

1. Sibling-distillation SFT pass: one round of SFT on the calibration buffer of the Gemma-3-1b-it sibling, using the larger model's bucket labels as targets. This installs the FogGen format on the 270M backbone.
2. Standard 14-round chain: the same recipe as issai/foggen from that point on, with 7-domain rotation, LoRA (r=16, α=32), bf16, 2 epochs per round, and the same cloud teacher.

The released checkpoint is R14 of the post-distillation chain.
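
The format-compliance figure cited earlier in this section can be read as the share of generations that contain both fields of the FogGen schema. A minimal sketch, assuming compliance means a parseable `Confidence:` bucket followed by a `Final answer:` option letter; the regular expression and the `format_compliance` name are illustrative and are not taken from the FogGen evaluation code:

```python
import re
from typing import Iterable

# Assumed schema check: one of the five confidence buckets, then an option letter.
_SCHEMA_RE = re.compile(
    r"Confidence:\s*(0\.0|0\.25|0\.5|0\.75|1\.0)\s+Final answer:\s*([A-Za-z])\b"
)


def format_compliance(generations: Iterable[str]) -> float:
    """Return the fraction of generations that match the assumed schema."""
    outputs = list(generations)
    if not outputs:
        return 0.0
    compliant = sum(1 for text in outputs if _SCHEMA_RE.search(text))
    return compliant / len(outputs)


samples = [
    "Confidence: 0.75\nFinal answer: B",
    "I think it's B",
    "Confidence: 0.5\nFinal answer: C",
    "Confidence: high\nFinal answer: A",
]
print(f"format compliance: {format_compliance(samples):.0%}")
```

A stricter or looser matcher would shift the measured rate, so any threshold comparison should use the same matcher across models.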

Performance

System accuracy at τ=0.5 on the seven MCQ domains (full test sets, ~16,200 queries). Cloud baseline is Qwen3-30B-A3B-Instruct-2507.

Domain	Cloud only	R14 raw	Random @ τ=0.5	FogGen @ τ=0.5	Cloud routed
Finance	69.5%	32.2%	58.2%	60.2%	69.5%
Science	72.7%	30.4%	58.2%	59.5%	65.6%
Coding	74.2%	34.3%	64.7%	65.7%	76.3%
Law	70.7%	31.7%	58.5%	59.7%	68.7%
Math	60.1%	24.5%	58.3%	58.5%	94.9%
Kazakh culture	95.8%	43.7%	60.3%	59.3%	31.9%
Medical	74.0%	32.2%	59.8%	60.8%	65.9%
Mean	73.9%	32.7%	59.7%	60.5%	67.5%
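
The Random @ τ=0.5 column can be sanity-checked from the other columns. The sketch below assumes that "Cloud routed" is the share of queries sent to the cloud and that the random baseline sends that same share to the cloud uniformly at random, so its expected accuracy is a weighted mix of the cloud-only and R14-raw accuracies. That reading is an inference from the numbers, not something the card states, and the `expected_random_accuracy` name is illustrative.

```python
# (cloud_only, r14_raw, random_reported, cloud_routed) in percent, copied from the table.
TABLE = {
    "Finance": (69.5, 32.2, 58.2, 69.5),
    "Science": (72.7, 30.4, 58.2, 65.6),
    "Coding": (74.2, 34.3, 64.7, 76.3),
    "Law": (70.7, 31.7, 58.5, 68.7),
    "Math": (60.1, 24.5, 58.3, 94.9),
    "Kazakh culture": (95.8, 43.7, 60.3, 31.9),
    "Medical": (74.0, 32.2, 59.8, 65.9),
}


def expected_random_accuracy(cloud_acc: float, edge_acc: float, cloud_share: float) -> float:
    """Expected accuracy (percent) when `cloud_share` percent of queries go to the cloud at random."""
    p = cloud_share / 100.0
    return p * cloud_acc + (1.0 - p) * edge_acc


for domain, (cloud, edge, reported, share) in TABLE.items():
    estimate = expected_random_accuracy(cloud, edge, share)
    print(f"{domain:15s} estimate={estimate:5.1f}  reported={reported:5.1f}")
```

Under that assumption, the estimates land within about 0.1 points of the reported column on every domain.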

Mean lift over Random at τ=0.5: +0.8 percentage points (positive on six of seven domains; negative on Kazakh culture, the headroom-collapse domain).
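
The lift figures follow directly from the table. A minimal sketch that recomputes the per-domain lift of FogGen over Random and its mean, with values copied from the two τ=0.5 columns (the variable names are illustrative):

```python
# Random @ τ=0.5 and FogGen @ τ=0.5 accuracies (percent), copied from the table.
RANDOM = {
    "Finance": 58.2, "Science": 58.2, "Coding": 64.7, "Law": 58.5,
    "Math": 58.3, "Kazakh culture": 60.3, "Medical": 59.8,
}
FOGGEN = {
    "Finance": 60.2, "Science": 59.5, "Coding": 65.7, "Law": 59.7,
    "Math": 58.5, "Kazakh culture": 59.3, "Medical": 60.8,
}

# Per-domain lift in percentage points, then the unweighted mean over domains.
lift = {domain: FOGGEN[domain] - RANDOM[domain] for domain in RANDOM}
mean_lift = sum(lift.values()) / len(lift)
positive = [d for d, v in lift.items() if v > 0]
negative = [d for d, v in lift.items() if v < 0]

for domain, value in lift.items():
    print(f"{domain:15s} {value:+.1f}")
print(f"mean lift: {mean_lift:+.1f} points")
print(f"positive on {len(positive)} of {len(lift)} domains; negative on {negative}")
```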

Compared to issai/foggen (+4.6 points at 0.6B) and issai/foggen-gemma3-1b (+5.9 points at 1B), the lift here is roughly an order of magnitude smaller (about 6 to 7 times). The recipe still produces positive lift, but its magnitude falls off sharply once edge capacity drops below the 0.6B mark.

Quick demo

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("issai/foggen-gemma3-270m", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("issai/foggen-gemma3-270m")

SYSTEM = """You are a self-aware multiple-choice assistant.

Rules:
- First, assess your confidence in solving this question.
- Then give your answer.
- Output format:
  Confidence: <0.0|0.25|0.5|0.75|1.0>
  Final answer: <OPTION_LETTER>"""

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "<your MCQ here>"},
]
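# return_dict=True returns both input_ids and attention_mask, independent of library defaults.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding; 64 new tokens is ample for the two-line output format.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))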

The routing decision (route_query helper, threshold τ) is identical to the issai/foggen card.
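
For readers without the issai/foggen card at hand, the decision can be sketched as: parse the two output fields, then keep the edge answer only when the verbalized confidence clears τ. The snippet below is a hypothetical stand-in, not the card's `route_query` helper. The names `parse_foggen_output` and `decide_route`, the `>=` comparison at the threshold, and the choice to send unparseable outputs to the cloud are all assumptions.

```python
import re
from typing import Optional, Tuple

_CONF_RE = re.compile(r"Confidence:\s*([01](?:\.\d+)?)")
_ANS_RE = re.compile(r"Final answer:\s*([A-Za-z])\b")


def parse_foggen_output(text: str) -> Tuple[Optional[float], Optional[str]]:
    """Extract (confidence, option letter); either is None if its field is missing."""
    conf_match = _CONF_RE.search(text)
    ans_match = _ANS_RE.search(text)
    confidence = float(conf_match.group(1)) if conf_match else None
    answer = ans_match.group(1).upper() if ans_match else None
    return confidence, answer


def decide_route(text: str, tau: float = 0.5) -> str:
    """Return "edge" to keep the local answer or "cloud" to escalate (assumed policy)."""
    confidence, answer = parse_foggen_output(text)
    # Assumption: outputs that break the format are escalated to the cloud.
    if confidence is None or answer is None:
        return "cloud"
    # Assumption: confidence equal to tau stays on the edge.
    return "edge" if confidence >= tau else "cloud"


print(decide_route("Confidence: 0.75\nFinal answer: B"))
print(decide_route("Confidence: 0.25\nFinal answer: C"))
```

Whether a confidence exactly equal to τ stays on the edge is a policy choice; check the issai/foggen card before matching the reported numbers.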

License

Inherits the Gemma Terms of Use from google/gemma-3-270m.

Citation

Paper coming soon.
