Instructions to use naazimsnh02/FabGemma with libraries and local apps.

Libraries

How to use naazimsnh02/FabGemma with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="naazimsnh02/FabGemma")
# The model card describes FabGemma as text-only, so the prompt is plain text.
messages = [
    {"role": "user", "content": "Why might a test named test_expired_token fail? List likely causes."},
]
# The text-generation pipeline takes the chat messages as its first positional argument.
print(pipe(messages, max_new_tokens=256))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("naazimsnh02/FabGemma")
model = AutoModelForMultimodalLM.from_pretrained("naazimsnh02/FabGemma")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use naazimsnh02/FabGemma with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "naazimsnh02/FabGemma"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "naazimsnh02/FabGemma",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
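
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `chat` are hypothetical, introduced here for illustration.

```python
import json
import urllib.request


def build_chat_request(prompt, model="naazimsnh02/FabGemma",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the first choice's message text.

    Assumes a server is already running at base_url.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` would return the reply text. For the SGLang server, pass `base_url="http://localhost:30000"`.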

Use Docker

docker model run hf.co/naazimsnh02/FabGemma

SGLang

How to use naazimsnh02/FabGemma with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "naazimsnh02/FabGemma" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "naazimsnh02/FabGemma",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "naazimsnh02/FabGemma" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "naazimsnh02/FabGemma",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use naazimsnh02/FabGemma with Docker Model Runner:
```
docker model run hf.co/naazimsnh02/FabGemma
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

FabGemma

FabGemma-12B

FabGemma-12B is a reasoning-first fine-tune of Google's Gemma 4 12B Instruct. It adds agentic coding, autonomous task planning, and rigorous debugging workflows to the base model's instruction-following capabilities.

Through supervised fine-tuning (SFT) on complex agentic traces, the model learns one key habit: it reasons and plans before it acts.

Core Highlights

Reasoning Focus: Fine-tuned on complex, multi-step debugging and tool-use reasoning traces.
Base Architecture: google/gemma-4-12B-it (Dense Transformer).
Long Context: Inherits Gemma 4's native 256K token context window.
Efficiency First: Trained with LoRA (merged directly into the final weights), using roughly 262M trainable adapter parameters, about 2.15% of the total network.

The Recipe: Dataset & Structure

FabGemma-12B was trained on 15.2 million tokens distilled from coding agent sessions.

Primary Source: Glint-Research/Fable-5-traces (4,665 total examples)
Targeting: Loss is selectively computed only on assistant completion tokens.
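
Computing loss only on assistant tokens is usually done by masking the labels of every other token. The sketch below illustrates the idea in plain Python and is not the training code used for this model: the function name `mask_non_assistant` and the per-token role list are hypothetical, and the ignore value of -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def mask_non_assistant(token_ids, roles):
    """Return labels in which only assistant tokens keep their token id.

    token_ids: list of ints for one tokenized conversation.
    roles: list of role names, one per token (hypothetical bookkeeping).
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tid if role == "assistant" else IGNORE_INDEX
        for tid, role in zip(token_ids, roles)
    ]


# Toy conversation: three user tokens followed by three assistant tokens.
token_ids = [11, 12, 13, 21, 22, 23]
roles = ["user"] * 3 + ["assistant"] * 3
labels = mask_non_assistant(token_ids, roles)
print(labels)
```

Positions labeled with the ignore value contribute nothing to the loss, so gradients come only from the assistant completion tokens.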

Dataset Characteristics

Attribute	Value
Total Examples	4,665 (with 100 held out for evaluation)
Average Sequence Length	~3.3K tokens
P99 Sequence Length	~9.2K tokens
Maximum Sequence Length	~24.9K tokens
Behavioral Mix	81% Tool-use interactions / 19% Direct text responses

Generative Framework

The model structures its output in distinct steps. It typically wraps its reasoning in explicit XML-style tags before emitting a tool call or answer:

<think>
[Step-by-step problem dissection, edge-case identification, and tool strategy]
</think>

ASSISTANT (tool call) <Tool> input={...}
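
Downstream code often needs to separate the reasoning from the action that follows it. Here is a minimal sketch, assuming the output contains at most one `<think>...</think>` block in the shape shown above; the helper name `split_reasoning` is hypothetical.

```python
import re

# Non-greedy match so only the first <think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, remainder).

    Returns an empty reasoning string when no <think> block is present.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    remainder = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, remainder


sample = (
    "<think>\nCheck the token expiry logic first.\n</think>\n\n"
    "ASSISTANT (tool call) <Tool> input={...}"
)
reasoning, action = split_reasoning(sample)
print(reasoning)
print(action)
```

Outputs that omit the tags fall through unchanged, which keeps the helper safe for direct text responses.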

Training Blueprint

The fine-tuning phase utilized Unsloth, TRL, Transformers, and PEFT with the following configuration:

LoRA Configurations

Rank (r): 64
Alpha ($\alpha$): 128
Dropout: 0
Target Modules: q, k, v, o, gate, up, down
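
The adapter figures quoted in this card can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the conventional LoRA scaling of alpha / r (not the rank-stabilized variant), the implied total is a back-of-envelope estimate from rounded inputs, and the 4096 x 4096 projection is a hypothetical shape, not a Gemma dimension.

```python
# Figures quoted in this card.
trainable_params = 262e6      # approximate LoRA adapter parameters
trainable_fraction = 0.0215   # 2.15% of the network
rank, alpha = 64, 128

# Implied total parameter count (rough, since both inputs are rounded).
implied_total = trainable_params / trainable_fraction

# Conventional LoRA multiplies the low-rank update by alpha / r.
scaling = alpha / rank


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


print(f"implied total: {implied_total / 1e9:.1f}B parameters")
print(f"update scaling: {scaling}")
print(lora_params(4096, 4096, rank))  # hypothetical square projection
```

The adapter cost per layer grows linearly with the rank, which is why a rank of 64 across all seven projection types still trains only a small fraction of the network.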

Optimization Settings

Epochs: 2
Learning Rate: 1e-4 (cosine schedule, 3% warmup)
Effective Batch Size: 16
Training Sequence Cap: 16,384 tokens
Precision & Optimizer: bf16 with AdamW (weight decay: 0.01)
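
To make the schedule concrete, the sketch below computes the learning rate at a given step. It rests on several assumptions that this card does not state: warmup is linear from zero, the cosine decays to zero, and the step count is estimated from the 4,565 training examples with no packing or dropped samples. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, total_steps, peak_lr=1e-4, warmup_ratio=0.03):
    """Learning rate with linear warmup followed by cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Rough step estimate: (4,665 - 100 held out) examples, batch 16, 2 epochs.
train_examples = 4665 - 100
steps_per_epoch = math.ceil(train_examples / 16)
total_steps = steps_per_epoch * 2

for step in (0, total_steps // 2, total_steps):
    print(step, lr_at(step, total_steps))
```

Under these assumptions the run has a few hundred optimizer steps, so the 3% warmup covers only the first handful of updates.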

Evaluation & Performance

Validation loss decreased from epoch 1 to epoch 2, with no sign of degradation or collapse.

Final Training Loss: ~0.096
Validation Loss (Epoch 1): 0.785
Validation Loss (Epoch 2): 0.756

Benchmark Comparison (100 Held-Out Coding Traces)

Compared with its base model on 105,525 held-out response tokens, FabGemma-12B cut evaluation loss and perplexity by more than half:

Performance Metric	Base Model (`gemma-4-12B-it`)	FabGemma-12B	Net Improvement
Evaluation Loss	1.580	0.737	−53.4%
Perplexity	4.856	2.089	−57.0%
Mean Per-Example Loss	1.747	0.760	−56.5%
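
The rows of this table are related: perplexity is the exponential of the mean token-level loss, and the last column is the relative change from base to fine-tuned. The short check below recomputes both from the rounded values in the table, so small differences in the last digit are expected.

```python
import math

# Evaluation loss values as reported in the table.
eval_loss = {"base": 1.580, "fabgemma": 0.737}

# Perplexity is exp(mean cross-entropy loss).
perplexity = {name: math.exp(loss) for name, loss in eval_loss.items()}


def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100.0


print({name: round(ppl, 3) for name, ppl in perplexity.items()})
print(round(relative_change(1.580, 0.737), 1))   # evaluation loss
print(round(relative_change(4.856, 2.089), 1))   # perplexity
print(round(relative_change(1.747, 0.760), 1))   # mean per-example loss
```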

Quickstart Implementation

You can load the merged checkpoint directly with Hugging Face transformers:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "naazimsnh02/FabGemma"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content":
    "USER: There's a failing test test_auth.py::test_expired_token. Investigate why and propose a fix."}]
# return_dict=True yields input_ids plus attention_mask, which generate() expects.
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_dict=True,
                                 return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                     temperature=0.7, top_p=0.9, repetition_penalty=1.05)  # mild repetition penalty reduces loops
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))

Important Limitations

Before putting this model into a production pipeline, keep these limitations in mind:

Specialized Focus: The model is optimized for code architecture, script execution planning, and debugging. Its general trivia and encyclopedic knowledge may not match its engineering performance.
Modality Restraints: This is strictly a text-to-text model. Vision and audio capabilities were not adapted during fine-tuning.
Language & Formatting: Fine-tuning used primarily English data. Output syntax remains highly dependent on how the user prompt is structured.
Inherited Elements: Safety baselines, biases, and underlying assumptions are inherited directly from the google/gemma-4-12B-it base model. Always review generated code before executing it.

Provenance, Credits, & Licensing

Base Weights: Google Gemma Team (Gemma License)
Dataset Credits: Glint-Research/Fable-5-traces (AGPL-3.0)
Compliance Reminder: Because the training dataset is distilled from session logs of other AI assistants, downstream users must verify that their use complies with the relevant providers' terms on training derivative models.

Disclaimer: This model checkpoint is experimental and provided "as-is" for research, local testing, and collaborative evaluation. There are no operational warranties attached to its outputs.

Safetensors

Model size: 12B params
Tensor type: BF16

Model tree for naazimsnh02/FabGemma

Base model: google/gemma-4-12B
Fine-tuned from: google/gemma-4-12B-it