FabGemma-12B
FabGemma-12B is a reasoning-first fine-tune of Google's Gemma 4 12B Instruct. It adds agentic coding, autonomous task planning, and rigorous debugging workflows on top of the base model's standard instruction-following capabilities.
By utilizing supervised fine-tuning (SFT) on complex agentic traces, this model learns a crucial habit: it reasons and plans before it acts.
Core Highlights
- Brain Upgrades: Modeled after complex, multi-step debugging and tool-use reasoning paths.
- Base Architecture: google/gemma-4-12B-it (Dense Transformer).
- Massive Context: Inherits Gemma 4's native 256K token context window.
- Efficiency First: Trained using LoRA (merged directly into the final weights), modifying just 2.15% (~262M parameters) of the total network.
The Recipe: Dataset & Structure
FabGemma-12B was trained on 15.2 million tokens distilled directly from high-tier coding agent sessions.
- Primary Source: Glint-Research/Fable-5-traces (4,665 total examples)
- Targeting: Loss is selectively computed only on assistant completion tokens.
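Restricting the loss to assistant tokens is commonly done by replacing the labels at every other position with an ignore index. The sketch below illustrates the idea in plain Python. It assumes an ignore index of -100 (the default in PyTorch's cross-entropy loss) and a hypothetical per-token `roles` list; the card does not describe the actual masking code used for training.

```python
IGNORE_INDEX = -100  # assumption: default ignore_index of PyTorch's cross-entropy loss


def mask_non_assistant(token_ids, roles):
    """Return labels in which only assistant tokens keep their token ID.

    token_ids: list of token IDs for one training example.
    roles: list of the same length naming who produced each token
           (hypothetical bookkeeping, e.g. "user", "tool", "assistant").
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Toy example: two user tokens followed by three assistant tokens.
labels = mask_non_assistant(
    [11, 12, 21, 22, 23],
    ["user", "user", "assistant", "assistant", "assistant"],
)
print(labels)  # [-100, -100, 21, 22, 23]
```

Positions labeled with the ignore index contribute nothing to the loss, so gradients come only from the assistant completions.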
Dataset Characteristics
| Attribute | Metrics & Distribution |
|---|---|
| Total Examples | 4,665 (with 100 held out for evaluation) |
| Average Sequence Length | ~3.3K tokens |
| P99 Sequence Length | ~9.2K tokens |
| Maximum Sequence Length | ~24.9K tokens |
| Behavioral Mix | 81% Tool-use interactions / 19% Direct text responses |
Generative Framework
The model organizes its outputs into clear, cognitive steps. It will typically isolate its thought process using explicit XML-style formatting:
<think>
[Step-by-step problem dissection, edge-case identification, and tool strategy]
</think>
ASSISTANT (tool call) <Tool> input={...}
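Downstream code often needs to separate the reasoning block from the action that follows it. The helper below is a minimal sketch, assuming the `<think>` tags appear as literal text in the decoded output; `split_reasoning` is a hypothetical name, not something shipped with the model.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split generated text into (reasoning, remainder).

    Returns an empty reasoning string when no <think> block is present.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think>...</think> block is the remainder.
    remainder = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, remainder


sample = (
    "<think>\nCheck the token expiry logic first.\n</think>\n"
    "ASSISTANT (tool call) <Tool> input={...}"
)
reasoning, action = split_reasoning(sample)
print(reasoning)
print(action)
```

Only the first reasoning block is extracted; output with several blocks would need a loop over `THINK_RE.finditer`.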
Training Blueprint
The fine-tuning phase utilized Unsloth, TRL, Transformers, and PEFT with the following configuration:
LoRA Configurations
- Rank (r): 64
- Alpha ($\alpha$): 128
- Dropout: 0
- Target Modules: q, k, v, o, gate, up, down
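To make the rank and alpha values concrete, the sketch below computes the update scaling and the number of trainable parameters LoRA adds to a single linear layer. It assumes standard LoRA scaling (alpha / r) rather than a rank-stabilized variant, and the layer shape is hypothetical, chosen only to illustrate the formula.

```python
def lora_scaling(rank, alpha):
    """Scaling factor applied to the low-rank update in standard LoRA (alpha / r)."""
    return alpha / rank


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank),
    so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


RANK, ALPHA = 64, 128  # values from the configuration above
print(lora_scaling(RANK, ALPHA))  # 2.0

# Hypothetical layer shape, not taken from the model's actual config.
print(lora_params(d_in=4096, d_out=4096, rank=RANK))  # 524288
```

Summing `lora_params` over every targeted projection in every layer gives the total adapter size, which is how a figure such as the reported ~262M parameters would be obtained.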
Optimization Settings
- Epochs: 2
- Learning Rate: 1e-4 (via Cosine Scheduler, 3% Warmup)
- Effective Batch Size: 16
- Training Sequence Cap: 16,384 tokens
- Precision & Optimizer: bf16 utilizing AdamW (Weight decay: 0.01)
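The learning-rate settings above can be sketched as a function of the optimizer step. This is a minimal illustration, assuming the common shape of linear warmup followed by cosine decay to zero. The total step count is hypothetical because the card does not report it, and the exact scheduler implementation used in training may differ.

```python
import math


def lr_at(step, total_steps, base_lr=1e-4, warmup_ratio=0.03):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr over the first warmup_ratio of steps,
    then cosine decay from base_lr down to 0 at total_steps.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; the card does not state the step count
for step in (0, 15, 30, 515, 1000):
    print(step, lr_at(step, TOTAL_STEPS))
```

With these assumptions the rate peaks at 1e-4 when warmup ends (step 30 of 1000), falls to half of that at the midpoint of the decay phase, and reaches zero at the final step.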
Evaluation & Performance
Validation loss decreased from epoch 1 to epoch 2, with no sign of degradation or collapse.
- Final Training Loss: ~0.096
- Validation Loss (Epoch 1): 0.785
- Validation Loss (Epoch 2): 0.756
Benchmark Comparison (100 Held-Out Coding Traces)
When compared against its own base model on 105,525 unseen response tokens, FabGemma-12B showed substantially lower loss and perplexity on agentic coding traces:
| Performance Metric | Base Model (gemma-4-12B-it) | FabGemma-12B | Net Improvement |
|---|---|---|---|
| Evaluation Loss | 1.580 | 0.737 | −53.4% |
| Perplexity | 4.856 | 2.089 | −57.0% |
| Mean Per-Example Loss | 1.747 | 0.760 | −56.5% |
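The perplexity and improvement figures follow from the evaluation losses, assuming the losses are mean per-token cross-entropy in nats. The short sketch below reproduces the arithmetic; differences in the last digit relative to the table come from the losses themselves being rounded.

```python
import math


def perplexity(mean_loss):
    """Perplexity is the exponential of the mean per-token cross-entropy loss (in nats)."""
    return math.exp(mean_loss)


def relative_change(base, new):
    """Signed relative change from base to new, as a percentage."""
    return (new - base) / base * 100.0


base_loss, tuned_loss = 1.580, 0.737  # evaluation losses from the table above
print(round(perplexity(base_loss), 2), round(perplexity(tuned_loss), 2))  # 4.85 2.09
print(round(relative_change(base_loss, tuned_loss), 1))  # -53.4
```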
Quickstart Implementation
You can pull and deploy the merged checkpoint directly using Hugging Face transformers:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "naazimsnh02/FabGemma"  # repository ID of this model page
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content":
    "USER: There's a failing test test_auth.py::test_expired_token. Investigate why and propose a fix."}]

# return_dict=True yields input_ids plus attention_mask, which generate() accepts as keyword arguments
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_dict=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                     temperature=0.7, top_p=0.9, repetition_penalty=1.05)  # repetition penalty discourages loops

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
Important Limitations
Before dropping this model straight into a production pipeline, keep these limitations in mind:
- Specialized Focus: Performance is heavily optimized for code architecture, script execution planning, and debugging. General trivia or encyclopedic factual knowledge may not match its engineering performance.
- Modality Constraints: This is a strictly text-to-text asset. Core vision or audio capabilities have not been adapted.
- Language & Formatting: Fine-tuning was executed primarily on English-centric environments. Output syntax remains highly dependent on user prompt structure.
- Inherited Elements: Safety baselines, core biases, and underlying assumptions are inherited directly from the original google/gemma-4-12B-it foundation. Always vet code outputs before execution.
Provenance, Credits, & Licensing
- Base Weights: Google Gemma Team (Gemma License)
- Dataset Credits: Glint-Research/Fable-5-traces (AGPL-3.0)
- Compliance Reminder: Because the training dataset is distilled from alternative AI assistant session logs, downstream practitioners must verify that their integration aligns with all relevant provider terms regarding derivative model training.
Disclaimer: This model checkpoint is experimental and provided "as-is" for research, local testing, and collaborative evaluation. There are no operational warranties attached to its outputs.