Wild Gemma 4 E4B IT
Wild Gemma 4 E4B IT is the Savanna Sentinel fine-tune of Gemma 4 E4B IT for structured wildlife monitoring from Serengeti camera-trap events. It is trained to read one to three camera-trap frames plus event metadata and return machine-readable JSON for species/event interpretation, review routing, and downstream reporting workflows.
This repository contains the corrected merged Hugging Face model. The LoRA adapter was trained with Unsloth, then manually merged into the Gemma 4 E4B IT base weights after the automated merged artifact failed smoke tests. The corrected merge was re-evaluated before publication.
Model Lineage
- Base family: Gemma 4
- Base instruction model: `unsloth/gemma-4-E4B-it`, derived from `google/gemma-4-E4B-it`
- Fine-tune method: LoRA supervised fine-tuning
- Merge method: manual LoRA merge into language model linear modules, `W += (B @ A) * alpha / r`
- Merged modules: 406
- Missing LoRA merge modules: 0
- Output format: Hugging Face Transformers safetensors
- Companion Ollama/GGUF export: `Alfaxad/wild-gemma-4-E4B-it-GGUF`
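The merge rule listed above can be illustrated on a toy layer. This is a minimal sketch in plain Python, assuming the usual LoRA shapes (W is out x in, B is out x r, A is r x in). The helper names `matmul` and `merge_lora` are hypothetical, and this is not the project's merge script.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha, r):
    """Return W + (B @ A) * alpha / r without modifying the inputs."""
    scale = alpha / r
    delta = matmul(b, a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example (made-up numbers): 2x2 base weight, rank-1 adapter, alpha = 2.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x in
B = [[0.5], [1.0]]  # out x r
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

In a real merge the same update would be applied to each adapted linear module's weight tensor, which is what the "Merged modules" count refers to.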
Gemma 4 E4B is a dense multimodal model with text, image, and audio support in the base model family. This Savanna Sentinel release was evaluated for image+text camera-trap inference; audio behavior was not evaluated for this project.
Intended Use
This model is intended for open wildlife-monitoring demos, research prototypes, and production experiments around camera-trap event triage.
Primary input:
- One to three camera-trap images from a capture burst
- Event metadata such as camera site, timestamp, environmental features, split/task context, and prompt-specific instructions
Primary output:
- Strict JSON matching the Savanna Sentinel task schemas
Primary tasks:
- Phase 1 event interpretation: blank/non-blank, species, count bin, behavior, young-present signal, confidence, and review fields
- Phase 2 review routing: structured review decision, reasons, label uncertainty, disagreement signals, and triage notes
- Phase 3 report/tool tasks: structured tool-call style outputs and report JSON for biodiversity monitoring workflows
Dataset
Fine-tuned on Alfaxad/wildlife-sentinel.
The dataset was generated for Savanna Sentinel from:
- Snapshot Serengeti camera-trap event images and labels
- Snapshot Serengeti consensus, expert gold, raw vote, search-effort, and image metadata artifacts
- Public environmental features joined at event/site/month level, including MODIS vegetation features and related public environmental layers
- Generated Savanna Sentinel task schemas for event interpretation, review routing, tool-agent planning, and reporting
Training package used by the run:
| Split group | Rows |
|---|---|
| Train rows | 38,612 |
| Validation rows | 672 |
| Total audited rows | 39,284 |
Training rows by task:
| Task | Rows |
|---|---|
| Phase 1 event interpretation | 17,496 |
| Phase 2 review routing | 17,496 |
| Phase 3 tool-agent planning | 2,220 |
| Phase 3 report generation | 1,400 |
Validation rows by task:
| Task | Rows |
|---|---|
| Phase 1 event interpretation | 256 |
| Phase 2 review routing | 256 |
| Phase 3 tool-agent planning | 80 |
| Phase 3 report generation | 80 |
The training examples use chat-style multimodal messages. Image content is placed before text content, matching Gemma 4 and Ollama multimodal best practice.
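The image-before-text ordering can be sketched as a small message builder. This assumes the typed content-part layout used by Transformers chat templates (`{"type": "image", ...}` and `{"type": "text", ...}`). The function name and the example URLs are hypothetical.

```python
def build_messages(image_urls, prompt_text):
    """Build one user turn with all image parts placed before the text part."""
    if not 1 <= len(image_urls) <= 3:
        raise ValueError("expected one to three camera-trap frames")
    # Images first, then the text that carries metadata and instructions.
    content = [{"type": "image", "url": url} for url in image_urls]
    content.append({"type": "text", "text": prompt_text})
    return [{"role": "user", "content": content}]


messages = build_messages(
    ["https://example.org/frame1.jpg", "https://example.org/frame2.jpg"],
    "Classify this Serengeti camera-trap capture event. Return only valid JSON.",
)
print([part["type"] for part in messages[0]["content"]])  # ['image', 'image', 'text']
```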
Training Configuration Snapshot
| Metric | Value |
|---|---|
| Epochs | 1.0 |
| Train loss | 0.007153 |
| Train runtime | 35,587.66 seconds |
| Samples/sec | 1.085 |
| Steps/sec | 0.136 |
| Peak CUDA reserved | 13.498 GB |
| Chosen max length | 8192 |
| Max audited text tokens, before visual tokens | 1754 |
| P99 audited text tokens, before visual tokens | 1653 |
| Max images per example | 3 |
max_length=8192 was selected to leave room for up to three image frames after the chat-template text tokens.
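The headroom implied by these numbers can be checked with a few lines of arithmetic. This is a sketch using only values from the table above. The card does not state how many visual tokens a frame consumes, so the per-image figure is only an upper budget, not a measured count.

```python
MAX_LENGTH = 8192        # chosen max length
MAX_TEXT_TOKENS = 1754   # max audited text tokens, before visual tokens
MAX_IMAGES = 3           # max images per example

# Tokens left for visual content after the longest audited text prompt.
headroom = MAX_LENGTH - MAX_TEXT_TOKENS
per_image_budget = headroom // MAX_IMAGES
print(headroom, per_image_budget)  # 6438 2146

# Throughput cross-check, assuming the single epoch covers all 38,612 train rows.
samples_per_sec = round(38612 / 35587.66, 3)
print(samples_per_sec)  # 1.085
```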
Evaluation
The metrics below are diagnostic evaluations after the merge/export fix. They are useful for regression checking and comparison between base, merged, and Ollama-exported variants, but they should not be treated as a final benchmark.
Base vs Fine-Tuned
| Model | Mode | Rows | JSON valid | Species exact | Species overlap | Blank correct | Review correct |
|---|---|---|---|---|---|---|---|
| HF base Gemma 4 E4B IT | non-thinking | 40 | 0.800 | 0.222 | 0.222 | 0.000 | n/a |
| Wild Gemma HF | non-thinking | 40 | 0.775 | 0.273 | 0.273 | 0.818 | 1.000 |
| Wild Gemma HF | thinking | 24 | 0.917 | 0.667 | 0.667 | 0.800 | 1.000 |
| Official Ollama Gemma 4 E4B | non-thinking | 24 | 0.917 | 0.333 | 0.333 | 0.000 | n/a |
| Wild Gemma Ollama/GGUF | non-thinking | 40 | 0.725 | 0.364 | 0.364 | 0.889 | 1.000 |
| Wild Gemma Ollama/GGUF | thinking | 24 | 0.792 | 0.500 | 0.500 | 1.000 | 1.000 |
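Relative to the base diagnostic runs, the fine-tune substantially improves blank-event handling: blank-correct rises from 0.000 in both base runs to between 0.800 and 1.000 across the fine-tuned runs. Review-correct is 1.000 in every fine-tuned run; the base runs have no review score to compare against. Thinking mode improved the HF merged model on the small diagnostic species subset (species exact 0.273 in non-thinking mode on 40 rows vs 0.667 in thinking mode on 24 rows).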
Metrics Files
The full run artifacts are included under `metrics/`:
- `train_metrics.json`
- `dataset_runtime_stats.json`
- `dataset_token_length_audit.json`
- `merge_manual_lora_status.json`
- `merge_manual_lora_smoke.json`
- `evaluation_base.json`
- `evaluation_finetuned_adapter_redo_diagnostic.json`
- `evaluation_ollama_manual_combined_q4_officialmeta_redo.json`
- prediction JSONL files for base, fine-tuned, and Ollama diagnostic runs
Usage
Use the Gemma 4 chat template from the tokenizer/processor. For multimodal prompts, place image content before text content. Ask for strict JSON only, and validate the response against the application schema before using it in production.
Example prompt intent:
Classify this Serengeti camera-trap capture event. Use the images first, then the metadata. Return only valid JSON matching savanna_sentinel_event_v1.
Recommended generation defaults for parity with the evaluation setup:
temperature = 1.0
top_p = 0.95
top_k = 64
For deterministic regression testing, use fixed seeds and task-specific max generation limits.
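One way to apply these defaults with Transformers is sketched below. It assumes the model loads through `AutoProcessor` and `AutoModelForImageTextToText`, and that `messages` is a chat-style list with image parts before text. The `max_new_tokens` value of 512 is a placeholder, not a setting taken from the evaluation runs.

```python
MODEL_ID = "Alfaxad/wild-gemma-4-E4B-it"

# Sampling settings listed above; do_sample=True is needed for them to take effect.
GENERATION_DEFAULTS = {
    "do_sample": True,
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
}


def load_model(model_id=MODEL_ID):
    """Load the processor and model once and reuse them across requests."""
    # Imported lazily so the settings above can be reused without Transformers.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id)
    return processor, model


def generate_text(processor, model, messages, max_new_tokens=512, seed=None):
    """Run one chat-formatted request and return only the newly generated text."""
    from transformers import set_seed

    if seed is not None:
        set_seed(seed)  # fixed seed for regression runs
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, **GENERATION_DEFAULTS
    )
    prompt_len = inputs["input_ids"].shape[-1]
    return processor.decode(outputs[0][prompt_len:])
```

The returned string should still be parsed and validated before use, since sampling at temperature 1.0 can produce malformed JSON.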
Thinking Mode
Gemma 4 supports configurable thinking. For this model:
- Non-thinking mode is best when you need short, schema-only JSON outputs.
- Thinking mode can improve difficult visual/species reasoning, but the final answer still needs JSON extraction and validation.
- When thinking is enabled, do not feed previous hidden/thought content back into later turns. Multi-turn history should include only final assistant responses.
In runtimes that expose Gemma 4 thinking through the chat template, enable thinking with the runtime flag or by placing <|think|> at the start of the system prompt. For schema production, strip any thought channel content and keep only the final JSON.
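Keeping only the final JSON can be sketched without knowing the exact thought-channel delimiters, by scanning for the last top-level JSON object that parses. This is a minimal illustration using the standard-library `json` module; `extract_final_json` is a hypothetical helper, and the sample text is invented.

```python
import json


def extract_final_json(text):
    """Return the last top-level JSON object that parses in `text`, or None."""
    decoder = json.JSONDecoder()
    result = None
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; try the next brace
            continue
        result = obj
        pos = end  # skip past this object, including anything nested in it
    return result


sample = 'Reasoning: maybe {"blank": true}? No, stripes visible. {"blank": false, "detections": []}'
print(extract_final_json(sample))  # {'blank': False, 'detections': []}
```

For multi-turn use, only the extracted final answer would be appended to the history, in line with the guidance above.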
Output Schemas
The main event interpretation target follows this shape:
    {
      "schema_version": "savanna_sentinel_event_v1",
      "capture_event_id": "ASG...",
      "blank": false,
      "detections": [
        {
          "species": "zebra",
          "count_bin": "3",
          "behaviors": {
            "standing": false,
            "resting": false,
            "moving": true,
            "eating": false,
            "interacting": false
          },
          "young_present": false,
          "confidence": "high",
          "evidence": {
            "visual_basis": "Striped equids visible across the image burst.",
            "frames_used": [1, 2, 3]
          }
        }
      ],
      "review": {
        "review_needed": false,
        "reasons": []
      }
    }
Production callers should treat model output as untrusted text until JSON parsing and schema validation succeed.
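A parse-then-validate gate might look like the sketch below. The checks are inferred only from the example shape shown above, not from the full Savanna Sentinel schema, so the function name and the specific rules are assumptions to be replaced by the application's real schema.

```python
import json

BEHAVIORS = ("standing", "resting", "moving", "eating", "interacting")


def validate_event(raw_text):
    """Parse model output and return (event, errors); event is None on failure."""
    try:
        event = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(event, dict):
        return None, ["top-level value is not an object"]

    errors = []
    if event.get("schema_version") != "savanna_sentinel_event_v1":
        errors.append("unexpected schema_version")
    if not isinstance(event.get("blank"), bool):
        errors.append("blank must be a boolean")

    detections = event.get("detections")
    if not isinstance(detections, list):
        errors.append("detections must be a list")
        detections = []
    for i, det in enumerate(detections):
        if not isinstance(det, dict) or not isinstance(det.get("species"), str):
            errors.append(f"detections[{i}].species must be a string")
            continue
        behaviors = det.get("behaviors")
        if not isinstance(behaviors, dict) or any(
            not isinstance(behaviors.get(name), bool) for name in BEHAVIORS
        ):
            errors.append(f"detections[{i}].behaviors must map each behavior to a boolean")

    review = event.get("review")
    if not isinstance(review, dict) or not isinstance(review.get("review_needed"), bool):
        errors.append("review.review_needed must be a boolean")

    # Hand back the event only when every check passed.
    return (event if not errors else None), errors
```

A caller would route any response with a non-empty error list to retry or human review rather than passing it downstream.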
Limitations
- Diagnostic evals are small and task-specific; run a broader benchmark before making scientific claims.
- Species predictions are not a replacement for expert review in high-stakes ecological analysis.
- The model can produce malformed JSON, especially on tool/report tasks and some Ollama quantized runs.
- The model was fine-tuned for Savanna Sentinel camera-trap workflows and may not generalize to unrelated wildlife imagery, camera systems, geographies, or taxonomies.
- Audio support from the base family was not evaluated for this release.
- Environmental fields are useful context, but the model should not be used as a causal ecological model.
Citation And Attribution
Please cite the upstream model and data sources when using this model:
- Gemma 4 E4B IT by Google DeepMind: `google/gemma-4-E4B-it`
- Unsloth Gemma 4 E4B IT training base: `unsloth/gemma-4-E4B-it`
- Savanna Sentinel dataset package: `Alfaxad/wildlife-sentinel`
- Snapshot Serengeti source dataset and associated metadata used in the dataset package
Related Artifacts
- Dataset: https://huggingface.co/datasets/Alfaxad/wildlife-sentinel
- GGUF/Ollama export: https://huggingface.co/Alfaxad/wild-gemma-4-E4B-it-GGUF
- Base model card: https://huggingface.co/google/gemma-4-E4B-it
- Ollama Gemma 4 E4B reference: https://ollama.com/library/gemma4:e4b