Instructions to use autotrust/gemma4-31B-Fable-5-Distilled with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use autotrust/gemma4-31B-Fable-5-Distilled with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="autotrust/gemma4-31B-Fable-5-Distilled")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"},
            ],
        },
    ]
    pipe(text=messages)

    # Load model directly
    from transformers import AutoProcessor, AutoModelForMultimodalLM

    processor = AutoProcessor.from_pretrained("autotrust/gemma4-31B-Fable-5-Distilled")
    model = AutoModelForMultimodalLM.from_pretrained("autotrust/gemma4-31B-Fable-5-Distilled")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"},
            ],
        },
    ]
    # Tokenize the chat (text + image) and move the tensors to the model's device
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    # Decode only the newly generated tokens, skipping the prompt
    print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Local Apps
- vLLM
How to use autotrust/gemma4-31B-Fable-5-Distilled with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "autotrust/gemma4-31B-Fable-5-Distilled"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "autotrust/gemma4-31B-Fable-5-Distilled",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Describe this image in one sentence."
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                            }
                        }
                    ]
                }
            ]
        }'
Use Docker
docker model run hf.co/autotrust/gemma4-31B-Fable-5-Distilled
- SGLang
How to use autotrust/gemma4-31B-Fable-5-Distilled with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "autotrust/gemma4-31B-Fable-5-Distilled" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "autotrust/gemma4-31B-Fable-5-Distilled",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Describe this image in one sentence."
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                            }
                        }
                    ]
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "autotrust/gemma4-31B-Fable-5-Distilled" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "autotrust/gemma4-31B-Fable-5-Distilled",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "Describe this image in one sentence."
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                            }
                        }
                    ]
                }
            ]
        }'
- Docker Model Runner
How to use autotrust/gemma4-31B-Fable-5-Distilled with Docker Model Runner:
docker model run hf.co/autotrust/gemma4-31B-Fable-5-Distilled
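The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can be mirrored from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_payload` and `chat`, the `max_tokens` default, and the timeout are illustrative assumptions, not part of the model card, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

MODEL_ID = "autotrust/gemma4-31B-Fable-5-Distilled"


def build_chat_payload(prompt, image_url=None, model=MODEL_ID, max_tokens=128):
    """Build an OpenAI-style chat request body with optional image input."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Same content-part shape as the curl examples above
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,  # hypothetical default, tune as needed
    }


def chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes an OpenAI-compatible response layout
    return body["choices"][0]["message"]["content"]


# Example (requires a running server; not executed here):
# payload = build_chat_payload("Describe this image in one sentence.",
#                              image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
# print(chat("http://localhost:8000", payload))    # vLLM default port used above
# print(chat("http://localhost:30000", payload))   # SGLang port used above
```

Keeping payload construction separate from the network call makes it easy to inspect or log the request before sending it to either server.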
🚀 AutoTrust AI Lab — Open Model Release Wave #2: Gemma-4-31B Fable-5 Distilled (Full Precision + GGUF)
Hello Hugging Face community,
Following the reception of our first open release — gpt-oss-120b-Fable-5-Distilled-GGUF crossed 899 downloads in its first three days, with strong engagement from the llama.cpp and MLX communities — we're now releasing two new models in the same family:
🔹 autotrust/gemma4-31B-Fable-5-Distilled — full-precision BF16 (31B parameters, multimodal, tool-use, reasoning)
🔹 autotrust/gemma4-31B-Fable-5-Distilled-GGUF — F16 / Q8_0 GGUF variants with multimodal projector for local inference
Highlights
- HumanEval pass@1: 92.7% (152/164), a 15.9-percentage-point lift over Google's official 76.8% for the base gemma-4-31B-it, measured with thinking disabled at temperature T=0.1.
- Multimodal vision preserved. We freeze layers 0–29 (visual fusion) and apply QLoRA only to layers 30–59 (language head). Vision benchmarks remain at parity with the base model, a capability that coding fine-tunes often degrade.
- Native tool-use and thinking modes trained on 308 high-quality Fable-5 agentic coding traces. We chose quality-first curation: 23K raw conversations distilled to 308 complete, verified tool-use sessions.
- GGUF variant ships with mmproj-gemma4-31b-Fable-5-F16.gguf so you can run text + vision locally on llama.cpp, LM Studio, Ollama, or Jan.
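The layer split described in the highlights (layers 0–29 frozen, adapters only on layers 30–59) can be sketched as follows. This is an illustration under stated assumptions, not the authors' training code: the 60-layer count is inferred from the ranges above, and the LoRA rank, alpha, target module names, and the `.layers.<N>.` parameter naming pattern are all hypothetical. The resulting dictionary uses field names that match PEFT's `LoraConfig` (`r`, `lora_alpha`, `target_modules`, `layers_to_transform`), so it could plausibly be passed there.

```python
import re

NUM_LAYERS = 60        # assumption: layers are indexed 0-59, as the ranges above imply
FIRST_TRAINABLE = 30   # layers below this index stay frozen

_LAYER_RE = re.compile(r"\.layers\.(\d+)\.")  # hypothetical parameter naming pattern


def split_layers(num_layers=NUM_LAYERS, first_trainable=FIRST_TRAINABLE):
    """Return (frozen, trainable) lists of decoder layer indices."""
    frozen = list(range(0, first_trainable))
    trainable = list(range(first_trainable, num_layers))
    return frozen, trainable


def layer_index(param_name):
    """Extract the decoder layer index from a parameter name, or None if absent."""
    match = _LAYER_RE.search(param_name)
    return int(match.group(1)) if match else None


def lora_kwargs(trainable_layers, rank=16, alpha=32):
    """Build adapter settings restricted to the trainable layers.

    rank, alpha, and target_modules are placeholder values for illustration.
    """
    return {
        "r": rank,
        "lora_alpha": alpha,
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
        "layers_to_transform": trainable_layers,
    }


frozen, trainable = split_layers()
config = lora_kwargs(trainable)
print(f"frozen: {frozen[0]}-{frozen[-1]}, adapted: {trainable[0]}-{trainable[-1]}")
```

Restricting adapters by layer index rather than by module name keeps the lower half of the network byte-identical to the base model, which is the property the parity claim relies on.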
Why this release
AutoTrust AI Lab is building open foundation models optimized for agentic coding and scientific research workflows. These releases are the open-weight layer underneath our PaperGuru and upcoming ScienceGuru research agents. Distillation and post-training pipelines are led by Cloud Yu (Chief AI Architect, AutoTrust AI Lab), with Dr. Daniel Tang (CEO) and Prof. Tegawendé F. Bissyandé (Chief Scientist) leading the broader research agenda.
What's next
- Q5_0 and Q4_K_M GGUF quants for consumer hardware (16–24 GB VRAM target) — within 7 days.
- vLLM and SGLang deployment guides + reference Dockerfiles.
- Inference provider support (OpenRouter, Together, Fireworks, Novita), which we are pursuing.
- A technical report, now in preparation, covering the layer-freezing strategy, prompt-loss-masking ablations, and our MLX MXFP4 → GGUF conversion pipeline (the first public implementation of such a conversion that we're aware of).
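To see why the planned Q5_0 and Q4_K_M quants line up with the 16–24 GB VRAM target, a back-of-the-envelope estimate of weight storage helps. The sketch below assumes a nominal 31e9 parameters and counts weights only (no KV cache, activations, or multimodal projector). The Q8_0 and Q5_0 bits-per-weight figures follow from their 32-weight block layouts; the Q4_K_M figure is an approximation, since that format mixes quantization types across tensors.

```python
PARAMS = 31e9  # assumption: nominal parameter count from the "31B" label

# Effective bits per weight for each format
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale per block
    "Q5_0": 5.5,     # 32 5-bit weights + one fp16 scale per block
    "Q4_K_M": 4.85,  # approximate; mixed-precision format
}


def weight_gib(params, bits_per_weight):
    """Estimate weight storage in GiB, ignoring KV cache and runtime overhead."""
    return params * bits_per_weight / 8 / 2**30


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:7s} ~{weight_gib(PARAMS, bpw):.1f} GiB")
```

Actual file sizes will differ somewhat because embeddings and some tensors are often kept at higher precision, and context length adds KV-cache memory on top of the weights.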
We'd love your help
- Try it. Run ollama run hf.co/autotrust/gemma4-31B-Fable-5-Distilled-GGUF:F16, or load it in LM Studio.
- Tell us where it breaks. File issues here in the Community tab — we read every one.
- Benchmark it on your own evals. We're particularly interested in SWE-bench Lite, MBPP, and any multimodal tool-use evals you run.
- Share quantizations. If you produce Q4_K_M, IQ3_XXS, or imatrix quants, link them — we'll feature them on the model card.
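If you benchmark the model yourself, the pass@k metric used for HumanEval-style evals can be computed with the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), averaged over problems, where n is the number of samples per problem and c the number that pass. The sketch below reproduces the headline figure under the assumption of one sample per problem (the announcement does not state the sample count); the function names are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset contains at least one correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k=1):
    """Average pass@k over a list of (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Assumption: one sample per problem, 152 of 164 problems solved
results = [(1, 1)] * 152 + [(1, 0)] * 12
score = mean_pass_at_k(results, k=1)
print(f"pass@1 = {100 * score:.1f}%")
print(f"lift over 76.8% baseline = {100 * score - 76.8:.1f} points")
```

When reporting results, it helps to state the sampling temperature, whether thinking mode was enabled, and the number of samples per problem, since all three affect comparability.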
Thanks to @cloudyu for the training and conversion work, to @armand0e for the Fable-5 dataset, and to the broader llama.cpp, MLX, and Hugging Face teams whose infrastructure makes work like this possible.
— AutoTrust AI Lab
🌐 autotrust.ai · 🤗 huggingface.co/autotrust · 📧 andy@autotrust.ai