Instructions to use OsaurusAI/gemma-4-12B-it-JANG_4M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OsaurusAI/gemma-4-12B-it-JANG_4M with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("OsaurusAI/gemma-4-12B-it-JANG_4M") config = load_config("OsaurusAI/gemma-4-12B-it-JANG_4M") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use OsaurusAI/gemma-4-12B-it-JANG_4M with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "OsaurusAI/gemma-4-12B-it-JANG_4M"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "OsaurusAI/gemma-4-12B-it-JANG_4M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use OsaurusAI/gemma-4-12B-it-JANG_4M with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "OsaurusAI/gemma-4-12B-it-JANG_4M"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default OsaurusAI/gemma-4-12B-it-JANG_4M
Run Hermes
hermes
Gemma 4 12B-it - JANG_4M
Apple Silicon MLX bundle for Osaurus and compatible vMLX runtimes.
Important update (2026-06-03 4:06 PM PDT): These weights were rebuilt with the verified Gemma 4 12B fix. If you downloaded this repository before 2026-06-03 4:06 PM PDT, delete the local copy and re-download.
Model Details
| Property | Value |
|---|---|
| Base model | google/gemma-4-12B-it |
| Architecture | Gemma 4 unified dense 12B, text + image/audio/video-capable metadata |
| Format | MLX safetensors |
| Quantization | JANG mixed precision: attention 8-bit, MLP 4-bit, group size 32; tied embedding and multimodal embedders fp16 passthrough |
| Tied token embedding | fp16 passthrough (embed_tokens.weight is not quantized) |
| Multimodal embedders | fp16 passthrough |
| Package size | 10.17 GB |
| Shards | 10 safetensors shards |
| Chat template | Gemma 4 tool-aware template, no default no-thinking thought-channel tail |
Runtime Notes
These rebuilt bundles preserve the tied token embedding in fp16 while keeping the main projection weights quantized. This fixes the bad prior artifact where embed_tokens.weight was packed and scaled like a normal linear weight.
The bundle includes generation_config.json, chat_template.jinja, tokenizer_config.json, and processor_config.json for Osaurus/vMLX loading.
Loading
Use Osaurus for local Apple Silicon chat and multimodal workflows, or load the bundle in a compatible MLX runtime:
from mlx_lm import load, generate
model, tokenizer = load("JANGQ-AI/gemma-4-12B-it-JANG_4M")
print(generate(model, tokenizer, "Hello", max_tokens=128))
Verification
Local release check for this rebuild:
| Check | Status |
|---|---|
embed_tokens.weight dtype |
fp16 |
embed_tokens.scales / embed_tokens.biases |
absent |
| Quantized attention projections | packed uint32 |
| README front matter | valid Hugging Face YAML first |
| Re-download notice | present after YAML |
- Downloads last month
- 497
Quantized
