gemma-4-52B-no-adam-8bit

gemma-4-52B-no-adam-8bit is a native MXFP4 sparse Mixture-of-Experts chat model for long-context assistant and coding workflows. It uses Gemma 4 compatible chat formatting, thinking mode, and OpenAI-compatible tool calling.

This checkpoint was produced without Adam 8-bit. The project already had an optimized LLM build path, and this checkpoint uses MXAR, a project-internal optimization method for the pre/post-training build loop. MXAR implementation details are not included in this release.

Speed

  • Prior optimized Adam 8-bit path: roughly 34 hours for a comparable 52B pass
  • MXAR path for this checkpoint: roughly 2.5 hours
  • Wall-clock speedup: about 13.6x
  • Time saved: about 31.5 hours
  • Wall-clock reduction: about 92.6%

Model Details

  • Architecture: Gemma4 sparse MoE
  • Expert count: 256
  • Active experts per token: 10
  • Weight format: native MXFP4 expert weights with BF16 shared weights
  • Quantization config: quant_method=mxfp4, quant_type=mxfp4, converter_layout=vllm_fused_moe
  • Context: up to 262k positions in config; 100k context is a practical serving target on a single high-memory GPU
  • Recommended runtime: vLLM with Gemma4 reasoning and tool-call parsers

Recommended Serving

vllm serve /path/to/gemma-4-52B-no-adam-8bit \
  --served-model-name gemma-4-52B-no-adam-8bit \
  --host 0.0.0.0 \
  --port 23333 \
  --dtype bfloat16 \
  --max-model-len 100000 \
  --gpu-memory-utilization 0.90 \
  --trust-remote-code \
  --reasoning-parser gemma4 \
  --tool-call-parser gemma4 \
  --enable-auto-tool-choice \
  --chat-template /path/to/gemma-4-52B-no-adam-8bit/chat_template.jinja \
  --default-chat-template-kwargs '{"enable_thinking": true}'

Use standard chat roles: system, user, and assistant. For the intended behavior profile, keep thinking mode enabled and use native tool-call APIs rather than parsing tool calls from raw text.

Generation

  • Start with temperature 0.0 to 0.7.
  • Use the provided chat template.
  • For tool use, pass tools through the OpenAI-compatible chat completions API.

Files

  • config.json
  • generation_config.json
  • tokenizer.json
  • tokenizer_config.json
  • chat_template.jinja
  • model-00001-of-00002.safetensors
  • model-00002-of-00002.safetensors
  • model.safetensors.index.json
Downloads last month
21
Safetensors
Model size
29B params
Tensor type
BF16
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support