Gemma3.5-48B-A4B-Q4_K_M

by XxACCOxX

Gemma3.5-48B-A4B-Q4_K_M is a community-built GGUF release that combines the instruction behavior of Gemma 3 27B with the Gemma 4 26B A4B MoE backbone through full-layer activation-distilled donor experts.

The model keeps the original Gemma 4 expert bank intact, appends a second donor bank derived from Gemma 3 across all 30 language layers, and preserves a Gemma 4-compatible inference path for local deployment.

Model Summary

  • Formal name: Gemma3.5-48B-A4B-Q4_K_M
  • Architecture: gemma4
  • Quantization: Q4_K_M
  • Total parameters: 48.1B
  • Estimated active parameters: ~3.8B
  • Context length: 262,144
  • Total experts: 256
  • Experts used per token: 8
  • Expert layout:
    • slots 0..127: original Gemma 4 experts
    • slots 128..255: Gemma 3 activation-distilled donor experts

Construction

This model uses Gemma 4 26B A4B as the base MoE runtime and appends a full 30-layer donor expert bank derived from Gemma 3 27B.

The donor side was built through activation distillation rather than direct dense replacement. Non-expert backbone tensors remain aligned with the Gemma 4 runtime layout, while the appended donor experts extend the expert bank without overwriting the original Gemma 4 experts.

Previous release from the same author:

MMLU-Pro No-Think Result

On a fixed stratified MMLU-Pro subset with 280 items, 20 items per category across 14 categories, seed = 42, think = false, and num_predict = 2048, the model reaches:

  • Gemma3.5-48B-A4B-Q4_K_M: 63.21% (177 / 280)
  • Gemma 4 26B A4B base: 55.71% (156 / 280)

This is a +7.50 point improvement over the original Gemma 4 26B A4B base under the same no-think configuration.

Intended Use

Gemma3.5-48B-A4B-Q4_K_M is intended for local instruction-following use in direct-answer mode, with strong emphasis on general chat, mathematics, technical prompts, and broad knowledge tasks while retaining a relatively small active path compared with its total parameter count.

Format

This release is provided as a GGUF model for local inference stacks such as llama.cpp and Ollama-based workflows.

License

This release is derived from both Gemma 3 and Gemma 4 upstream checkpoints.

Gemma 4 model pages are published under Apache 2.0, while Gemma 3 model access remains tied to Google's usage license. Because this release combines both upstream lines, it should not be represented as a pure Apache-2.0 model artifact.

Use and redistribution should follow the applicable upstream terms for both source model families.

Downloads last month
101
GGUF
Model size
48B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for XxACCOxX/gemma3.5-48b_a4b

Quantized
(136)
this model