Gemma 4 E4B IT q4f16_1 MLC

Custom text-only MLC/WebLLM packaging of google/gemma-4-E4B-it, quantized to q4f16_1, for in-browser WebGPU inference.

Status

This is a build candidate, not an official mlc-ai release. Compilation completed successfully, but the model has not yet been validated at runtime in a browser. That validation requires a WebGPU device that exposes the shader-f16 feature.
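
A minimal sketch of how a page could check for shader-f16 before downloading any weights. It relies on the standard WebGPU calls `requestAdapter()` and `features.has(...)`. The `GpuLike` and `AdapterLike` types and the `supportsShaderF16` helper are hypothetical names introduced here so the check compiles without WebGPU typings.

```typescript
// Minimal structural types mirroring the parts of the WebGPU API used below.
// These names are illustrative, not part of WebLLM or this package.
interface AdapterLike {
  features: { has(name: string): boolean };
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

// Returns true only if WebGPU is present, an adapter is available,
// and that adapter advertises the "shader-f16" feature.
async function supportsShaderF16(gpu: GpuLike | undefined): Promise<boolean> {
  if (!gpu) return false; // the browser has no WebGPU at all
  const adapter = await gpu.requestAdapter();
  if (!adapter) return false; // WebGPU exists, but no usable adapter
  return adapter.features.has("shader-f16");
}
```

In a browser, the argument would be `navigator.gpu`, which may need a cast or the `@webgpu/types` package to type-check.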

Build Summary

  • Base model: google/gemma-4-E4B-it
  • Quantization: q4f16_1
  • Runtime target: webgpu
  • Model type: gemma4
  • Conversation template: gemma4_instruction
  • Context window: 4096
  • Prefill chunk size: 512
  • Sliding window: 512
  • Quantized parameters: 7,996,157,418
  • Parameter size after quantization: 3.976 GB
  • Build VM: gemma4-qwik-e4b-builder (m3-megamem-128, europe-west1-b)
  • MLC-LLM commit: 22fe4b7e2e68ff00c12c2069de2060bce3cfe62d
  • TVM commit: e96bc0525fb6d59229d40c5a6eb03cde04bb5ed4

WebLLM Usage

import { CreateMLCEngine } from "@mlc-ai/web-llm";

const repo = "https://huggingface.co/welcoma/gemma-4-E4B-it-q4f16_1-MLC";
const appConfig = {
  model_list: [{
    model: `${repo}/resolve/main/`,
    model_id: "gemma-4-E4B-it-q4f16_1-MLC",
    model_lib: `${repo}/resolve/main/libs/gemma-4-E4B-it-q4f16_1-MLC-webgpu.wasm`,
    required_features: ["shader-f16"],
  }],
};

const engine = await CreateMLCEngine("gemma-4-E4B-it-q4f16_1-MLC", { appConfig });
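
Once the engine exists, WebLLM exposes an OpenAI-style chat API at `engine.chat.completions.create`. The sketch below sends a single user prompt through it. `ChatEngineLike` and `ask` are hypothetical names, and the structural type declares only the fields used here, so the real engine may need a cast when passed in.

```typescript
// Structural subset of the engine's OpenAI-style chat API.
// Only the fields this sketch touches are declared (an assumption, not the full type).
interface ChatEngineLike {
  chat: {
    completions: {
      create(request: {
        messages: { role: "system" | "user" | "assistant"; content: string }[];
        max_tokens?: number;
      }): Promise<{ choices: { message: { content: string | null } }[] }>;
    };
  };
}

// Sends one user prompt and returns the reply text ("" if nothing came back).
async function ask(engine: ChatEngineLike, prompt: string, maxTokens = 256): Promise<string> {
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    max_tokens: maxTokens,
  });
  return reply.choices[0]?.message.content ?? "";
}
```

Prompt and reply together have to fit in the 4096-token context window listed in the build summary, so `maxTokens` is kept small by default.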

Files

  • libs/gemma-4-E4B-it-q4f16_1-MLC-webgpu.wasm: WebGPU model library
  • mlc-chat-config.json: MLC runtime configuration
  • params_shard_*.bin: quantized parameter shards
  • tensor-cache.json: tensor metadata cache
  • tokenizer.json, tokenizer_config.json: tokenizer assets
  • release-manifest.json: SHA-256 file inventory
  • build-provenance.json: build environment and source commit provenance
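
Because release-manifest.json carries SHA-256 digests, a downloaded file can be checked against its listed digest. The sketch below uses the standard Web Crypto call `crypto.subtle.digest`. The manifest's schema is not shown in this card, so the helpers take the expected hex digest directly rather than parsing the manifest. `sha256Hex` and `matchesDigest` are hypothetical names.

```typescript
// Hex-encoded SHA-256 of a byte buffer, via the standard Web Crypto API.
async function sha256Hex(data: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((byte) => byte.toString(16).padStart(2, "0"))
    .join("");
}

// Compares a buffer against an expected digest, ignoring hex letter case
// and surrounding whitespace in the expected value.
async function matchesDigest(data: ArrayBuffer, expectedHex: string): Promise<boolean> {
  return (await sha256Hex(data)) === expectedHex.trim().toLowerCase();
}
```

A file's bytes can be obtained with `await (await fetch(url)).arrayBuffer()`. This holds the whole file in memory, which is worth keeping in mind for the larger parameter shards.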

Limitations

  • Text-only packaging. The base model has multimodal components, but image/audio paths are not packaged or validated here.
  • Requires browser WebGPU with shader-f16.
  • Not yet validated at runtime. Test on a compatible local browser and GPU before treating this build as stable.