Gemma 4 E4B IT q4f16_1 MLC
Custom text-first MLC/WebLLM packaging of google/gemma-4-E4B-it in q4f16_1 for browser-local WebGPU inference.
Status
This is a build candidate, not an official mlc-ai release. The compile path has completed successfully; browser runtime validation still needs a WebGPU device that exposes shader-f16.
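Before loading the model, a page can check whether the current browser and GPU expose shader-f16. The following is a minimal sketch using the standard WebGPU adapter API; the function name `supportsShaderF16` and the injectable `gpu` parameter are illustrative choices, not part of this package.

```javascript
// Sketch: resolves to true only if a WebGPU adapter exists and reports "shader-f16".
// `gpu` defaults to navigator.gpu; it is a parameter so the check can be run against a stub.
async function supportsShaderF16(gpu = globalThis.navigator?.gpu) {
  if (!gpu) return false; // WebGPU is not exposed in this environment
  const adapter = await gpu.requestAdapter();
  if (!adapter) return false; // no suitable GPU adapter was found
  return adapter.features.has("shader-f16");
}
```

In a browser, calling `supportsShaderF16()` with no arguments and showing a fallback message when it resolves to `false` avoids starting a multi-gigabyte download on hardware that cannot run the model.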
Build Summary
- Base model: google/gemma-4-E4B-it
- Quantization: q4f16_1
- Runtime target: webgpu
- Model type: gemma4
- Conversation template: gemma4_instruction
- Context window: 4096
- Prefill chunk size: 512
- Sliding window: 512
- Quantized parameters: 7,996,157,418
- Parameter size after quantization: 3.976 GB
- Build VM: gemma4-qwik-e4b-builder (m3-megamem-128, europe-west1-b)
- MLC-LLM commit: 22fe4b7e2e68ff00c12c2069de2060bce3cfe62d
- TVM commit: e96bc0525fb6d59229d40c5a6eb03cde04bb5ed4
WebLLM Usage
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const repo = "https://huggingface.co/welcoma/gemma-4-E4B-it-q4f16_1-MLC";

const appConfig = {
  model_list: [{
    model: `${repo}/resolve/main/`,
    model_id: "gemma-4-E4B-it-q4f16_1-MLC",
    model_lib: `${repo}/resolve/main/libs/gemma-4-E4B-it-q4f16_1-MLC-webgpu.wasm`,
    required_features: ["shader-f16"],
  }],
};

const engine = await CreateMLCEngine("gemma-4-E4B-it-q4f16_1-MLC", { appConfig });
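Once the engine has loaded, text generation goes through WebLLM's OpenAI-style chat interface. The sketch below wraps a single-turn request; `askOnce` is a hypothetical helper name, and it assumes `engine` is the object returned by `CreateMLCEngine` above.

```javascript
// Sketch: send one user message and return the assistant's reply text.
// Assumes `engine` exposes the OpenAI-style chat.completions.create() method.
async function askOnce(engine, prompt) {
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content;
}
```

For example, `console.log(await askOnce(engine, "Summarize WebGPU in one sentence."))`. Since this packaging is text-only, message content should be plain strings.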
Files
- libs/gemma-4-E4B-it-q4f16_1-MLC-webgpu.wasm: WebGPU model library
- mlc-chat-config.json: MLC runtime configuration
- params_shard_*.bin: quantized parameter shards
- tensor-cache.json: tensor metadata cache
- tokenizer.json, tokenizer_config.json: tokenizer assets
- release-manifest.json: SHA-256 file inventory
- build-provenance.json: build environment and source commit provenance
Limitations
- Text-only packaging. The base model has multimodal components, but image/audio paths are not packaged or validated here.
- Requires browser WebGPU with shader-f16.
- Runtime validation should be performed on a compatible local browser/GPU before treating this as stable.