MiniCPM5-1B ONNX Web

A 4-bit (q4) ONNX export of openbmb/MiniCPM5-1B, packaged for in-browser text generation with Transformers.js.

Files

  • onnx/model_q4.onnx: ONNX Runtime 4-bit MatMul quantized decoder with KV cache.
  • config.json: includes transformers.js_config.dtype = "q4" so Transformers.js loads the q4 artifact by default.
  • Tokenizer and generation config files: copied unchanged from the source model export.

Usage

import { pipeline } from "@huggingface/transformers";

// Load the q4 ONNX weights and run them on the WebGPU backend.
const generator = await pipeline("text-generation", "Mike0021/MiniCPM5-1B-ONNX-Web", {
  dtype: "q4",
  device: "webgpu",
});

// Sample up to 64 new tokens at a low temperature.
const output = await generator("Briefly introduce yourself.", {
  max_new_tokens: 64,
  temperature: 0.2,
  do_sample: true,
});
console.log(output[0].generated_text);

If WebGPU is unavailable in the browser, set device: "wasm" to run on the WebAssembly (CPU) backend instead.
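
The fallback can also be chosen at runtime. The following is a minimal sketch, not part of this repository or of Transformers.js: chooseDevice is a hypothetical helper name, and it assumes that checking for navigator.gpu is a good enough signal. A browser can expose navigator.gpu and still fail to provide an adapter, so a stricter check may be needed in practice.

```javascript
// Hypothetical helper (an assumption for illustration, not a library API):
// pick the device string to pass to pipeline().
function chooseDevice(nav = globalThis.navigator) {
  // navigator.gpu is defined only where the WebGPU API is exposed.
  // Its presence does not guarantee that a usable adapter exists.
  return nav?.gpu ? "webgpu" : "wasm";
}

// Usage with the pipeline call from the Usage section:
//   const generator = await pipeline("text-generation", "Mike0021/MiniCPM5-1B-ONNX-Web", {
//     dtype: "q4",
//     device: chooseDevice(),
//   });
```

Passing the navigator object as a parameter keeps the helper easy to exercise outside a browser, where no WebGPU entry point is present and it falls back to "wasm".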
