dtype: "q8" in transformers.js loads model_quantized.onnx (404) — repo has model_q8.onnx

by high-u - opened about 5 hours ago

Hi, and thanks for releasing this model in ONNX form. 🙏

I ran into a small naming mismatch when loading the q8 weights with @huggingface/transformers (v4.2.0) in the browser, and wanted to share it in case it's useful to others trying to run this.

What happens

const generator = await pipeline("text-generation", "LiquidAI/LFM2.5-230M-ONNX", {
  dtype: "q8",
  device: "webgpu",
});

This fails with:

transformers@4.2.0:10 Uncaught (in promise) Error: Could not locate file: "https://huggingface.co/LiquidAI/LFM2.5-230M-ONNX/resolve/main/onnx/model_quantized.onnx".
https://huggingface.co/LiquidAI/LFM2.5-230M-ONNX/resolve/main/onnx/model_quantized.onnx_data 404 (Not Found)

Why

transformers.js maps dtypes to filename suffixes via DEFAULT_DTYPE_SUFFIX_MAPPING, where q8 → "_quantized". It then builds the path as onnx/model{suffix}.onnx, so dtype: "q8" resolves to onnx/model_quantized.onnx. This repo ships the q8 weights as onnx/model_q8.onnx, so the expected name isn't found.
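
To make the resolution concrete, here is a minimal sketch of the lookup as I understand it. It is not the library's actual code: `DTYPE_SUFFIX` is an illustrative subset containing only the two entries mentioned in this post, and `resolveOnnxPath` is a hypothetical helper name I made up for the example.

```javascript
// Illustrative subset of the dtype -> suffix mapping described above.
// Assumption: only the two entries discussed in this post are listed;
// the real DEFAULT_DTYPE_SUFFIX_MAPPING has more.
const DTYPE_SUFFIX = Object.freeze({
  fp32: "",
  q8: "_quantized",
});

// Hypothetical helper that mirrors the onnx/model{suffix}.onnx pattern.
function resolveOnnxPath(dtype, modelFileName = "model") {
  if (!Object.hasOwn(DTYPE_SUFFIX, dtype)) {
    throw new Error(`dtype not covered by this sketch: ${dtype}`);
  }
  return `onnx/${modelFileName}${DTYPE_SUFFIX[dtype]}.onnx`;
}

console.log(resolveOnnxPath("q8"));               // onnx/model_quantized.onnx
console.log(resolveOnnxPath("fp32", "model_q8")); // onnx/model_q8.onnx
```

The second call shows why the workaround below lands on the right file: an empty suffix plus a custom base name yields the path this repo actually ships.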

Current workaround (in case anyone needs it now)

const generator = await pipeline("text-generation", "LiquidAI/LFM2.5-230M-ONNX", {
  device: "webgpu",
  dtype: "fp32",              // empty suffix, so "_quantized" isn't appended
  model_file_name: "model_q8" // override the base filename
});
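
If you'd rather keep dtype: "q8" as the first choice and only fall back when it fails, a rough sketch could look like the following. The wrapper name `loadQ8Generator` is hypothetical, and `pipelineFn` is assumed to be the `pipeline` export from @huggingface/transformers, passed in as an argument so the snippet stays self-contained. It also assumes any error from the first attempt means the file is missing, which may be too broad for real use.

```javascript
// Hypothetical wrapper: try the standard q8 name first, then the workaround.
// `pipelineFn` is assumed to be `pipeline` from @huggingface/transformers.
async function loadQ8Generator(pipelineFn, modelId, device = "webgpu") {
  try {
    // Works only if the repo ships onnx/model_quantized.onnx.
    return await pipelineFn("text-generation", modelId, { dtype: "q8", device });
  } catch (err) {
    // Assumption: every failure here is treated as the missing-file case.
    console.warn(`dtype "q8" failed, retrying with model_q8: ${err.message}`);
    return await pipelineFn("text-generation", modelId, {
      device,
      dtype: "fp32",               // empty suffix, so nothing is appended
      model_file_name: "model_q8", // base filename this repo uses
    });
  }
}
```

Usage would be something like `const generator = await loadQ8Generator(pipeline, "LiquidAI/LFM2.5-230M-ONNX");`, which should keep working unchanged if the file is ever renamed to the default.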

Is the _q8 naming intentional (e.g. to match another runtime)? Either way, just wanted to surface it. Thanks again!
