How to use yaycute/qwen2.5-coder-0.5b-block with Transformers.js:
// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// Allocate pipeline
const pipe = await pipeline('text-generation', 'yaycute/qwen2.5-coder-0.5b-block');
qwen2.5-coder-0.5b-block: cut-point ONNX (choochoo rung 2)
Prefill-only ONNX export with an extra cut_hidden output (x1, the residual stream
just before the last block's MLP), plus block.bin and block.json, which hold the
frozen last-block MLP, both norms, and the head in fp16, so that a LoRA adapter can
be trained on the last block's MLP in the browser.
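To make the cut point concrete, here is a minimal sketch of how the tail of the network could be recomputed from cut_hidden for a single token, with an optional LoRA delta on the MLP projections. It assumes the standard Qwen2 layer layout (RMSNorm, SwiGLU MLP, residual add, final norm, LM head) and reads n2w as the last block's pre-MLP norm weight, nfw as the final norm weight, and Wlm as the head. The row-major [out, in] weight layout, the eps value, the LoRA object shape ({ r, scale, A, B }), and all function names are illustrative assumptions, not something this repository specifies.

```javascript
// Sketch only: plain-JS reference math, not an optimized or official implementation.

// RMSNorm over one hidden vector. eps = 1e-6 is an assumed value.
function rmsNorm(x, w, eps = 1e-6) {
  let ss = 0;
  for (let i = 0; i < x.length; i++) ss += x[i] * x[i];
  const inv = 1 / Math.sqrt(ss / x.length + eps);
  return x.map((v, i) => v * inv * w[i]);
}

// y = W x, with W stored row-major as [rows, cols] (assumed layout).
function matVec(W, rows, cols, x) {
  const y = new Float32Array(rows);
  for (let r = 0; r < rows; r++) {
    let acc = 0;
    for (let c = 0; c < cols; c++) acc += W[r * cols + c] * x[c];
    y[r] = acc;
  }
  return y;
}

const silu = (v) => v / (1 + Math.exp(-v));

// Frozen projection plus optional LoRA delta: W x + scale * B (A x).
// A is [r, inp], B is [out, r]; `lora` may be undefined (no adapter).
function loraMatVec(W, out, inp, x, lora) {
  const y = matVec(W, out, inp, x);
  if (!lora) return y;
  const low = matVec(lora.A, lora.r, inp, x);
  const delta = matVec(lora.B, out, lora.r, low);
  for (let i = 0; i < out; i++) y[i] += lora.scale * delta[i];
  return y;
}

// x1: cut_hidden for one token (length = hidden). Returns logits (length = vocab).
function tailForward(x1, blk, dims, lora = {}) {
  const { hidden, inter, vocab } = dims;
  const n = rmsNorm(x1, blk.n2w);                          // pre-MLP norm
  const gate = loraMatVec(blk.Wgate, inter, hidden, n, lora.gate);
  const up = loraMatVec(blk.Wup, inter, hidden, n, lora.up);
  const act = gate.map((g, i) => silu(g) * up[i]);         // SwiGLU
  const mlp = loraMatVec(blk.Wdown, hidden, inter, act, lora.down);
  const h = x1.map((v, i) => v + mlp[i]);                  // residual add
  return matVec(blk.Wlm, vocab, hidden, rmsNorm(h, blk.nfw)); // final norm + head
}
```

With the usual LoRA initialization (B set to zeros) the adapter is a no-op, so tailForward should reproduce the frozen model's logits output before any training step; that gives a simple sanity check against the ONNX logits.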
- ONNX outputs: logits, cut_hidden [batch, seq, 896]
- block.json lists the fp16 tensors n2w, Wgate, Wup, Wdown, nfw, Wlm (concatenated in block.bin)
- dtype: q8 (onnx/model_quantized.onnx)
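Since block.bin stores the tensors as concatenated fp16 values, a browser client has to decode them before doing math in JavaScript. The sketch below shows one way that could look. The manifest shape used here, an ordered array of { name, shape } entries, is a hypothetical stand-in: the actual block.json schema is not described in this card and may carry explicit offsets or different field names. Little-endian byte order is likewise an assumption, and halfToFloat and splitBlock are illustrative names.

```javascript
// Decode one IEEE 754 half-precision value, given as a uint16 bit pattern.
function halfToFloat(h) {
  const sign = (h & 0x8000) ? -1 : 1;
  const exp = (h >> 10) & 0x1f;
  const frac = h & 0x03ff;
  if (exp === 0) return sign * Math.pow(2, -14) * (frac / 1024); // zero or subnormal
  if (exp === 0x1f) return frac ? NaN : sign * Infinity;         // NaN or infinity
  return sign * Math.pow(2, exp - 15) * (1 + frac / 1024);       // normal number
}

// Split a concatenated fp16 ArrayBuffer into named Float32Array tensors.
// ASSUMPTION: `entries` is a hypothetical manifest [{ name, shape }, ...] listed
// in storage order; adapt this to whatever block.json actually contains.
function splitBlock(buffer, entries) {
  const view = new DataView(buffer);
  const tensors = {};
  let offset = 0; // byte offset into the buffer
  for (const { name, shape } of entries) {
    const count = shape.reduce((a, b) => a * b, 1);
    const data = new Float32Array(count);
    for (let i = 0; i < count; i++) {
      // `true` selects little-endian reads (assumed byte order).
      data[i] = halfToFloat(view.getUint16(offset + 2 * i, true));
    }
    tensors[name] = { shape, data };
    offset += 2 * count;
  }
  if (offset !== buffer.byteLength) {
    throw new Error('manifest does not account for every byte in the buffer');
  }
  return tensors;
}
```

Widening to Float32Array doubles the memory footprint of the block, but it keeps the arithmetic simple and avoids relying on Float16Array, which not every browser supports.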