qwen2.5-coder-0.5b-block — cut-point ONNX (choochoo rung 2)

Prefill-only ONNX export with an extra cut_hidden output (x1, the residual stream just before the last block's MLP). It ships with block.bin and block.json, which hold the frozen last-block MLP, both norms, and the head, all in fp16, so that a LoRA adapter can be trained on the last block's MLP in the browser.
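For readers unfamiliar with LoRA, here is a minimal NumPy sketch of a low-rank adapter on one frozen linear layer. It is illustrative only: the function name lora_linear, the rank r, the scale alpha, and the (out, in) weight layout are assumptions, not something this repo specifies.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=16.0):
    """Frozen linear layer plus a low-rank update.

    W: (d_out, d_in) frozen weight (layout is an assumption)
    A: (r, d_in) trainable down-projection
    B: (d_out, r) trainable up-projection
    """
    r = A.shape[0]
    base = x @ W.T                       # frozen path
    delta = (x @ A.T) @ B.T              # low-rank path, rank r
    return base + (alpha / r) * delta

# Tiny demo with made-up sizes (hypothetical, not the model's real dims).
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
A = (0.01 * rng.standard_normal((r, d_in))).astype(np.float32)
B = np.zeros((d_out, r), dtype=np.float32)   # common init: B = 0
x = rng.standard_normal((3, d_in)).astype(np.float32)
y = lora_linear(x, W, A, B)
```

With B initialized to zero the adapter starts as a no-op, so training begins from the frozen model's behavior; only A and B would receive gradients.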

ONNX outputs: logits, cut_hidden [batch, seq, 896]
block.json lists fp16 tensors n2w, Wgate, Wup, Wdown, nfw, Wlm (concatenated in block.bin)
dtype: q8 (onnx/model_quantized.onnx)
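
The sketch below shows how logits could be recomputed from cut_hidden with the six tensors, assuming the standard Qwen2 layout: RMSNorm with weight n2w, a SwiGLU MLP (Wgate, Wup, Wdown) added back to the residual, a final RMSNorm with weight nfw, then the head Wlm. The (out, in) weight layout, eps = 1e-6, and the function names are assumptions; the demo uses tiny random tensors rather than the real block.bin, whose byte layout is not described here.

```python
import numpy as np

def rms_norm(x, w, eps=1e-6):
    # RMSNorm: scale by the reciprocal root-mean-square, then by weight w.
    x = x.astype(np.float32)
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * w

def silu(x):
    return x / (1.0 + np.exp(-x))

def block_tail(x1, t):
    """Map cut_hidden (x1) to logits using the frozen tensors in dict t."""
    h = rms_norm(x1, t["n2w"])                       # pre-MLP norm
    mlp = (silu(h @ t["Wgate"].T) * (h @ t["Wup"].T)) @ t["Wdown"].T
    x2 = x1 + mlp                                    # residual add
    return rms_norm(x2, t["nfw"]) @ t["Wlm"].T       # final norm + head

# Demo with small hypothetical sizes (the real hidden size is 896).
rng = np.random.default_rng(0)
d, ff, vocab = 16, 32, 50
shapes = {"n2w": (d,), "Wgate": (ff, d), "Wup": (ff, d),
          "Wdown": (d, ff), "nfw": (d,), "Wlm": (vocab, d)}
# Stored as fp16 like block.bin, upcast to fp32 for the computation.
t = {k: rng.standard_normal(s).astype(np.float16).astype(np.float32)
     for k, s in shapes.items()}
x1 = rng.standard_normal((1, 4, d)).astype(np.float32)  # [batch, seq, d]
logits = block_tail(x1, t)
```

A LoRA adapter on the MLP would add its low-rank term to one or more of the three projections inside block_tail while the tensors in t stay frozen.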
