How to use yaycute/qwen2.5-coder-0.5b-hidden with Transformers.js:
// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// Allocate pipeline
const pipe = await pipeline('text-generation', 'yaycute/qwen2.5-coder-0.5b-hidden');
Qwen2.5-Coder-0.5B — hidden-state ONNX export
A prefill-only ONNX export of Qwen/Qwen2.5-Coder-0.5B
that adds a last_hidden_state output — the post-final-norm activation that
feeds lm_head — alongside the usual logits. It exists so a LoRA adapter can
be trained on the output head, in the browser, against cached features (the
"choochoo" tool): logits = base_logits + (α/r)·B·(A·h), where h is the
hidden state and only A/B are trained.
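The adapter formula above can be sketched in plain JavaScript. This is a minimal illustration, not the "choochoo" implementation: the helper names `matVec` and `loraLogits`, the row-major flat-array layout, and the toy sizes (hidden=3, r=2, vocab=4 rather than 896 and 151936) are all assumptions made for the example.

```javascript
// Hypothetical helper: y = M·x for a row-major matrix M of shape [rows, cols].
function matVec(M, rows, cols, x) {
  const y = new Float32Array(rows);
  for (let i = 0; i < rows; i++) {
    let acc = 0;
    for (let j = 0; j < cols; j++) acc += M[i * cols + j] * x[j];
    y[i] = acc;
  }
  return y;
}

// logits = baseLogits + (alpha / r) * B·(A·h)
// Assumed shapes: A [r, hidden], B [vocab, r], h [hidden], baseLogits [vocab].
function loraLogits(baseLogits, h, A, B, r, alpha) {
  const hidden = h.length;
  const vocab = baseLogits.length;
  const Ah = matVec(A, r, hidden, h);   // [r]
  const BAh = matVec(B, vocab, r, Ah);  // [vocab]
  const scale = alpha / r;
  const out = new Float32Array(vocab);
  for (let v = 0; v < vocab; v++) out[v] = baseLogits[v] + scale * BAh[v];
  return out;
}

// Toy example with hand-picked values.
const h = Float32Array.from([1, 2, 3]);
const A = Float32Array.from([1, 0, 0, 0, 1, 0]);       // rows pick h[0] and h[1]
const B = Float32Array.from([1, 0, 0, 1, 1, 1, 0, 0]); // [vocab=4, r=2]
const base = Float32Array.from([0.5, 0.5, 0.5, 0.5]);
const adapted = loraLogits(base, h, A, B, 2, 4);
```

Because the base logits and h come from the frozen model, both can be computed once and cached; only the two small products involving A and B need to be re-evaluated while training.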
- inputs: input_ids, attention_mask
- outputs: logits [batch, seq, 151936], last_hidden_state [batch, seq, 896]
- lm_head(last_hidden_state) == logits exactly (tied embeddings).
- dtype shipped here: q8, in onnx/model_quantized.onnx (~632 MB).
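To make the output shapes concrete, here is a hedged sketch of pulling one feature vector per sequence out of a flat, row-major last_hidden_state buffer of shape [batch, seq, hidden]. Which positions the actual tool caches is not stated here, so selecting the last attended token is an assumption, and `lastTokenHidden` is a hypothetical helper name.

```javascript
// Return the hidden-state row of the last attended token for each sequence.
// hiddenData: flat row-major [batch, seq, hidden]
// mask:       flat [batch, seq] of 0/1 (numbers or BigInts, as in int64 tensors)
function lastTokenHidden(hiddenData, mask, batch, seq, hidden) {
  const rows = [];
  for (let b = 0; b < batch; b++) {
    let last = -1;
    for (let t = 0; t < seq; t++) {
      if (Number(mask[b * seq + t]) === 1) last = t;
    }
    if (last < 0) throw new Error("sequence " + b + " has no attended tokens");
    const start = (b * seq + last) * hidden;
    rows.push(hiddenData.slice(start, start + hidden));
  }
  return rows;
}

// Toy example: batch=2, seq=3, hidden=2 (the real model uses hidden=896).
const toyHidden = Float32Array.from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
const toyMask = [1, 1, 0, 1, 1, 1]; // first sequence has one padding position
const features = lastTokenHidden(toyHidden, toyMask, 2, 3, 2);
```

In this toy case the first sequence yields the row at position 1 and the second the row at position 2.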
Single forward pass only (no KV cache). Exported with a hand-built 4D causal
mask under use_cache=False, so the ONNX tracer sidesteps the transformers
vmap mask builder. See onnx_hidden.py.
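For readers unfamiliar with the term, the layout of a 4D causal mask can be illustrated as follows. This is not the code from onnx_hidden.py (which is a Python export script and is not reproduced here); the function name `buildCausalMask4d`, the additive convention, and the blocked value of -1e9 are assumptions chosen for the sketch.

```javascript
// Build an additive attention mask of shape [batch, 1, seq, seq], flattened row-major.
// Entry (b, 0, i, j) is 0 when query position i may attend to key position j,
// and a large negative value otherwise. A key is visible only if it is not in
// the future (j <= i) and is not padding (attentionMask[b][j] === 1).
function buildCausalMask4d(attentionMask, batch, seq, blocked = -1e9) {
  const out = new Float32Array(batch * seq * seq);
  for (let b = 0; b < batch; b++) {
    for (let i = 0; i < seq; i++) {
      for (let j = 0; j < seq; j++) {
        const visible = j <= i && Number(attentionMask[b * seq + j]) === 1;
        out[(b * seq + i) * seq + j] = visible ? 0 : blocked;
      }
    }
  }
  return out;
}

// Toy example: one sequence of length 3 whose last position is padding.
const causal = buildCausalMask4d([1, 1, 0], 1, 3);
```

Building the mask with explicit loops or tensor comparisons keeps the computation to simple elementwise operations, which is the general idea behind avoiding a more elaborate mask builder during tracing.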