Qwen2.5-Coder-0.5B — hidden-state ONNX export

A prefill-only ONNX export of Qwen/Qwen2.5-Coder-0.5B that adds a last_hidden_state output (the post-final-norm activation that feeds lm_head) alongside the usual logits. It exists so that a LoRA adapter on the output head can be trained in the browser against cached features (the "choochoo" tool): logits = base_logits + (α/r)·B·(A·h), where h is the hidden state, r is the LoRA rank, α is the LoRA scaling factor, A is r × 896, B is 151936 × r, and only A and B are trained.
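The adapter formula above can be sketched in NumPy as follows. This is only an illustration of the arithmetic, not the choochoo implementation: the function name lora_logits is hypothetical, and the zero initialization of B mentioned afterwards is the common LoRA convention, assumed here rather than taken from this card.

```python
import numpy as np

HIDDEN = 896     # width of last_hidden_state, from the output spec
VOCAB = 151936   # width of logits, from the output spec


def lora_logits(base_logits, h, A, B, alpha):
    """Return base_logits + (alpha / r) * B @ (A @ h) at every position.

    base_logits: [..., vocab]   cached logits from the ONNX graph
    h:           [..., hidden]  cached last_hidden_state
    A:           [r, hidden]    trainable down-projection
    B:           [vocab, r]     trainable up-projection
    """
    r = A.shape[0]
    low_rank = h @ A.T        # [..., r]
    delta = low_rank @ B.T    # [..., vocab]
    return base_logits + (alpha / r) * delta
```

With B initialized to zeros the adapter is a no-op, so training would start from the base model's logits. Because base_logits and h are both outputs of the exported graph, they can be computed once and cached, and only the two small matrix products need to be recomputed per training step.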

  • inputs: input_ids, attention_mask
  • outputs: logits [batch, seq, 151936], last_hidden_state [batch, seq, 896]
  • lm_head(last_hidden_state) == logits exactly (tied embeddings).
  • dtype shipped here: q8 — onnx/model_quantized.onnx (~632 MB).
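
As a rough sketch of how the inputs and outputs listed above fit together, the graph could be run with ONNX Runtime in Python along these lines. The helper names build_feeds and run_prefill are hypothetical, the int64 input dtype is an assumption (it is the usual choice for token ids but is not stated here), and token_ids is assumed to come from the Qwen2.5-Coder tokenizer.

```python
import numpy as np


def build_feeds(token_ids):
    """Pack one tokenized sequence into the two graph inputs."""
    # Shape [1, seq]; int64 is an assumption about the exported input dtype.
    ids = np.asarray([token_ids], dtype=np.int64)
    # A single unpadded sequence attends to every token it contains.
    mask = np.ones_like(ids)
    return {"input_ids": ids, "attention_mask": mask}


def run_prefill(model_path, token_ids):
    """Run one forward pass and return (logits, last_hidden_state)."""
    # Imported lazily so build_feeds can be used without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    logits, hidden = session.run(
        ["logits", "last_hidden_state"], build_feeds(token_ids)
    )
    return logits, hidden
```

Calling run_prefill("onnx/model_quantized.onnx", token_ids) should then return arrays shaped [1, seq, 151936] and [1, seq, 896], matching the output spec above.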

The graph runs a single forward pass only (no KV cache). It was exported with use_cache=False and a hand-built 4D causal mask, so the ONNX tracer sidesteps the vmap-based mask builder in transformers. See onnx_hidden.py.
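
For readers unfamiliar with the technique, a hand-built 4D causal mask can be sketched as below. This is not the code from onnx_hidden.py, which may differ in detail; the function name causal_mask_4d is hypothetical, and the additive convention (0 where attention is allowed, the most negative float where it is blocked) is assumed.

```python
import numpy as np


def causal_mask_4d(attention_mask, dtype=np.float32):
    """Build an additive [batch, 1, seq, seq] mask from a [batch, seq] padding mask."""
    attention_mask = np.asarray(attention_mask)
    batch, seq = attention_mask.shape
    # Lower-triangular: query position i may attend to key positions j <= i.
    causal = np.tril(np.ones((seq, seq), dtype=bool))
    # Also block every key position that is padding (attention_mask == 0).
    keep = causal[None, None, :, :] & attention_mask.astype(bool)[:, None, None, :]
    # 0 where attention is allowed, most negative float where it is blocked.
    blocked = np.finfo(dtype).min
    return np.where(keep, 0.0, blocked).astype(dtype)
```

Because the mask is built from plain tensor operations (a triangular matrix and a broadcasted AND), the same construction in PyTorch traces into ordinary ONNX ops.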
