Qwen2.5-Coder-0.5B — hidden-state ONNX export

A prefill-only ONNX export of Qwen/Qwen2.5-Coder-0.5B that adds a last_hidden_state output — the post-final-norm activation that feeds lm_head — alongside the usual logits. It exists so a LoRA adapter can be trained on the output head, in the browser, against cached features (the "choochoo" tool): logits = base_logits + (α/r)·B·(A·h), where h is the hidden state and only A/B are trained.
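The adapter formula above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the code used by the tool: the function name `lora_logits`, the shapes chosen for A (`[r, hidden]`) and B (`[vocab, r]`), and the zero initialisation of B are assumptions made for the example.

```python
import numpy as np

def lora_logits(base_logits, h, A, B, alpha):
    """Apply logits = base_logits + (alpha / r) * B (A h) along the last axis.

    Assumed shapes (hypothetical, chosen for this sketch):
      base_logits: [..., vocab]   cached `logits` output
      h:           [..., hidden]  cached `last_hidden_state` output
      A:           [r, hidden]    trainable down-projection
      B:           [vocab, r]     trainable up-projection
    """
    r = A.shape[0]
    low_rank = h @ A.T        # [..., r]
    delta = low_rank @ B.T    # [..., vocab]
    return base_logits + (alpha / r) * delta

# Tiny stand-in sizes; the real export uses hidden=896 and vocab=151936.
rng = np.random.default_rng(0)
hidden, vocab, rank = 8, 16, 2
h = rng.standard_normal((1, 3, hidden))
base = rng.standard_normal((1, 3, vocab))
A = rng.standard_normal((rank, hidden))
B = np.zeros((vocab, rank))   # assumption: B starts at zero, so the adapter is a no-op

adapted = lora_logits(base, h, A, B, alpha=4.0)
```

Because the base model is frozen, `base` and `h` only need to be computed once per training sequence; each training step then touches only A and B.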

inputs: input_ids, attention_mask
outputs: logits [batch, seq, 151936], last_hidden_state [batch, seq, 896]
lm_head(last_hidden_state) == logits exactly; lm_head shares its weights with the token embedding (tied embeddings).
dtype shipped here: q8 — onnx/model_quantized.onnx (~632 MB).

Single forward pass only (no KV cache). Exported with a hand-built 4D causal mask under use_cache=False, so the ONNX tracer sidesteps the transformers vmap mask builder. See onnx_hidden.py.
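For readers unfamiliar with the idea, a hand-built 4D causal mask might look roughly like the NumPy sketch below. It is not the contents of onnx_hidden.py: the function name `build_causal_mask_4d`, the additive convention (0 for visible, most-negative float for hidden), and the `[batch, 1, seq, seq]` layout are assumptions for illustration.

```python
import numpy as np

def build_causal_mask_4d(attention_mask, dtype=np.float32):
    """Return an additive mask of shape [batch, 1, seq, seq].

    attention_mask: [batch, seq] with 1 for real tokens and 0 for padding.
    An entry is 0.0 where query position i may attend to key position j
    (j <= i and key j is not padding), and the most negative finite value
    of `dtype` everywhere else.
    """
    attention_mask = np.asarray(attention_mask).astype(bool)
    batch, seq = attention_mask.shape
    causal = np.tril(np.ones((seq, seq), dtype=bool))   # lower triangle: j <= i
    # Broadcast the causal pattern over the batch and the padding over queries.
    keep = causal[None, None, :, :] & attention_mask[:, None, None, :]
    return np.where(keep, 0.0, np.finfo(dtype).min).astype(dtype)

# Two sequences of length 3; the second has one padded position at the end.
mask = build_causal_mask_4d([[1, 1, 1], [1, 1, 0]])
```

Building the mask with plain tensor operations like these keeps it a static part of the traced graph, which is the general motivation for supplying it by hand during export.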
