Audio-align 1.5B Branch Checkpoint

This repository stores the branches outside the main LLM for the PreAlignSLM 1.5B checkpoint `phase2_3_chat_step5000`. The Qwen2.5-1.5B-Instruct backbone was frozen throughout training and is intentionally not included.

Files

File	Contents
`semantic_branch.safetensors`	SenseVoice semantic branch, CTC layer, TASU projector, semantic projector
`acoustic_branch.safetensors`	ArCap acoustic branch and acoustic recognition projector
`router.safetensors`	BERT-small instruction router
`manifest.json`	Export metadata, tensor counts, SHA256 checksums
`config/`	Training and base config snapshots
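
Because `manifest.json` records SHA256 checksums, a download can be checked before loading. The sketch below hashes files with the standard library only. The JSON layout of the manifest is not documented here, so `expected` is a hypothetical filename-to-digest mapping that you would fill in from whatever structure the manifest actually uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected, root="."):
    """Return the filenames whose digest differs from `expected`.

    `expected` is a hypothetical {filename: hex_digest} mapping; build it
    from manifest.json according to the layout you find there.
    """
    return [
        name
        for name, digest in expected.items()
        if sha256_of(Path(root) / name) != digest.lower()
    ]
```

An empty list from `find_mismatches` means every listed file matched its recorded digest.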

Total exported parameters: 1,795,187,472.
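
The parameter count can be cross-checked without loading any weights, since a `.safetensors` file begins with a JSON header listing every tensor's shape. The following is a minimal sketch that reads only that header; `count_parameters` is an illustrative helper, not part of this codebase.

```python
import json
import struct
from math import prod


def count_parameters(path):
    """Sum tensor element counts from a .safetensors header without loading weights."""
    with open(path, "rb") as handle:
        # The file starts with an 8-byte little-endian header length,
        # followed by that many bytes of JSON describing each tensor.
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len))
    return sum(
        prod(entry["shape"])
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )
```

If the total above covers exactly the three branch files (an assumption, since the manifest is the authoritative source), summing `count_parameters` over them should reproduce it.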

Loading

Instantiate the matching PreAlignSLM architecture from this codebase, load the frozen Qwen2.5-1.5B-Instruct backbone separately, then merge the branch files:

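from safetensors.torch import load_file

# `model` is the PreAlignSLM instance built from this codebase, with the
# Qwen2.5-1.5B-Instruct backbone already loaded from its own checkpoint.
state = {}
for filename in [
    "semantic_branch.safetensors",
    "acoustic_branch.safetensors",
    "router.safetensors",
]:
    # Merge each branch's tensors into one state dict on the CPU.
    state.update(load_file(filename, device="cpu"))

# strict=False because the backbone keys are absent from these files.
missing, unexpected = model.load_state_dict(state, strict=False)
assert not unexpected, f"unexpected keys: {unexpected}"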

`module.llm.*` keys are expected to be missing from these files because the main LLM is loaded from its original Qwen checkpoint.
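
With `strict=False`, `load_state_dict` reports missing keys rather than raising, so a branch tensor that failed to load would pass unnoticed alongside the expected backbone keys. One way to guard against that is to partition the missing keys by prefix. This is a sketch; `split_missing_keys` is a hypothetical helper, and the prefix is taken from the note above.

```python
def split_missing_keys(missing, expected_prefix="module.llm."):
    """Partition missing keys into expected (backbone) and unexplained ones."""
    expected = [key for key in missing if key.startswith(expected_prefix)]
    unexplained = [key for key in missing if not key.startswith(expected_prefix)]
    return expected, unexplained
```

After loading, `expected, unexplained = split_missing_keys(missing)` followed by `assert not unexplained` would flag any branch key that the three files did not supply.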

Reported Metrics

Best open-ended 1.5B checkpoint (phase2_3_chat/model-5000.pt):

Metric	Result
ASR test-clean / test-other / combined WER	3.50% / 7.22% / 5.36%
AIR-Bench Chat overall, Qwen3.5-27B judge	2.365 / 10
VoiceBench direct alpacaeval / commoneval / wildvoice	1.62 / 1.61 / 1.33
VoiceBench direct sd-qa	21.8%
VoiceBench frozen-LLM agent alpacaeval / commoneval / wildvoice	2.414 / 2.346 / 1.978
VoiceBench frozen-LLM agent sd-qa	28.9%
