Audio-align 1.5B Branch Checkpoint

This repository stores the non-main-LLM branches for the PreAlignSLM 1.5B checkpoint phase2_3_chat_step5000. The Qwen2.5-1.5B-Instruct backbone was frozen throughout training and is intentionally not included.

Files

| File | Contents |
| --- | --- |
| semantic_branch.safetensors | SenseVoice semantic branch, CTC layer, TASU projector, semantic projector |
| acoustic_branch.safetensors | ArCap acoustic branch and acoustic recognition projector |
| router.safetensors | BERT-small instruction router |
| manifest.json | Export metadata, tensor counts, SHA256 checksums |
| config/ | Training and base config snapshots |

Total exported parameters: 1,795,187,472.
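
Before loading, it can be worth confirming that the downloaded files match the SHA256 checksums recorded in manifest.json. The manifest schema is not documented here, so the sketch below assumes a hypothetical layout in which a top-level "files" object maps each filename to its hex digest; adjust the lookup to the real structure. The helper names (sha256_of, verify_checksums, load_expected) are illustrative and not part of this codebase.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(expected, root="."):
    """Return the filenames whose digest differs from the expected one.

    `expected` maps filename -> hex digest (ASSUMED shape, see the note above).
    """
    root = Path(root)
    return [
        name
        for name, want in expected.items()
        if sha256_of(root / name) != want.lower()
    ]


def load_expected(manifest_path):
    """Read the ASSUMED {"files": {filename: sha256}} layout from manifest.json."""
    with open(manifest_path, encoding="utf-8") as handle:
        return json.load(handle)["files"]
```

Under that assumed layout, verify_checksums(load_expected("manifest.json")) returns an empty list when every listed file matches its recorded digest.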

Loading

Instantiate the matching PreAlignSLM architecture from this codebase, load the frozen Qwen2.5-1.5B-Instruct backbone separately, then merge the branch files:

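from safetensors.torch import load_file

# `model` is the PreAlignSLM instance built from this codebase, with the
# frozen Qwen2.5-1.5B-Instruct backbone already loaded.
state = {}
for filename in [
    "semantic_branch.safetensors",
    "acoustic_branch.safetensors",
    "router.safetensors",
]:
    # Merge the tensors from each branch file into one state dict.
    state.update(load_file(filename, device="cpu"))

# strict=False tolerates the backbone keys that these files omit.
missing, unexpected = model.load_state_dict(state, strict=False)
assert not unexpected, f"unexpected keys: {unexpected}"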

module.llm.* keys are expected to be missing from these files because the main LLM is loaded from its original Qwen checkpoint.
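
The expectation above can be checked programmatically rather than by eye. The sketch below is a hypothetical helper (unexpected_missing is not part of this codebase) that takes the missing list returned by load_state_dict and reports any key outside the module.llm. prefix. It assumes that prefix is exactly how the wrapped model names its LLM parameters, and the key names in the example are made up for illustration.

```python
def unexpected_missing(missing_keys, allowed_prefix="module.llm."):
    """Return missing keys that are NOT covered by the frozen-LLM prefix."""
    return [key for key in missing_keys if not key.startswith(allowed_prefix)]


# Example with made-up key names (illustrative only, not real tensor names).
example_missing = [
    "module.llm.layers.0.weight",
    "module.router.classifier.bias",
]
leftover = unexpected_missing(example_missing)
print(leftover)  # ['module.router.classifier.bias']
```

An empty result means the branch files covered every non-LLM parameter; anything left over points to a branch tensor that failed to load.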

Reported Metrics

Best open-ended 1.5B checkpoint (phase2_3_chat/model-5000.pt):

| Metric | Result |
| --- | --- |
| ASR WER, test-clean / test-other / combined | 3.50% / 7.22% / 5.36% |
| AIR-Bench Chat overall, Qwen3.5-27B judge | 2.365 / 10 |
| VoiceBench direct, alpacaeval / commoneval / wildvoice | 1.62 / 1.61 / 1.33 |
| VoiceBench direct, sd-qa | 21.8% |
| VoiceBench frozen-LLM agent, alpacaeval / commoneval / wildvoice | 2.414 / 2.346 / 1.978 |
| VoiceBench frozen-LLM agent, sd-qa | 28.9% |
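
For readers unfamiliar with the ASR row, word error rate is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the evaluation script behind the numbers above; text normalization (casing, punctuation) is omitted and would affect real scores.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

The reported combined figure equals the unweighted mean of the two splits, (3.50 + 7.22) / 2 = 5.36, although the source does not state how it was computed.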