# Audio-align 1.5B Branch Checkpoint
This repository stores every branch of the PreAlignSLM 1.5B checkpoint
`phase2_3_chat_step5000` except the main LLM. The Qwen2.5-1.5B-Instruct backbone
was frozen throughout training and is intentionally not included.
## Files
| File | Contents |
|---|---|
| `semantic_branch.safetensors` | SenseVoice semantic branch, CTC layer, TASU projector, semantic projector |
| `acoustic_branch.safetensors` | ArCap acoustic branch and acoustic recognition projector |
| `router.safetensors` | BERT-small instruction router |
| `manifest.json` | Export metadata, tensor counts, SHA256 checksums |
| `config/` | Training and base config snapshots |
Total exported parameters: 1,795,187,472.
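`manifest.json` records SHA256 checksums for the exported files. Its exact schema is not documented in this card, so the sketch below assumes you have already turned it into a plain `{filename: hex_digest}` mapping. `sha256_of` and `verify_checksums` are illustrative helpers, not part of the codebase.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so multi-GB checkpoints are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksums(expected: dict[str, str], root=".") -> list[str]:
    """Return the filenames that are absent or whose SHA256 does not match.

    `expected` is an ASSUMED {filename: hex_digest} mapping built from
    manifest.json; the manifest's real key layout may differ.
    """
    bad = []
    for name, digest in expected.items():
        path = Path(root) / name
        if not path.is_file() or sha256_of(path) != digest.lower():
            bad.append(name)
    return bad
```

An empty list from `verify_checksums` means every listed file is present and intact.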
## Loading
Instantiate the matching PreAlignSLM architecture from this codebase, load the frozen Qwen2.5-1.5B-Instruct backbone separately, then merge the branch files:
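```python
from safetensors.torch import load_file

# `model` must already be an instance of the matching PreAlignSLM architecture,
# with the frozen Qwen2.5-1.5B-Instruct backbone loaded from its own checkpoint.
state = {}
for filename in [
    "semantic_branch.safetensors",
    "acoustic_branch.safetensors",
    "router.safetensors",
]:
    # Load each branch file on CPU and merge it into one state dict.
    state.update(load_file(filename, device="cpu"))

# strict=False because the main-LLM weights are deliberately absent from these files.
missing, unexpected = model.load_state_dict(state, strict=False)
assert not unexpected, f"unexpected keys: {unexpected[:10]}"
```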
The `module.llm.*` keys are expected to be missing from these files, because the
main LLM is loaded from its original Qwen checkpoint.
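Because `dict.update` silently overwrites duplicate keys, and `strict=False` hides every missing key, it can help to check both explicitly. The sketch below is stdlib-only and assumes nothing beyond the `module.llm.` prefix stated above; `merge_without_overlap` and `unexpected_missing` are hypothetical helper names, not part of the codebase.

```python
from collections.abc import Iterable, Mapping


def merge_without_overlap(parts: Iterable[tuple[str, Mapping]]) -> dict:
    """Merge per-file state dicts, raising if two files define the same key."""
    merged: dict = {}
    owner: dict[str, str] = {}
    for source, part in parts:
        for key, value in part.items():
            if key in merged:
                raise ValueError(f"{key!r} appears in both {owner[key]} and {source}")
            merged[key] = value
            owner[key] = source
    return merged


def unexpected_missing(missing: Iterable[str], allowed_prefix: str = "module.llm.") -> list[str]:
    """Missing keys that are NOT under the frozen-LLM prefix; ideally this is empty."""
    return [k for k in missing if not k.startswith(allowed_prefix)]
```

With the real files, the merge step would look like `merge_without_overlap((f, load_file(f, device="cpu")) for f in filenames)`, and `unexpected_missing(missing)` would be checked after `load_state_dict`.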
## Reported Metrics
Best open-ended 1.5B checkpoint (`phase2_3_chat/model-5000.pt`):
| Metric | Result |
|---|---|
| ASR test-clean / test-other / combined WER | 3.50% / 7.22% / 5.36% |
| AIR-Bench Chat overall, Qwen3.5-27B judge | 2.365 / 10 |
| VoiceBench direct alpacaeval / commoneval / wildvoice | 1.62 / 1.61 / 1.33 |
| VoiceBench direct sd-qa | 21.8% |
| VoiceBench frozen-LLM agent alpacaeval / commoneval / wildvoice | 2.414 / 2.346 / 1.978 |
| VoiceBench frozen-LLM agent sd-qa | 28.9% |
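The combined WER is consistent with the unweighted mean of the two splits, (3.50 + 7.22) / 2 = 5.36; the card does not say whether it was computed that way or by pooling errors and reference words across both splits. A minimal sketch of both computations follows, assuming whitespace tokenization and no text normalization (real ASR scoring normally normalizes case and punctuation first). The function names are illustrative only.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = cur
    return prev[-1]


def pooled_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


def mean_wer(*split_wers: float) -> float:
    """Unweighted mean of per-split WERs."""
    return sum(split_wers) / len(split_wers)
```

Pooling weights each split by its number of reference words, so the two approaches diverge only when the splits contain different amounts of reference text.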