This is a small streaming zipformer model developed by the Xiaomi AI Lab Next-gen-Kaldi team. The model was trained on around 200,000 hours of open-source Chinese and English datasets. It has around 25M parameters with the CTC head and around 35M with the transducer head.
Performance on some popular test sets is listed below (CER for Chinese, WER for English).
All results were obtained with chunk-size=16 and left-context-frames=128.
| Head | aishell test 1 / 2 | wenetspeech test-net / meeting | Common Voice zh | kespeech test | librispeech test-clean / other | gigaspeech test | Common Voice en | tedlium test |
|---|---|---|---|---|---|---|---|---|
| CTC | 6.7 / 7.24 | 12.92 / 16.45 | 17.18 | 23.32 | 19.4 / 29.66 | 26.18 | 33.52 | 17.67 |
| Transducer | 5.69 / 6.26 | 12.06 / 16.13 | 16.51 | 22.29 | 8.15 / 16.91 | 19.77 | 28.54 | 14.23 |
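The CER and WER figures in the table are edit-distance-based error rates. As a rough illustration of how such a number is derived, here is a minimal sketch assuming the standard Levenshtein definition (substitutions + deletions + insertions, divided by the reference length). The function names `edit_distance`, `error_rate`, `wer` and `cer` are hypothetical and not part of the zipformer code base. The actual scoring scripts may also apply text normalization that is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent, relative to the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate: tokens are characters, spaces ignored."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))
    # One substituted character over 6 characters.
    print(round(cer("今天天气很好", "今天天汽很好"), 2))
```

Character-level scoring is used for Chinese because the text has no whitespace word boundaries, while English is scored on words.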
Please refer to the zipformer repository on GitHub for model details.
Training set list: LibriSpeech, GigaSpeech, Common Voice 2022 (zh + en), Libriheavy, Emilia (zh + en), AISHELL-2, WenetSpeech, WenetSpeech4TTS, KeSpeech, AISHELL, aidatatang, AISHELL-4, AliMeeting, MagicData, Primewords, ST-CMDS, THCHS-30.
For more information, please refer to https://pkufool.github.io/zipformer/en/models/
@inproceedings{yao2024zipformer,
title={Zipformer: A faster and better encoder for automatic speech recognition},
author={Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel},
booktitle={International Conference on Learning Representations},
volume={2024},
pages={44440--44455},
year={2024}
}
@inproceedings{yao2025cr,
title={{CR-CTC}: Consistency regularization on {CTC} for improved speech recognition},
author={Yao, Zengwei and Kang, Wei and Yang, Xiaoyu and Kuang, Fangjun and Guo, Liyong and Zhu, Han and Jin, Zengrui and Li, Zhaoqing and Lin, Long and Povey, Daniel},
booktitle={International Conference on Learning Representations},
volume={2025},
pages={26850--26868},
year={2025}
}