This is a large Zipformer model developed by the Next-gen Kaldi team at Xiaomi AI Lab. The model was trained on around 200,000 hours of open-source Chinese and English datasets and has around 150M parameters.
Performance on some popular test sets (error rates in %; CER for Chinese, WER for English):
| Head | AISHELL-1 / AISHELL-2 test | WenetSpeech test-net / test-meeting | Common Voice zh | KeSpeech test | LibriSpeech test-clean / test-other | GigaSpeech test | Common Voice en | TED-LIUM test |
|---|---|---|---|---|---|---|---|---|
| CTC | 2.51 / 3.51 | 6.23 / 6.67 | 7.96 | 8.95 | 2.62 / 5.17 | 10.73 | 12.99 | 10.11 |
| Transducer | 2.42 / 3.55 | 6.7 / 7.81 | 7.92 | 8.88 | 2.27 / 4.64 | 10.08 | 11.27 | 9.82 |
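
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length, over words for English and over characters for Chinese. This is an illustration only, not the scoring script used to produce the numbers above; the function names `edit_distance`, `wer`, and `cer` are hypothetical, and text normalization (casing, punctuation), which real scoring pipelines usually apply, is omitted.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            ))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent; tokens are whitespace-separated words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate in percent; spaces are ignored (assumption)."""
    ref_chars = list(ref.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hyp.replace(" ", ""))
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))
# One substituted character over 6 reference characters.
print(round(cer("今天天气很好", "今天天汽很好"), 2))
```

Because both metrics divide by the reference length, a value above 100 is possible when the hypothesis contains many insertions.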
Please refer to the zipformer repository on GitHub for model details.
Training sets: LibriSpeech, GigaSpeech, Common Voice 2022 (zh + en), Libriheavy, Emilia (zh + en), AISHELL-2, WenetSpeech, WenetSpeech4TTS, KeSpeech, AISHELL, aidatatang, AISHELL-4, AliMeeting, MagicData, Primewords, ST-CMDS, THCHS-30.
For more information, please refer to https://pkufool.github.io/zipformer/en/models/
@inproceedings{yao2024zipformer,
title={Zipformer: A faster and better encoder for automatic speech recognition},
author={Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel},
booktitle={International Conference on Learning Representations},
volume={2024},
pages={44440--44455},
year={2024}
}
@inproceedings{yao2025cr,
title={CR-CTC: Consistency regularization on CTC for improved speech recognition},
author={Yao, Zengwei and Kang, Wei and Yang, Xiaoyu and Kuang, Fangjun and Guo, Liyong and Zhu, Han and Jin, Zengrui and Li, Zhaoqing and Lin, Long and Povey, Daniel},
booktitle={International Conference on Learning Representations},
volume={2025},
pages={26850--26868},
year={2025}
}