X-ASR zh-en · OpenASR
Bilingual Chinese + English streaming speech recognition: a compact icefall Zipformer2 transducer
Native speech-to-text in the OpenASR runtime, engineered for peak performance on CPU & GPU, with no Python at inference time.
✨ Highlights
- 🇨🇳🇬🇧 Chinese + English: one bilingual checkpoint for zh/en speech, including code-switched audio
- ⚡ Streaming-first, offline-capable: a cache-aware streaming Zipformer2 transducer for low-latency captions that also runs full-file offline transcription
- 🪶 Compact ~0.16B: a 6-stack Zipformer2 encoder + stateless RNN-T decoder + tanh joiner over a 5000-token BPE vocab, light enough for on-device CPU
- 📦 Native in OpenASR: .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU
🚀 Quickstart
# 1. Install the OpenASR CLI · https://openasr.org
# 2. Pull a build (pick a quant; see the table below)
openasr pull xasr-zh-en:q8
# 3. Transcribe
openasr transcribe audio.wav --model xasr-zh-en
All builds for this model:
openasr pull xasr-zh-en:fp16
openasr pull xasr-zh-en:q8
openasr pull xasr-zh-en:q4
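For scripting around the CLI, a thin wrapper may be enough. The sketch below shells out to the same `openasr transcribe` command shown in the Quickstart for every WAV file in a folder. It assumes the CLI is on `PATH` and that the transcript is written to standard output, which this card does not state. `transcribe_cmd` and `transcribe_folder` are hypothetical helper names, and Python here only drives the CLI; it is not part of inference.

```python
import subprocess
from pathlib import Path

MODEL = "xasr-zh-en"  # model name as used in the Quickstart


def transcribe_cmd(audio_path, model=MODEL):
    """Build the argv for the Quickstart's transcribe command."""
    return ["openasr", "transcribe", str(audio_path), "--model", model]


def transcribe_folder(folder):
    """Run the CLI on every .wav in `folder`; return {filename: stdout}.

    Assumption: the transcript is printed to stdout. Adjust this if your
    OpenASR version reports results differently.
    """
    results = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        proc = subprocess.run(
            transcribe_cmd(wav), capture_output=True, text=True, check=True
        )
        results[wav.name] = proc.stdout.strip()
    return results
```

Calling `transcribe_folder("recordings")` (a placeholder folder name) would return one entry per WAV file; `check=True` makes a failing CLI call raise instead of silently returning an empty transcript.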
📦 Available builds
| Quant | File (.oasr) | Size | RAM peak | RTF · M1 CPU | RTF · M1 GPU | JFK ΔWER vs fp16 |
|---|---|---|---|---|---|---|
| fp16 | xasr-zh-en-fp16.oasr | 315 MB | 1.36 GB | 0.09× | 0.09× | 0.0% |
| q8_0 | xasr-zh-en-q8_0.oasr | 176 MB | 1.36 GB | 0.09× | 0.09× | 0.0% |
| q4_k | xasr-zh-en-q4_k.oasr | 112 MB | 1.36 GB | 0.09× | 0.10× | 0.0% |
RTF = real-time factor on the fixed 11 s JFK clip (lower is faster); RAM peak is measured per pack in an isolated subprocess. JFK ΔWER compares each quantized build's JFK transcript to this model's fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy. q8_0 is the recommended default: near-reference quality at roughly 56% of the fp16 file size (RAM peak is the same for all three builds).
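The two metrics in the table can be sketched in a few lines. This is a minimal illustration of the definitions above, not OpenASR's benchmarking code: the function names are hypothetical, and splitting on whitespace is an assumption that suits the English JFK clip (Chinese text is more commonly scored per character).

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio (lower is faster)."""
    return processing_seconds / audio_seconds


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def delta_wer(fp16_transcript, quant_transcript):
    """WER of a quantized build's transcript, with fp16 as the reference."""
    n_ref = len(fp16_transcript.split())
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return word_errors(fp16_transcript, quant_transcript) / n_ref


# Illustrative numbers only (not measurements from this card):
# about 1 s of compute on an 11 s clip gives an RTF near 0.09.
print(round(real_time_factor(0.99, 11.0), 2))
```

Because the reference is the fp16 transcript rather than a human transcript, a ΔWER of 0.0% only says the quantized build produced the same words as fp16 on this clip.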
🧠 About X-ASR zh-en
X-ASR-zh-en is a compact bilingual (Chinese + English) streaming speech-recognition model from
GilgameshWind, built with the icefall / k2 recipe as a cache-aware Zipformer2 RNN-T
transducer (a 6-stack, 19-layer Zipformer2 encoder, a stateless RNN-T decoder, and a tanh joiner
over a 5000-token BPE vocabulary, ~0.16B parameters). The same checkpoint serves both low-latency
streaming captions and full-file offline transcription, making it a good fit for on-device
Chinese/English dictation and real-time subtitles. This OpenASR repo repackages the weights as
.oasr packs that run natively in the OpenASR runtime, with no Python at inference time and all
decoding local. The q8_0 build is the recommended default (it matched the fp16 transcript exactly in
OpenASR's verification); q4_k is the smallest build for tight-memory devices, and fp16 is for
maximum fidelity or verification.
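To get a feel for the size trade-off between builds, the file sizes from the table can be divided by the approximate parameter count. The sketch below is back-of-the-envelope arithmetic only: it assumes "MB" means 10^6 bytes and takes "~0.16B" at face value, so the results are rough estimates that also include any container metadata.

```python
PARAMS = 0.16e9  # "~0.16B" from this card; an approximation
SIZES_MB = {"fp16": 315, "q8_0": 176, "q4_k": 112}  # from the builds table


def bits_per_param(size_mb, params=PARAMS, bytes_per_mb=1_000_000):
    """Average stored bits per parameter, metadata included."""
    return size_mb * bytes_per_mb * 8 / params


for quant, size in SIZES_MB.items():
    ratio = size / SIZES_MB["fp16"]
    print(f"{quant}: {bits_per_param(size):.1f} bits/param, "
          f"{ratio:.0%} of fp16 size")
```

Under these assumptions the builds come out at roughly 15.8, 8.8, and 5.6 bits per parameter. The quantized figures sit somewhat above the nominal 8.5 and 4.5 bits per weight of GGUF's Q8_0 and Q4_K block formats, which could reflect some tensors being kept at higher precision, the rounded parameter count, or both; the card does not say.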
⚙️ How these packs were made
Converted from GilgameshWind/X-ASR-zh-en with the OpenASR importer:
openasr model-pack import-xasr-zipformer-local <src> <out>.oasr \
--package-id xasr-zh-en --quantization {fp16,q8-0,q4-k}
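To produce all three builds in one go, the command above can be expanded per quantization level. The following sketch only prints the command lines and does not run them. `import_cmd` is a hypothetical helper, `./X-ASR-zh-en` is a placeholder for a local copy of the upstream checkpoint, and the output names are chosen to match the file names in the builds table.

```python
import shlex

QUANT_FLAGS = ["fp16", "q8-0", "q4-k"]  # values of --quantization shown above


def import_cmd(src, quant_flag, package_id="xasr-zh-en"):
    """Build the importer argv for one quantization level."""
    # File names in the builds table use underscores (q8_0, q4_k),
    # while the flag values use hyphens (q8-0, q4-k).
    out = f"{package_id}-{quant_flag.replace('-', '_')}.oasr"
    return [
        "openasr", "model-pack", "import-xasr-zipformer-local", src, out,
        "--package-id", package_id, "--quantization", quant_flag,
    ]


for flag in QUANT_FLAGS:
    # "./X-ASR-zh-en" is a placeholder path, not a documented location.
    print(shlex.join(import_cmd("./X-ASR-zh-en", flag)))
```

Printing with `shlex.join` yields lines that can be pasted into a shell or reviewed before execution.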
The .oasr container is GGUF-backed; packs use zero-copy mmap weight binding and graph
buffer reuse to keep peak memory low.
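The idea behind zero-copy mmap binding can be shown with a toy example. This is a generic illustration of the technique using Python's standard library, not OpenASR's runtime code (which is not Python): a small temporary file stands in for a weight blob, and the mapped bytes are reinterpreted as float32 values in place rather than being read into a separate buffer.

```python
import mmap
import os
import struct
import tempfile

# Write four float32 values to a temporary file standing in for a weight blob.
values = [0.5, -1.0, 2.0, 3.25]
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack("4f", *values))  # native byte order, 4 bytes each
    path = f.name

with open(path, "rb") as f:
    # Map the file read-only; the OS loads pages lazily on first access.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # cast("f") reinterprets the mapped bytes as float32 without copying.
        with memoryview(mm) as raw, raw.cast("f") as weights:
            loaded = list(weights)  # a copy is made only here, for display

os.remove(path)
print(loaded)
```

Because the view points straight at the mapped file, memory for the weights is paged in by the operating system on demand instead of being duplicated in a separate allocation.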
⚖️ License
These packs inherit the upstream model's license: Apache-2.0 (source). OpenASR packaging retains the upstream copyright and NOTICE; the only modifications are format conversion and quantization.
🙏 Acknowledgements
This pack is a redistribution of X-ASR-zh-en, created and open-sourced by GilgameshWind (GilgameshWind/X-ASR-zh-en). All credit for the original architecture, training, and weights belongs to the author; the license is inherited from and identical to the upstream model (Apache-2.0). The model is built on the icefall / k2 / Next-gen Kaldi Zipformer2 transducer recipe. Thank you to the icefall team and to GilgameshWind for releasing their work openly. OpenASR only performs format conversion, quantization, runtime verification, and local-inference adaptation.
🔗 Links
- 📦 OpenASR: https://github.com/QuintinShaw/OpenASR
- 🌐 Website: https://openasr.org
- 🤗 Upstream model: GilgameshWind/X-ASR-zh-en