X-ASR zh-en · OpenASR

Bilingual Chinese + English streaming speech recognition — a compact icefall Zipformer2 transducer

Native speech-to-text in the OpenASR runtime — engineered for peak performance on CPU & GPU, no Python at inference time.

✨ Highlights

🇨🇳🇬🇧 Chinese + English — one bilingual checkpoint for zh/en speech, including code-switched audio
⚡ Streaming-first, offline-capable — a cache-aware streaming Zipformer2 transducer for low-latency captions that also runs full-file offline transcription
🪶 Compact ~0.16B — a 6-stack Zipformer2 encoder + stateless RNN-T decoder + tanh joiner over a 5000-token BPE vocab, light enough for on-device CPU
🦀 Native in OpenASR — .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU
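The streaming mode in the highlights depends on caching. The encoder consumes audio in fixed-size chunks and carries a bounded cache of past context from one chunk to the next, so it never re-reads the whole utterance. The toy sketch below illustrates only that principle. The names (`CachedWindowSum`, `stream_chunks`, `causal_window_sum`), the chunk size and the window are hypothetical, and a causal moving sum stands in for the encoder. Real Zipformer2 layers cache per-layer attention and convolution state, and their streaming output is not guaranteed to equal their offline output.

```python
from collections import deque
from typing import Iterable, Iterator, List


def causal_window_sum(frames: List[float], window: int) -> List[float]:
    """Offline reference: each output sums the current frame and up to window-1 previous frames."""
    out = []
    for t in range(len(frames)):
        start = max(0, t - window + 1)
        out.append(sum(frames[start : t + 1]))
    return out


def stream_chunks(frames: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Split an incoming frame stream into fixed-size chunks (the last one may be shorter)."""
    chunk: List[float] = []
    for f in frames:
        chunk.append(f)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


class CachedWindowSum:
    """Toy cache-aware layer: keeps only the last window-1 frames between chunks."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.cache: deque = deque(maxlen=window - 1)  # bounded: old frames fall off automatically

    def process_chunk(self, chunk: List[float]) -> List[float]:
        out = []
        for f in chunk:
            # Context = cached past frames + current frame; no future frames are used.
            out.append(sum(self.cache) + f)
            self.cache.append(f)
        return out


# Stream 10 frames in chunks of 4 through a layer with a 3-frame window.
frames = [float(i) for i in range(10)]
layer = CachedWindowSum(window=3)
streamed: List[float] = []
for chunk in stream_chunks(frames, chunk_size=4):
    streamed.extend(layer.process_chunk(chunk))
print(streamed == causal_window_sum(frames, window=3))
```

Because the toy layer never looks past its window, a cache of `window - 1` frames is enough for the chunked result to equal the full-sequence result, and the per-chunk cost and memory stay constant however long the stream runs.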

🚀 Quickstart

# 1. Install the OpenASR CLI  ·  https://openasr.org
# 2. Pull a build (pick a quant — see the table below)
openasr pull xasr-zh-en:q8

# 3. Transcribe
openasr transcribe audio.wav --model xasr-zh-en

All builds for this model:

openasr pull xasr-zh-en:fp16
openasr pull xasr-zh-en:q8
openasr pull xasr-zh-en:q4

📦 Available builds

Quant	File (`.oasr`)	Size	RAM peak	RTF · M1 CPU	RTF · M1 GPU	JFK ΔWER vs fp16
fp16	`xasr-zh-en-fp16.oasr`	315 MB	1.36 GB	0.09×	0.09×	0.0%
q8_0	`xasr-zh-en-q8_0.oasr`	176 MB	1.36 GB	0.09×	0.09×	0.0%
q4_k	`xasr-zh-en-q4_k.oasr`	112 MB	1.36 GB	0.09×	0.10×	0.0%

RTF = real-time factor on the fixed 11 s JFK clip (lower is faster). RAM peak was measured per pack in an isolated subprocess. JFK ΔWER compares each quantized build's JFK transcript with this model's fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy. q8_0 is the recommended default: it shows no JFK drift against fp16 (0.0% ΔWER) at about 56% of the fp16 file size (176 MB vs 315 MB).
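The two metrics defined above reduce to short formulas. RTF is processing time divided by audio duration, so 0.09× on the 11 s JFK clip means roughly 0.09 × 11 ≈ 1 s of compute. ΔWER is an ordinary word error rate in which the fp16 transcript plays the role of the reference. The sketch below assumes lowercase, whitespace-split words. The normalization that OpenASR's verification actually applies is not documented here, and Chinese output would normally be scored per character rather than per word.

```python
from typing import List


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (0 extra cost on a match)
            )
        prev = cur
    return prev[-1]


def delta_wer(fp16_transcript: str, quant_transcript: str) -> float:
    """Quantization drift: WER of the quantized transcript, scored against the fp16 transcript."""
    ref = fp16_transcript.lower().split()  # assumed normalization: lowercase + whitespace split
    hyp = quant_transcript.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return word_edit_distance(ref, hyp) / len(ref)


# Illustrative strings, not real model output.
print(f"RTF: {real_time_factor(0.99, 11.0):.2f}")
print("delta WER:", delta_wer("ask not what your country can do for you",
                              "ask not what your country can do for me"))
```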

🧠 About X-ASR zh-en

X-ASR-zh-en is a compact bilingual (Chinese + English) streaming speech-recognition model from GilgameshWind, built with the icefall / k2 recipe as a cache-aware Zipformer2 RNN-T transducer. It pairs a 6-stack, 19-layer Zipformer2 encoder with a stateless RNN-T decoder and a tanh joiner over a 5000-token BPE vocabulary, for roughly 0.16B parameters in total. The same checkpoint serves both low-latency streaming captions and full-file offline transcription, which makes it a good fit for on-device Chinese/English dictation and real-time subtitles. This OpenASR repo repackages the weights as .oasr packs that run natively in the OpenASR runtime, with no Python at inference time and all decoding done locally. The q8_0 build is the recommended default (its transcript matched the fp16 transcript exactly in OpenASR's verification); q4_k is the smallest build for memory-constrained devices, and fp16 is for maximum fidelity or verification.
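To make the transducer pieces concrete, here is a toy, pure-Python sketch of greedy RNN-T decoding with a stateless decoder and a tanh joiner. Everything in it is hypothetical: random weights, tiny dimensions, a blank id of 0, a blank-filled starting context, and the class and method names. It is not OpenASR's or icefall's implementation. In particular, icefall's stateless decoder applies a Conv1d over the embeddings of the last `context_size` tokens; here that is simplified to one linear layer over the concatenated embeddings.

```python
import math
import random
from typing import List

BLANK = 0  # hypothetical blank token id


def matvec(w: List[List[float]], x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyTransducer:
    def __init__(self, vocab: int, enc_dim: int, dec_dim: int, join_dim: int,
                 context_size: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.context_size = context_size
        self.embed = rand_matrix(vocab, dec_dim, rng)                    # token embeddings
        self.dec_proj = rand_matrix(dec_dim, dec_dim * context_size, rng)
        self.join_enc = rand_matrix(join_dim, enc_dim, rng)
        self.join_dec = rand_matrix(join_dim, dec_dim, rng)
        self.join_out = rand_matrix(vocab, join_dim, rng)

    def decoder(self, context: List[int]) -> List[float]:
        """Stateless decoder: sees only the last context_size tokens, no recurrent state."""
        x = [v for tok in context for v in self.embed[tok]]
        return [max(0.0, v) for v in matvec(self.dec_proj, x)]  # ReLU

    def joiner(self, enc_frame: List[float], dec_out: List[float]) -> List[float]:
        """tanh joiner: project both inputs, add, apply tanh, then map to vocab logits."""
        hidden = [math.tanh(a + b) for a, b in
                  zip(matvec(self.join_enc, enc_frame), matvec(self.join_dec, dec_out))]
        return matvec(self.join_out, hidden)

    def greedy_decode(self, enc_frames: List[List[float]], max_sym_per_frame: int = 1) -> List[int]:
        context = [BLANK] * self.context_size
        dec_out = self.decoder(context)
        hyp: List[int] = []
        for frame in enc_frames:
            for _ in range(max_sym_per_frame):
                logits = self.joiner(frame, dec_out)
                best = max(range(len(logits)), key=logits.__getitem__)
                if best == BLANK:
                    break  # blank: move on to the next encoder frame
                hyp.append(best)
                context = context[1:] + [best]  # slide the fixed-size token context
                dec_out = self.decoder(context)
        return hyp


rng = random.Random(1)
enc = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(20)]  # fake encoder output
model = ToyTransducer(vocab=10, enc_dim=8, dec_dim=6, join_dim=12)
tokens = model.greedy_decode(enc)
print(tokens)  # token ids are meaningless with random weights
```

The stateless design is what keeps the decoder cheap: the only state carried between steps is the last few token ids, so there is no recurrent hidden state to cache while streaming.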

⚙️ How these packs were made

Converted from GilgameshWind/X-ASR-zh-en with the OpenASR importer:

openasr model-pack import-xasr-zipformer-local <src> <out>.oasr \
  --package-id xasr-zh-en --quantization <quant>

where <quant> is one of fp16, q8-0, or q4-k (one importer run per pack).
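Since each importer run writes a single .oasr file, producing all three builds takes one run per quantization. The Python sketch below scripts that with `subprocess`, using only the subcommand and flags shown in the importer command; `<src>` is passed through unchanged. The output filenames (mapping `q8-0` to `q8_0` and `q4-k` to `q4_k` to mirror the builds table) are an assumption, not something the importer prescribes.

```python
import shutil
import subprocess
import sys
from typing import List

# Quantization values accepted by --quantization, as listed for the importer.
QUANTS = ["fp16", "q8-0", "q4-k"]


def import_command(src: str, out_dir: str, quant: str) -> List[str]:
    """Build one importer invocation; the output filename scheme is a hypothetical choice."""
    out = f"{out_dir}/xasr-zh-en-{quant.replace('-', '_')}.oasr"  # e.g. xasr-zh-en-q8_0.oasr
    return [
        "openasr", "model-pack", "import-xasr-zipformer-local",
        src, out,
        "--package-id", "xasr-zh-en",
        "--quantization", quant,
    ]


def main(src: str, out_dir: str) -> None:
    if shutil.which("openasr") is None:
        sys.exit("openasr CLI not found on PATH")
    for quant in QUANTS:
        # One pack per run; check=True stops at the first failing import.
        subprocess.run(import_command(src, out_dir, quant), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as `python import_packs.py <src> <out_dir>`, where `import_packs.py` is a hypothetical filename for the script.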

The .oasr container is GGUF-backed; packs use zero-copy mmap weight binding and graph buffer reuse to keep peak memory low.
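As a rough illustration of why mmap-based loading keeps memory low, the sketch below maps a file read-only and decodes only the 24-byte GGUF header: the magic `GGUF`, a uint32 version, then uint64 tensor and metadata key/value counts, all little-endian, as in GGUF versions 2 and 3. The OS pages in only the bytes that are actually touched. The sketch assumes a .oasr pack begins with a bare GGUF header, which this README does not state; if the container wraps GGUF differently, the magic check fails. It is not the OpenASR runtime's own loader.

```python
import mmap
import os
import struct
import tempfile
from typing import NamedTuple

GGUF_MAGIC = b"GGUF"
HEADER = struct.Struct("<4sIQQ")  # magic, version, tensor_count, metadata_kv_count (24 bytes)


class GgufHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: str) -> GgufHeader:
    """Map the file read-only and decode only the header bytes; nothing else is read."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        if len(mm) < HEADER.size:
            raise ValueError("file too small for a GGUF header")
        magic, version, n_tensors, n_kv = HEADER.unpack_from(mm, 0)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a bare GGUF file (magic={magic!r})")
        return GgufHeader(version, n_tensors, n_kv)


# Demo on a synthetic header-only file (no real tensors or metadata follow it).
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(HEADER.pack(GGUF_MAGIC, 3, 2, 5))
header = read_gguf_header(tmp.name)
os.remove(tmp.name)
print(header)
```

Tensor data can be exposed the same way, as `memoryview` slices of the mapping, without copying it into process memory; any such view must be released before the mapping is closed.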

⚖️ License

These packs inherit the upstream model's license: Apache-2.0 (source). OpenASR packaging retains the upstream copyright and NOTICE; the only modifications are format conversion and quantization.

🙏 Acknowledgements

This pack is a redistribution of X-ASR-zh-en, created and open-sourced by GilgameshWind (GilgameshWind/X-ASR-zh-en). All credit for the original architecture, training, and weights belongs to the author; the license is inherited from and identical to the upstream model (Apache-2.0). The model is built on the icefall / k2 / Next-gen Kaldi Zipformer2 transducer recipe — thank you to the icefall team and to GilgameshWind for releasing their work openly. OpenASR only performs format conversion, quantization, runtime verification, and local-inference adaptation.

🔗 Links

🦀 OpenASR — https://github.com/QuintinShaw/OpenASR
🌐 Website — https://openasr.org
🤗 Upstream model — GilgameshWind/X-ASR-zh-en
