X-ASR zh-en · OpenASR

Bilingual Chinese + English streaming speech recognition: a compact icefall Zipformer2 transducer


Native speech-to-text in the OpenASR runtime, engineered for peak performance on CPU & GPU, with no Python at inference time.


✨ Highlights

  • 🇨🇳🇬🇧 Chinese + English: one bilingual checkpoint for zh/en speech, including code-switched audio
  • ⚡ Streaming-first, offline-capable: a cache-aware streaming Zipformer2 transducer for low-latency captions that also runs full-file offline transcription
  • 🪶 Compact ~0.16B: a 6-stack Zipformer2 encoder + stateless RNN-T decoder + tanh joiner over a 5000-token BPE vocab, light enough for on-device CPU
  • 🦀 Native in OpenASR: .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU
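
To make the encoder / stateless decoder / joiner split concrete, here is a toy greedy transducer search in Python. It is a sketch under loud assumptions, not the icefall or OpenASR implementation: the weights are hand-picked, the hidden size is 3, the decoder looks only at the last emitted token, and names such as `greedy_decode` and `max_symbols_per_frame` are hypothetical.

```python
import math

BLANK = 0  # token id meaning "emit nothing, move on to the next frame"

def stateless_decoder(tokens, embeddings):
    """Prediction network with no recurrent state: a lookup on the last token.
    (Assumption: a context of one token, chosen to keep the toy small.)"""
    return embeddings[tokens[-1]]

def tanh_joiner(enc_frame, dec_out, out_weights):
    """Add encoder and decoder outputs, apply tanh, project to vocabulary logits."""
    hidden = [math.tanh(e + d) for e, d in zip(enc_frame, dec_out)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in out_weights]

def greedy_decode(enc_frames, embeddings, out_weights, max_symbols_per_frame=3):
    """Greedy transducer search: per frame, emit the argmax token until blank wins."""
    tokens = [BLANK]  # blank doubles as the start-of-sequence context
    for frame in enc_frames:
        for _ in range(max_symbols_per_frame):
            dec_out = stateless_decoder(tokens, embeddings)
            logits = tanh_joiner(frame, dec_out, out_weights)
            best = max(range(len(logits)), key=logits.__getitem__)
            if best == BLANK:
                break  # nothing more to emit for this frame
            tokens.append(best)
    return tokens[1:]  # drop the initial blank

# Hand-picked toy weights (3 tokens, hidden size 3); not taken from the real model.
EMBEDDINGS = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
OUT_WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ENC_FRAMES = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.0], [-10.0, 0.0, 2.0]]

decoded = greedy_decode(ENC_FRAMES, EMBEDDINGS, OUT_WEIGHTS)
```

With these toy weights the search emits token 1 on the first frame, nothing on the second, and token 2 on the third. Because the decoder keeps no hidden state, a streaming runtime only has to carry the short token context between audio chunks, alongside whatever encoder caches it maintains.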

🚀 Quickstart

# 1. Install the OpenASR CLI  ·  https://openasr.org
# 2. Pull a build (pick a quant; see the table below)
openasr pull xasr-zh-en:q8

# 3. Transcribe
openasr transcribe audio.wav --model xasr-zh-en

All builds for this model:

openasr pull xasr-zh-en:fp16
openasr pull xasr-zh-en:q8
openasr pull xasr-zh-en:q4

📦 Available builds

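| Quant | File (.oasr) | Size | RAM peak | RTF · M1 CPU | RTF · M1 GPU | JFK ΔWER vs fp16 |
|-------|----------------------|--------|----------|--------------|--------------|------------------|
| fp16 | xasr-zh-en-fp16.oasr | 315 MB | 1.36 GB | 0.09× | 0.09× | 0.0% |
| q8_0 | xasr-zh-en-q8_0.oasr | 176 MB | 1.36 GB | 0.09× | 0.09× | 0.0% |
| q4_k | xasr-zh-en-q4_k.oasr | 112 MB | 1.36 GB | 0.09× | 0.10× | 0.0% |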

RTF is the real-time factor (processing time divided by audio duration) on the fixed 11 s JFK clip; lower is faster. RAM peak is measured per pack in an isolated subprocess. JFK ΔWER compares each quantized build's JFK transcript to this model's fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy. q8_0 is the recommended default: near-reference quality at about 56% of the fp16 file size (176 MB vs 315 MB).
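
As a rough illustration of the two metrics above, the following Python sketch computes a real-time factor and a word error rate between two transcripts. The function names and sample strings are hypothetical, and whitespace tokenization is an assumption: it suits the English JFK clip, whereas Chinese text is usually scored per character instead.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists
    (substitutions, insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate of `hypothesis` against `reference`."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)

def rtf(processing_seconds, audio_seconds):
    """Real-time factor: values below 1.0 mean faster than real time."""
    return processing_seconds / audio_seconds

# Illustrative strings only; these are not the actual pack outputs.
fp16_text = "ask not what your country can do for you"
quant_text = "ask not what your country can do for you"
drift = wer(fp16_text, quant_text)  # fp16 transcript used as the reference
```

Using the fp16 transcript as the reference is what makes this a drift measure: a value of 0.0 says the quantized build reproduced the fp16 output, not that either transcript is correct.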

🧠 About X-ASR zh-en

X-ASR-zh-en is a compact bilingual (Chinese + English) streaming speech-recognition model from GilgameshWind, built with the icefall / k2 recipe as a cache-aware Zipformer2 RNN-T transducer: a 6-stack, 19-layer Zipformer2 encoder, a stateless RNN-T decoder, and a tanh joiner over a 5000-token BPE vocabulary, for roughly 0.16B parameters. The same checkpoint serves both low-latency streaming captions and full-file offline transcription, making it a good fit for on-device Chinese/English dictation and real-time subtitles. This OpenASR repo repackages the weights as .oasr packs that run natively in the OpenASR runtime, with no Python at inference time and all decoding done locally. The q8_0 build is the recommended default (it matched the fp16 transcript exactly in OpenASR's verification); q4_k is the smallest build for tight-memory devices, and fp16 is for maximum fidelity or verification.
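
A quick back-of-the-envelope check relates the ~0.16B parameter count to the published file sizes. The Python sketch below is only an estimate: the bits-per-weight figures for q8_0 and q4_k assume the usual GGUF block layouts (34 bytes per 32 weights and 144 bytes per 256 weights), which the source does not confirm for .oasr packs, and it ignores metadata and any tensors kept at higher precision.

```python
PARAMS = 0.16e9  # "~0.16B parameters" from the model description

# Nominal bits per weight. fp16 is 16 by definition; the other two are
# assumptions based on common GGUF block layouts, not confirmed for .oasr.
BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q4_k": 4.5}

# File sizes as published in the builds table (MB).
PUBLISHED_MB = {"fp16": 315, "q8_0": 176, "q4_k": 112}

def estimated_mb(params, bits_per_weight):
    """Weights-only size estimate in MB (10**6 bytes)."""
    return params * bits_per_weight / 8 / 1e6

def report():
    """Return (quant, estimated MB, published MB) rows for comparison."""
    return [
        (quant, round(estimated_mb(PARAMS, bits)), PUBLISHED_MB[quant])
        for quant, bits in BITS_PER_WEIGHT.items()
    ]

rows = report()
for quant, est, published in rows:
    print(f"{quant}: estimated {est} MB, published {published} MB")
```

The fp16 and q8_0 estimates land within a few percent of the published sizes. The q4_k pack is noticeably larger than its estimate, which would be consistent with some tensors being stored at higher precision, though the source does not say so.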

⚙️ How these packs were made

Converted from GilgameshWind/X-ASR-zh-en with the OpenASR importer:

openasr model-pack import-xasr-zipformer-local <src> <out>.oasr \
  --package-id xasr-zh-en --quantization {fp16,q8-0,q4-k}

The .oasr container is GGUF-backed; packs use zero-copy mmap weight binding and graph buffer reuse to keep peak memory low.
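
For readers unfamiliar with the idea, the Python sketch below shows what zero-copy mmap binding means in general: a tensor becomes a typed view over a region of the mapped file rather than a copy in process memory. It is a generic illustration, not the OpenASR runtime's code; the flat float32 layout, the `bind_tensor` helper, and the byte offset are all hypothetical.

```python
import array
import mmap
import os
import tempfile

def bind_tensor(buf, offset, count):
    """Return a float32 view of `count` values starting at byte `offset`.
    Slicing and casting a memoryview does not copy the underlying bytes."""
    return buf[offset:offset + 4 * count].cast("f")

def demo():
    # Write a toy "weight file": four float32 values in native byte order.
    weights = array.array("f", [0.5, -1.25, 2.0, 3.5])
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(weights.tobytes())
        path = f.name
    try:
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            buf = memoryview(mm)
            view = bind_tensor(buf, offset=8, count=2)  # last two values
            values = list(view)  # copy out only for the demonstration
            # Views must be released before the mapping can be closed.
            view.release()
            buf.release()
        return values
    finally:
        os.remove(path)

values = demo()
```

Because the view reads directly from the mapped pages, the operating system can load weights lazily and share them across processes, which is one reason a mapped pack can keep peak memory low.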

⚖️ License

These packs inherit the upstream model's license: Apache-2.0 (source). OpenASR packaging retains the upstream copyright and NOTICE; the only modifications are format conversion and quantization.

🙏 Acknowledgements

This pack is a redistribution of X-ASR-zh-en, created and open-sourced by GilgameshWind (GilgameshWind/X-ASR-zh-en). All credit for the original architecture, training, and weights belongs to the author; the license is inherited from and identical to the upstream model (Apache-2.0). The model is built on the icefall / k2 / Next-gen Kaldi Zipformer2 transducer recipe. Thank you to the icefall team and to GilgameshWind for releasing their work openly. OpenASR only performs format conversion, quantization, runtime verification, and local-inference adaptation.

🔗 Links
