Access to this model is gated. To download it, log in to Hugging Face and accept the Whissle Inference-Only License Agreement; approval is automatic. See the LICENSE file for full terms. Key restrictions: the model is licensed for inference only (no training, fine-tuning, distillation, model compression, or reverse engineering); inference use is free under 100M MAU (monthly active users); and redistribution requires "Powered by Whissle" attribution.

Whissle STT Mandarin

A Mandarin Chinese speech recognition model with a dual-head tag classifier that predicts speaker demographics (age and gender) and dialect. It is built on a 1024-dimensional Conformer-CTC encoder, the largest encoder in the Whissle STT family.
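A CTC model emits one score vector per encoder frame. A transcript is recovered by taking the best token for each frame, collapsing consecutive repeats, and dropping the blank token. The sketch below shows this greedy (best-path) decoding in plain Python. It is not taken from the Whissle inference code: the `ctc_greedy_decode` name, the toy vocabulary, and the blank index are hypothetical, and joining tokens with no separator assumes character-level Mandarin output.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_logits: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int,
) -> str:
    """Greedy (best-path) CTC decoding.

    frame_logits: one row of scores per encoder frame, one column per token id.
    vocab: token strings indexed by token id (hypothetical toy vocabulary here).
    blank_id: index of the CTC blank token (an assumption; check the real model).
    """
    # 1. Pick the highest-scoring token id for every frame.
    best_path: List[int] = [
        max(range(len(row)), key=row.__getitem__) for row in frame_logits
    ]
    # 2. Collapse consecutive repeats and 3. drop blanks.
    #    A blank between two identical tokens keeps both (e.g. "a _ a" -> "aa").
    tokens: List[str] = []
    prev: Optional[int] = None
    for token_id in best_path:
        if token_id != prev and token_id != blank_id:
            tokens.append(vocab[token_id])
        prev = token_id
    return "".join(tokens)


if __name__ == "__main__":
    vocab = ["你", "好", "<blank>"]  # hypothetical 3-token vocabulary
    frames = [
        [0.9, 0.05, 0.05],  # 你
        [0.8, 0.1, 0.1],    # 你 (repeat, collapsed)
        [0.1, 0.1, 0.8],    # blank
        [0.1, 0.7, 0.2],    # 好
        [0.2, 0.6, 0.2],    # 好 (repeat, collapsed)
    ]
    print(ctc_greedy_decode(frames, vocab, blank_id=2))  # 你好
```

Greedy decoding is the simplest CTC decoder; beam search or a language model can improve accuracy, but whether the Whissle inference scripts use either is not stated here.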

Model Details

Architecture: Conformer-CTC (EncDecCTCModel) + dual-head tag classifier
Encoder: Conformer layers, 1024-dim
Download size: ~600 MB
Format: ONNX (CPU and GPU compatible)
Sample rate: 16 kHz mono
Language: Mandarin Chinese
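Because the model expects 16 kHz mono audio, it can help to validate input files before inference. The sketch below uses only the Python standard library and assumes 16-bit PCM WAV input; scaling samples to [-1.0, 1.0) is an assumption about what the model consumes, and `load_pcm16_mono` and `make_test_wav` are hypothetical helper names. It rejects mismatched audio rather than resampling it.

```python
import io
import wave
from array import array
from typing import BinaryIO, List, Sequence, Union

TARGET_RATE = 16_000  # model card: 16 kHz mono input


def load_pcm16_mono(source: Union[str, BinaryIO]) -> List[float]:
    """Read a 16-bit PCM WAV file and return samples scaled to [-1.0, 1.0).

    Raises ValueError if the audio is not 16 kHz, mono, 16-bit PCM.
    The [-1.0, 1.0) scaling is an assumption about the model's input format.
    """
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wav.readframes(wav.getnframes())
    # The wave module returns frames in native byte order, so no swap is needed.
    samples = array("h")
    samples.frombytes(raw)
    return [s / 32768.0 for s in samples]


def make_test_wav(
    samples: Sequence[int], rate: int = TARGET_RATE, channels: int = 1
) -> io.BytesIO:
    """Hypothetical helper: build an in-memory 16-bit WAV for trying the loader."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(array("h", samples).tobytes())
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(load_pcm16_mono(make_test_wav([0, 16384, -32768])))  # [0.0, 0.5, -1.0]
```

Failing fast on the wrong sample rate or channel count avoids silently feeding the encoder audio it was not trained on; conversion itself is better left to a dedicated audio tool.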

Tag Classifier Outputs

Age (5 classes): <14, 14-25, 26-40, >41, NONE
Dialect (4 classes): NORTH, SOUTH, OTHERS, NONE
Gender (3 classes): MALE, FEMALE, NONE
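Each tag category is a small classification problem, so a predicted label is the highest-scoring class for that category. The sketch below maps per-category scores to the labels listed above. It assumes, hypothetically, that the classifier's outputs can be arranged as one score vector per category in the same label order as the table; the real output names and layout of the ONNX export are not documented here, and `decode_tags` is an illustrative name.

```python
from typing import Dict, Sequence

# Label order copied from the Tag Classifier Outputs list. Whether the
# exported model emits scores in exactly this order is an assumption.
TAG_LABELS: Dict[str, Sequence[str]] = {
    "age": ("<14", "14-25", "26-40", ">41", "NONE"),
    "dialect": ("NORTH", "SOUTH", "OTHERS", "NONE"),
    "gender": ("MALE", "FEMALE", "NONE"),
}


def decode_tags(scores: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Return the highest-scoring label for each tag category (argmax)."""
    result: Dict[str, str] = {}
    for category, labels in TAG_LABELS.items():
        category_scores = scores[category]
        if len(category_scores) != len(labels):
            raise ValueError(
                f"{category}: expected {len(labels)} scores, got {len(category_scores)}"
            )
        best = max(range(len(category_scores)), key=category_scores.__getitem__)
        result[category] = labels[best]
    return result


if __name__ == "__main__":
    example = {
        "age": [0.1, 0.2, 2.3, 0.0, -1.0],
        "dialect": [1.5, 0.2, 0.1, 0.0],
        "gender": [0.3, 1.2, 0.1],
    }
    print(decode_tags(example))  # {'age': '26-40', 'dialect': 'NORTH', 'gender': 'FEMALE'}
```

Checking the score count against the label count catches a mismatched output layout early instead of returning a plausible but wrong label.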

Quick Start

git clone https://github.com/WhissleAI/whissle_stt_inference.git
cd whissle_stt_inference
./setup.sh --model zh

License

Whissle Inference-Only License: inference only; no training, fine-tuning, distillation, or reverse engineering. Free under 100M MAU (monthly active users).


Hugging Face repository: WhissleAI/STT-zh-mandarin-ONNX