WeSpeaker ResNet34-LM - LiteRT
Speaker embedding for speaker identification and diarization clustering.
Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in
speech-core (C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.
Use cases on soniqo.audio
256-dim speaker embedding network for Android, ported from
pyannote/wespeaker-voxceleb-resnet34-LM.
Model
| Property | Value |
|---|---|
| Architecture | ResNet34 + stats pooling + linear projection |
| Parameters | ~6.6 M |
| Format | LiteRT (TFLite) |
| Quantization | float32 |
| Sample rate | 16 000 Hz |
| Input | 80-bin kaldi-style mel fbank features (T frames) |
| Output | L2-normalized 256-dim embedding |
Files
| File | Size | Description |
|---|---|---|
| wespeaker-resnet34.tflite | 25.4 MB | Full model, FP32 |
| config.json | 1 KB | Fbank spec + I/O signature |
Why fbank-as-input
pyannote's kaldi fbank implementation uses torch.hamming_window and
aten._fft_r2c, neither of which has a lowering in litert-torch. We
export only the ResNet34 portion; the caller computes the 80-bin fbank
features on-device. This matches the standard mobile speaker-embedding
pattern and keeps the tflite graph free of FFT ops.
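Because the caller computes the features, the windowing step runs in application code. The following is a minimal sketch of that one step, assuming the symmetric (non-periodic) Hamming form with coefficients 0.54 and 0.46. The names `hammingWindow` and `applyWindow` are illustrative and not part of this bundle; the FFT and mel filterbank that a complete fbank front end needs are not shown.

```kotlin
import kotlin.math.PI
import kotlin.math.cos

// Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1)).
// Assumption: the symmetric (non-periodic) form is the one wanted here.
fun hammingWindow(size: Int): FloatArray {
    require(size > 1) { "window size must be greater than 1" }
    return FloatArray(size) { n ->
        (0.54 - 0.46 * cos(2.0 * PI * n / (size - 1))).toFloat()
    }
}

// Multiply one frame of samples by the window, in place.
fun applyWindow(frame: FloatArray, window: FloatArray) {
    require(frame.size == window.size) { "frame and window sizes differ" }
    for (i in frame.indices) frame[i] *= window[i]
}
```

For the 25 ms frames used here, `hammingWindow(400)` gives one weight per sample at 16 kHz.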
Fbank parameters
| Parameter | Value |
|---|---|
| num_mel_bins | 80 |
| frame_length | 25 ms |
| frame_shift | 10 ms |
| window_type | hamming |
| dither | 0.0 |
| use_energy | false |
The reference implementation is torchaudio.compliance.kaldi.fbank called with
these arguments. The model internally centers its input by computing
features - mean(features, dim=1), so the caller may pass raw (uncentered) fbank output.
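To make the centering concrete, here is a small sketch of the same operation on a single utterance, written outside the model. It assumes `dim=1` is the time axis of the `[1, T, 80]` input, so the mean is taken per mel bin across all frames. The function name `centerOverTime` is hypothetical, and callers do not need to run this before inference since the graph already does it.

```kotlin
// Subtract the per-bin mean over the time axis from a [T][numBins] feature matrix.
// Illustration only: the exported model performs this step internally.
fun centerOverTime(features: Array<FloatArray>): Array<FloatArray> {
    require(features.isNotEmpty()) { "need at least one frame" }
    val numBins = features[0].size
    val mean = FloatArray(numBins)
    // Accumulate the sum of every bin across all frames.
    for (frame in features) {
        for (b in 0 until numBins) mean[b] += frame[b]
    }
    for (b in 0 until numBins) mean[b] /= features.size
    // Return a new matrix so the caller's features stay untouched.
    return Array(features.size) { t ->
        FloatArray(numBins) { b -> features[t][b] - mean[b] }
    }
}
```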
Signature
Inputs:
fbank [1, T, 80] float32 Kaldi mel fbank, T=298 for 3 s @ 16 kHz
Outputs:
embedding [1, 256] float32 L2-normalized speaker embedding
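The value of T follows from the frame length and frame shift. The sketch below assumes Kaldi's `snip_edges = true` behaviour, where only frames that fit entirely inside the signal are kept; that is the default in torchaudio's Kaldi-compatible fbank. The helper name `numFrames` is illustrative.

```kotlin
// Number of fbank frames for a clip, assuming snip_edges = true:
// T = 1 + floor((numSamples - frameLength) / frameShift).
fun numFrames(
    numSamples: Int,
    sampleRate: Int = 16_000,
    frameLengthMs: Int = 25,
    frameShiftMs: Int = 10,
): Int {
    val frameLength = sampleRate * frameLengthMs / 1000 // 400 samples at 16 kHz
    val frameShift = sampleRate * frameShiftMs / 1000   // 160 samples at 16 kHz
    if (numSamples < frameLength) return 0
    return 1 + (numSamples - frameLength) / frameShift
}
```

With these defaults, `numFrames(3 * 16_000)` returns 298, which matches the T listed in the signature for a 3 s clip.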
Parity
Verified: max diff = 4.2e-07 against the upstream pyannote model's full forward
pass on a random 3-second waveform (with Kaldi fbank features computed
externally).
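A similar check can be run on-device by comparing the LiteRT embedding with a reference embedding exported from the upstream model. The sketch below assumes "max diff" means the largest element-wise absolute difference; the source does not state the exact metric, and the name `maxAbsDiff` is hypothetical.

```kotlin
import kotlin.math.abs
import kotlin.math.max

// Largest element-wise absolute difference between two embeddings.
// Assumption: this is the metric behind the reported "max diff".
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "embeddings must have the same length" }
    var worst = 0f
    for (i in a.indices) worst = max(worst, abs(a[i] - b[i]))
    return worst
}
```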
Usage
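// Compute 80-bin Kaldi fbank features on-device with your preferred library.
// kaldiFbank and loadModelFile stand in for your own helpers; fbank must have shape [1, T, 80].
val fbank = kaldiFbank(audio, melBins = 80, frameLengthMs = 25, frameShiftMs = 10)
val model = Interpreter(loadModelFile("wespeaker-resnet34.tflite"))
// The output tensor is [1, 256], so the buffer passed to run() needs the batch dimension.
val output = Array(1) { FloatArray(256) }
model.run(fbank, output)
val embedding = output[0]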
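For speaker identification, two embeddings are usually compared by cosine similarity. Because the model output is already L2-normalized, that reduces to a dot product. The sketch below is one way to write it; `cosineSimilarity` and `sameSpeaker` are hypothetical names, and the decision threshold is deliberately left to the caller because this model card does not specify one.

```kotlin
// Dot product of two embeddings. For L2-normalized vectors this equals
// the cosine similarity, so no extra normalization is done here.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "embeddings must have the same length" }
    var dot = 0f
    for (i in a.indices) dot += a[i] * b[i]
    return dot
}

// Hypothetical decision rule: the threshold must be tuned on your own data.
fun sameSpeaker(a: FloatArray, b: FloatArray, threshold: Float): Boolean =
    cosineSimilarity(a, b) >= threshold
```

The same score can serve as the affinity measure when clustering embeddings for diarization.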
Source
Upstream: pyannote/wespeaker-voxceleb-resnet34-LM (CC BY 4.0, gated; accept the license on the upstream page).
Links
- speech-android: Android SDK
- soniqo.audio: website
- blog
Ecosystem
- soniqo.audio: use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core: C++ orchestration library for voice agents. Abstract STTInterface/TTSInterface/VADInterface/EnhancerInterface; LiteRT implementations plug straight into the interfaces.
- speech-swift: Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- speech-android: Android SDK consuming on-device LiteRT bundles.
Other LiteRT models in this collection
ASR / Transcription
- Parakeet TDT 0.6B v3 - LiteRT (INT8)
- Nemotron Speech Streaming 0.6B - LiteRT
- Omnilingual ASR CTC 300M - LiteRT
- Omnilingual ASR CTC 300M - LiteRT (INT8)
- Qwen3 ASR 0.6B Encoder - LiteRT (INT8)
VAD / Diarization
TTS / Voice Cloning
License
This bundle inherits the upstream model license (cc-by-4.0). See the
linked base_model repository for the full terms.