WeSpeaker ResNet34-LM — LiteRT

Speaker embedding for speaker identification and diarization clustering.

Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in speech-core (a C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.

Use cases on soniqo.audio

Meeting transcription

256-dim speaker embedding network for Android, ported from pyannote/wespeaker-voxceleb-resnet34-LM.

Model

Property	Value
Architecture	ResNet34 + stats pooling + linear projection
Parameters	~6.6 M
Format	LiteRT (TFLite)
Quantization	float32
Sample rate	16 000 Hz
Input	80-bin kaldi-style mel fbank features (T frames)
Output	L2-normalized 256-dim embedding

Files

File	Size	Description
`wespeaker-resnet34.tflite`	25.4 MB	Full model, FP32
`config.json`	1 KB	Fbank spec + I/O signature

Why fbank-as-input

pyannote's kaldi fbank implementation uses torch.hamming_window and aten._fft_r2c, neither of which has a lowering in litert-torch. We export only the ResNet34 portion; the caller computes the 80-bin fbank features on-device. This matches the standard mobile speaker-embedding pattern and keeps the tflite graph free of FFT ops.

Fbank parameters

Parameter	Value
`num_mel_bins`	80
`frame_length`	25 ms
`frame_shift`	10 ms
`window_type`	hamming
`dither`	0.0
`use_energy`	false
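
As a rough illustration of what the frame parameters above mean in samples, here is a minimal sketch of the window a caller-side fbank implementation would apply. It assumes Kaldi's symmetric Hamming definition, w[n] = 0.54 - 0.46 cos(2πn / (N - 1)); the helper names `msToSamples` and `hammingWindow` are hypothetical and not part of this bundle. A complete fbank front end has further steps (framing, FFT, mel filterbank, log) that are not shown.

```kotlin
import kotlin.math.PI
import kotlin.math.cos

// Hypothetical helpers for illustration only; not shipped with this bundle.
const val SAMPLE_RATE_HZ = 16_000

/** Converts a duration in milliseconds to a sample count at the given rate. */
fun msToSamples(ms: Int, sampleRateHz: Int = SAMPLE_RATE_HZ): Int =
    sampleRateHz * ms / 1000

/** Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)). */
fun hammingWindow(size: Int): FloatArray {
    require(size > 1) { "window needs at least two samples" }
    return FloatArray(size) { n ->
        (0.54 - 0.46 * cos(2.0 * PI * n / (size - 1))).toFloat()
    }
}
```

With the table's values, `msToSamples(25)` gives a 400-sample frame and `msToSamples(10)` a 160-sample shift, so `hammingWindow(400)` is the window for one frame.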

The reference implementation is torchaudio.compliance.kaldi.fbank with those arguments. The model centers the features internally (features - mean(features, dim=1), i.e. the mean over the T frames is subtracted from each mel bin), so the caller may pass raw, uncentered fbank output.
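
To make the centering step concrete, the sketch below reproduces features - mean(features, dim=1) on a plain [T][80] matrix. It is shown only to clarify what happens inside the graph; callers do not need to run it, and the function name `centerOverTime` is hypothetical.

```kotlin
/**
 * Subtracts the per-bin mean over the time axis from a [T][numBins] feature
 * matrix, i.e. features - mean(features, dim = time). Returns a new matrix.
 */
fun centerOverTime(features: Array<FloatArray>): Array<FloatArray> {
    require(features.isNotEmpty()) { "need at least one frame" }
    val numBins = features[0].size
    val mean = DoubleArray(numBins)
    for (frame in features) {
        require(frame.size == numBins) { "ragged feature matrix" }
        for (b in 0 until numBins) mean[b] += frame[b].toDouble()
    }
    for (b in 0 until numBins) mean[b] /= features.size
    // Each output value is the input minus the mean of its mel bin.
    return Array(features.size) { t ->
        FloatArray(numBins) { b -> (features[t][b] - mean[b]).toFloat() }
    }
}
```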

Signature

Inputs:
  fbank         [1, T, 80]   float32   Kaldi mel fbank, T=298 for 3 s @ 16 kHz

Outputs:
  embedding     [1, 256]     float32   L2-normalized speaker embedding
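
The value T=298 follows from the frame parameters: 3 s at 16 kHz is 48 000 samples, and with a 400-sample frame and a 160-sample shift only frames that fit entirely inside the signal are emitted. A minimal sketch of that arithmetic, assuming Kaldi's snip_edges = true behaviour (the torchaudio default); the function name `numFrames` is hypothetical:

```kotlin
/**
 * Number of fbank frames for numSamples samples, assuming a frame is emitted
 * only where a full window fits (Kaldi's snip_edges = true behaviour).
 * Defaults correspond to 25 ms / 10 ms at 16 kHz.
 */
fun numFrames(numSamples: Int, frameLength: Int = 400, frameShift: Int = 160): Int =
    if (numSamples < frameLength) 0
    else 1 + (numSamples - frameLength) / frameShift  // integer division floors
```

For example, `numFrames(48_000)` evaluates to 1 + 47 600 / 160 = 298.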

Parity

Verified: max diff = 4.2e-07 against the upstream pyannote model's full forward pass on a random 3-second waveform (kaldi fbank features computed externally for the LiteRT model).
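
A parity check of this kind can be repeated on-device by comparing a LiteRT embedding with a reference embedding exported from the upstream model. The sketch below assumes "max diff" means the largest absolute element-wise difference, which the figure above does not state explicitly; `maxAbsDiff` is a hypothetical helper.

```kotlin
import kotlin.math.abs

/** Largest absolute element-wise difference between two equally sized vectors. */
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var worst = 0f
    for (i in a.indices) worst = maxOf(worst, abs(a[i] - b[i]))
    return worst
}
```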

Usage

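import org.tensorflow.lite.Interpreter

// Compute 80-bin kaldi fbank features on-device with your preferred library.
// kaldiFbank and loadModelFile are placeholders; kaldiFbank must return T frames of 80 bins (Array<FloatArray>).
val frames = kaldiFbank(audio, melBins = 80, frameLengthMs = 25, frameShiftMs = 10)

val model = Interpreter(loadModelFile("wespeaker-resnet34.tflite"))
val input = arrayOf(frames)                 // shape [1, T, 80]
val output = Array(1) { FloatArray(256) }   // shape [1, 256], matching the output tensor
model.run(input, output)
val embedding = output[0]                   // L2-normalized 256-dim embedding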
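
Because the output is L2-normalized, the cosine similarity between two embeddings reduces to a plain dot product. A minimal sketch of comparing two speakers follows; the helper names and the decision threshold are hypothetical, and the threshold has to be tuned on your own data.

```kotlin
/** Dot product; equals cosine similarity when both inputs are L2-normalized. */
fun dot(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var sum = 0f
    for (i in a.indices) sum += a[i] * b[i]
    return sum
}

/** Hypothetical decision rule: same speaker if similarity reaches the threshold. */
fun isSameSpeaker(a: FloatArray, b: FloatArray, threshold: Float): Boolean =
    dot(a, b) >= threshold
```

For diarization, the same similarity can serve as the affinity between segment embeddings before clustering.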

Source

Upstream: pyannote/wespeaker-voxceleb-resnet34-LM (CC BY 4.0, gated — accept the license on the upstream page).

Ecosystem

soniqo.audio — use-case explorer (transcription, voice cloning, live ASR, voice agents).
speech-core — C++ orchestration library for voice agents. Abstract STTInterface / TTSInterface / VADInterface / EnhancerInterface; LiteRT implementations plug straight into the interfaces.
speech-swift — Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
speech-android — Android SDK consuming on-device LiteRT bundles.

Other LiteRT models in this collection

ASR / Transcription

VAD / Diarization

TTS / Voice Cloning

VoxCPM2 — LiteRT (INT8)

License

This bundle inherits the upstream model license (CC BY 4.0). See the upstream repository, pyannote/wespeaker-voxceleb-resnet34-LM, for the full terms.
