Silero VAD v5 - LiteRT
Lightweight on-device voice activity detection. 16 kHz, 32 ms chunks, explicit LSTM state I/O.
Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in
speech-core (a C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.
Use cases on soniqo.audio
On-device voice activity detection for Android. 16 kHz, 512-sample chunks with explicit LSTM state I/O so the caller owns state across frames.
Model
| Property | Value |
|---|---|
| Architecture | STFT Conv1d + 4-layer Conv1d encoder + LSTMCell + 1×1 classifier |
| Parameters | ~0.4 M |
| Format | LiteRT (TFLite) |
| Quantization | float32 |
| Sample rate | 16 000 Hz |
| Chunk size | 512 samples (32 ms) |
| Context | 64 samples prepended by caller each frame |
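The chunk and context sizes above imply that every model input holds 64 + 512 = 576 samples and that one chunk spans 512 / 16 000 s = 32 ms. A minimal sketch of how a caller might cut a longer PCM buffer into such frames is shown here. The names `buildFrames` and `chunkDurationMs` are hypothetical and not part of this bundle, and dropping a trailing partial chunk is an assumption rather than documented behavior.

```kotlin
val sampleRate = 16_000   // Hz, from the model table
val chunkSize = 512       // samples per chunk, from the model table
val contextSize = 64      // samples of context, from the model table

// Splits PCM samples into frames of contextSize + chunkSize samples.
// The first frame gets an all-zero context; every later frame is prefixed
// with the last contextSize samples of the previous chunk.
// Assumption: a trailing partial chunk is dropped rather than zero-padded.
fun buildFrames(samples: FloatArray): List<FloatArray> {
    val frames = ArrayList<FloatArray>()
    var context = FloatArray(contextSize)
    var start = 0
    while (start + chunkSize <= samples.size) {
        val chunk = samples.copyOfRange(start, start + chunkSize)
        frames.add(context + chunk)
        context = chunk.copyOfRange(chunkSize - contextSize, chunkSize)
        start += chunkSize
    }
    return frames
}

// Duration of one chunk in milliseconds.
fun chunkDurationMs(): Double = chunkSize * 1000.0 / sampleRate
```

Each returned frame has the `[1, 576]` payload the model expects once it is wrapped in a batch dimension.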
Files
| File | Size | Description |
|---|---|---|
| silero-vad.tflite | 1.26 MB | Full model, FP32 |
| config.json | 1 KB | I/O signature |
Signature
Inputs:
audio [1, 576] float32 64 samples of context + 512 sample chunk
state [2, 1, 128] float32 (h, c) stacked
Outputs:
probability [1, 1] float32 voice probability [0, 1]
state_out [2, 1, 128] float32 next-frame LSTM state
Parity
Re-implemented as a pure nn.Module loading weights directly from the
upstream JIT checkpoint. Verified bit-exact output against the upstream
Silero VAD JIT on random inputs (max diff = 0.0).
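The figure quoted above reads as a maximum absolute difference between two output tensors. Purely to illustrate that metric (this is not the verification script used for this bundle, which is not shown here), a hypothetical `maxAbsDiff` helper could be written as:

```kotlin
import kotlin.math.abs

// Largest element-wise absolute difference between two equally sized tensors,
// flattened to FloatArray. Returns 0f only when every element matches exactly.
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch: ${a.size} vs ${b.size}" }
    var worst = 0f
    for (i in a.indices) {
        worst = maxOf(worst, abs(a[i] - b[i]))
    }
    return worst
}
```

The same helper can be used to compare LiteRT outputs against a reference, where small non-zero differences may be expected depending on the runtime.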
Usage
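import org.tensorflow.lite.Interpreter

// loadModelFile is a caller-supplied helper that memory-maps the .tflite asset.
val vad = Interpreter(loadModelFile("silero-vad.tflite"))
var state = Array(2) { Array(1) { FloatArray(128) } } // [2, 1, 128], zeros on first call
var context = FloatArray(64)                           // zeros on first call

fun classify(chunk512: FloatArray): Float {
    val audio = arrayOf(context + chunk512)            // [1, 576]: context + chunk
    val prob = Array(1) { FloatArray(1) }              // [1, 1]
    val nextState = Array(2) { Array(1) { FloatArray(128) } }
    // Tensor order (0 = audio / probability, 1 = state / state_out) is assumed
    // from the Signature section; confirm it against config.json.
    val inputs = arrayOf<Any>(audio, state)
    val outputs = mapOf<Int, Any>(0 to prob, 1 to nextState)
    vad.runForMultipleInputsOutputs(inputs, outputs)
    context = chunk512.copyOfRange(448, 512)           // last 64 samples
    state = nextState
    return prob[0][0]
}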
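`classify` returns one probability per 32 ms chunk; turning those values into speech segments is left to the caller. A minimal sketch follows, assuming a fixed 0.5 decision threshold (an illustrative value, not one specified by this bundle). `Segment` and `toSegments` are hypothetical names introduced for this example.

```kotlin
// A speech region in milliseconds, end-exclusive.
data class Segment(val startMs: Int, val endMs: Int)

// Groups consecutive chunks whose probability is at or above the threshold.
// Assumption: threshold = 0.5 and no smoothing or hangover; tune for your audio.
fun toSegments(
    probs: List<Float>,
    threshold: Float = 0.5f,
    chunkMs: Int = 32,
): List<Segment> {
    val segments = ArrayList<Segment>()
    var startIdx = -1 // index of the first chunk of the open segment, or -1
    for ((i, p) in probs.withIndex()) {
        val speech = p >= threshold
        if (speech && startIdx < 0) startIdx = i
        if (!speech && startIdx >= 0) {
            segments.add(Segment(startIdx * chunkMs, i * chunkMs))
            startIdx = -1
        }
    }
    // Close a segment that is still open at the end of the input.
    if (startIdx >= 0) segments.add(Segment(startIdx * chunkMs, probs.size * chunkMs))
    return segments
}
```

In practice a caller would usually add hysteresis or a minimum segment length on top of this to avoid flicker around the threshold.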
Source
Upstream: snakers4/silero-vad (MIT).
Links
- speech-android: Android SDK
- soniqo.audio: website
- blog
Ecosystem
- soniqo.audio: use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core: C++ orchestration library for voice agents. Abstract STTInterface/TTSInterface/VADInterface/EnhancerInterface; LiteRT implementations plug straight into the interfaces.
- speech-swift: Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- speech-android: Android SDK consuming on-device LiteRT bundles.
Other LiteRT models in this collection
ASR / Transcription
- Parakeet TDT 0.6B v3 - LiteRT (INT8)
- Nemotron Speech Streaming 0.6B - LiteRT
- Omnilingual ASR CTC 300M - LiteRT
- Omnilingual ASR CTC 300M - LiteRT (INT8)
- Qwen3 ASR 0.6B Encoder - LiteRT (INT8)
VAD / Diarization
TTS / Voice Cloning
License
This bundle inherits the upstream model license (MIT). See the
linked base_model repository for the full terms.