Silero VAD v5: LiteRT

Lightweight on-device voice activity detection. 16 kHz, 32 ms chunks, explicit LSTM state I/O.

Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in speech-core (a C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.

Use cases on soniqo.audio

On-device voice activity detection for Android. 16 kHz, 512-sample chunks with explicit LSTM state I/O so the caller owns state across frames.

Model

Property       Value
Architecture   STFT Conv1d + 4-layer Conv1d encoder + LSTMCell + 1×1 classifier
Parameters     ~0.4 M
Format         LiteRT (TFLite)
Quantization   float32
Sample rate    16 000 Hz
Chunk size     512 samples (32 ms)
Context        64 samples prepended by caller each frame
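The chunk size and context rows above imply a simple framing step on the caller's side: 512 samples at 16 000 Hz is 512 / 16 000 = 32 ms, and each model input is the last 64 samples of the previous chunk followed by the current chunk. A minimal sketch of that framing, assuming a trailing partial chunk is simply dropped (the names `CHUNK`, `CONTEXT` and `frames` are illustrative, not part of the bundle):

```kotlin
const val CHUNK = 512    // samples per chunk (32 ms at 16 kHz)
const val CONTEXT = 64   // samples carried over from the previous chunk

/** Splits [samples] into model-ready 576-sample frames; a trailing partial chunk is dropped. */
fun frames(samples: FloatArray): List<FloatArray> {
    var context = FloatArray(CONTEXT)            // zeros before the first chunk
    val out = ArrayList<FloatArray>()
    var start = 0
    while (start + CHUNK <= samples.size) {
        val chunk = samples.copyOfRange(start, start + CHUNK)
        out.add(context + chunk)                 // 64 + 512 = 576 samples
        context = chunk.copyOfRange(CHUNK - CONTEXT, CHUNK)
        start += CHUNK
    }
    return out
}
```

In a live pipeline the same logic would run one chunk at a time rather than over a whole buffer; the batch form here only makes the context hand-off easy to inspect.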

Files

File                Size      Description
silero-vad.tflite   1.26 MB   Full model, FP32
config.json         1 KB      I/O signature

Signature

Inputs:
  audio        [1, 576]      float32   64 samples of context + 512-sample chunk
  state        [2, 1, 128]   float32   (h, c) stacked

Outputs:
  probability  [1, 1]        float32   voice probability in [0, 1]
  state_out    [2, 1, 128]   float32   next-frame LSTM state
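The audio input is float32, while Android microphone capture commonly yields 16-bit PCM. This card does not state the expected amplitude range, so the sketch below assumes the usual convention of float samples normalized to [-1, 1); `pcm16ToFloat` is an illustrative helper, not part of the bundle.

```kotlin
/** Converts signed 16-bit PCM samples to float32, assuming a [-1, 1) target range. */
fun pcm16ToFloat(pcm: ShortArray): FloatArray =
    FloatArray(pcm.size) { i -> pcm[i] / 32768f }   // 32768 = 2^15, the int16 magnitude limit
```

If the upstream checkpoint was trained on a different scaling, the divisor would need to change accordingly.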

Parity

Re-implemented as a pure PyTorch nn.Module that loads weights directly from the upstream JIT checkpoint. Output verified bit-exact against the upstream Silero VAD JIT model on random inputs (max diff = 0.0).
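A "max diff" figure of this kind is typically the largest element-wise absolute difference between two output vectors. The original check was presumably run in Python; the sketch below only illustrates the metric in Kotlin, with `maxAbsDiff` as a hypothetical helper name, and could be used to compare on-device outputs against reference values.

```kotlin
import kotlin.math.abs
import kotlin.math.max

/** Largest element-wise absolute difference between two equally sized arrays. */
fun maxAbsDiff(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "arrays must have the same length" }
    var worst = 0f
    for (i in a.indices) worst = max(worst, abs(a[i] - b[i]))
    return worst
}
```

A result of exactly 0.0 means every element matched bit for bit (for finite values); any nonzero result indicates a numerical deviation worth inspecting.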

Usage

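import org.tensorflow.lite.Interpreter

// loadModelFile is an app-side helper (not shown) that returns the model as a ByteBuffer.
val vad = Interpreter(loadModelFile("silero-vad.tflite"))

var state = Array(2) { Array(1) { FloatArray(128) } } // [2, 1, 128], zeros on first call
var context = FloatArray(64)                          // zeros on first call

fun classify(chunk512: FloatArray): Float {
    require(chunk512.size == 512) { "expected a 512-sample chunk" }
    val audio = arrayOf(context + chunk512)                   // [1, 576]
    val prob = Array(1) { FloatArray(1) }                     // [1, 1]
    val nextState = Array(2) { Array(1) { FloatArray(128) } } // [2, 1, 128]
    // Tensor names are assumed to match the Signature section; check config.json.
    val inputs = mapOf<String, Any>("audio" to audio, "state" to state)
    val outputs = mapOf<String, Any>("probability" to prob, "state_out" to nextState)
    vad.runSignature(inputs, outputs)
    context = chunk512.copyOfRange(448, 512)       // last 64 samples
    state = nextState
    return prob[0][0]
}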
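The per-chunk probabilities returned by classify still need to be turned into speech regions. A minimal sketch of that post-processing, assuming a single fixed threshold of 0.5 (an illustrative value, not one recommended by this card) and 32 ms per chunk; `Segment`, `FRAME_MS` and `segments` are hypothetical names:

```kotlin
data class Segment(val startMs: Int, val endMs: Int)

const val FRAME_MS = 32  // 512 samples at 16 kHz

/** Groups consecutive chunks whose probability is >= [threshold] into segments. */
fun segments(probs: List<Float>, threshold: Float = 0.5f): List<Segment> {
    val out = ArrayList<Segment>()
    var start = -1                                   // index of the first chunk in the open segment
    for ((i, p) in probs.withIndex()) {
        if (p >= threshold && start < 0) start = i   // speech begins
        if (p < threshold && start >= 0) {           // speech ends
            out.add(Segment(start * FRAME_MS, i * FRAME_MS))
            start = -1
        }
    }
    // Close a segment that is still open at the end of the input.
    if (start >= 0) out.add(Segment(start * FRAME_MS, probs.size * FRAME_MS))
    return out
}
```

Production pipelines usually add hysteresis and minimum speech or silence durations on top of this so that short dips in probability do not split an utterance.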

Source

Upstream: snakers4/silero-vad (MIT).

Links

Ecosystem

  • soniqo.audio: use-case explorer (transcription, voice cloning, live ASR, voice agents).
  • speech-core: C++ orchestration library for voice agents. Abstract STTInterface / TTSInterface / VADInterface / EnhancerInterface; LiteRT implementations plug straight into the interfaces.
  • speech-swift: Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
  • speech-android: Android SDK consuming on-device LiteRT bundles.

Other LiteRT models in this collection

ASR / Transcription

VAD / Diarization

TTS / Voice Cloning

License

This bundle inherits the upstream model license (MIT). See the linked base_model repository for the full terms.
