Silero VAD v5 — LiteRT

Lightweight on-device voice activity detection. 16 kHz, 32 ms chunks, explicit LSTM state I/O.

Part of the soniqo.audio speech toolkit — an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in speech-core (C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.

Use cases on soniqo.audio

On-device voice activity detection for Android. 16 kHz, 512-sample chunks with explicit LSTM state I/O so the caller owns state across frames.

Model

Property	Value
Architecture	STFT Conv1d + 4-layer Conv1d encoder + LSTMCell + 1×1 classifier
Parameters	~0.4 M
Format	LiteRT (TFLite)
Quantization	float32
Sample rate	16 000 Hz
Chunk size	512 samples (32 ms)
Context	64 samples prepended by caller each frame
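
The chunk and context sizes above imply a simple framing scheme: each model input is the last 64 samples of the previous chunk followed by the current 512 samples. The following is a minimal sketch, assuming mono float PCM at 16 kHz that is already in memory. `frameInputs` and the constants are illustrative names, not part of the bundle, and dropping a trailing partial chunk is a choice made by this sketch.

```kotlin
const val SAMPLE_RATE = 16_000   // Hz, from the model table
const val CHUNK = 512            // samples per frame
const val CONTEXT = 64           // samples carried over from the previous frame

// Duration of one chunk in milliseconds: 512 / 16000 s = 32 ms.
val chunkMillis: Double = CHUNK * 1000.0 / SAMPLE_RATE

// Hypothetical helper: splits PCM into (CONTEXT + CHUNK)-sample model inputs.
// The first frame gets zero context; a trailing partial chunk is dropped.
fun frameInputs(pcm: FloatArray): List<FloatArray> {
    val frames = ArrayList<FloatArray>()
    var context = FloatArray(CONTEXT)
    var start = 0
    while (start + CHUNK <= pcm.size) {
        val chunk = pcm.copyOfRange(start, start + CHUNK)
        frames.add(context + chunk)                          // 576 samples
        context = chunk.copyOfRange(CHUNK - CONTEXT, CHUNK)  // last 64 samples
        start += CHUNK
    }
    return frames
}
```

Each returned array matches the `[1, 576]` audio input once wrapped in a batch dimension.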

Files

File	Size	Description
`silero-vad.tflite`	1.26 MB	Full model, FP32
`config.json`	1 KB	I/O signature

Signature

Inputs:
  audio        [1, 576]      float32   64 samples of context + 512 sample chunk
  state        [2, 1, 128]   float32   (h, c) stacked

Outputs:
  probability  [1, 1]        float32   voice probability [0, 1]
  state_out    [2, 1, 128]   float32   next-frame LSTM state
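
For callers that pass flat buffers to the interpreter, the `[2, 1, 128]` state tensor flattens in row-major order, so `h` occupies floats 0..127 and `c` occupies floats 128..255 (following the `(h, c)` stacking noted above). The following is a minimal sketch using only `java.nio`. `toDirectBuffer`, `hidden`, and `cell` are hypothetical helpers, and the direct, native-order buffer follows the usual recommendation for the TensorFlow Lite Java API rather than anything stated in this bundle.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

const val HIDDEN = 128                  // LSTM width from the signature
const val STATE_FLOATS = 2 * 1 * HIDDEN // 256 floats in total

// Hypothetical helper: copies a FloatArray into a direct, native-order buffer.
fun FloatArray.toDirectBuffer(): ByteBuffer {
    val buf = ByteBuffer.allocateDirect(size * Float.SIZE_BYTES).order(ByteOrder.nativeOrder())
    buf.asFloatBuffer().put(this)       // the float view writes into buf's storage
    return buf                          // buf's own position is still 0
}

// Copies of the two halves of a flat state, assuming row-major order (h first, then c).
fun hidden(state: FloatArray): FloatArray = state.copyOfRange(0, HIDDEN)
fun cell(state: FloatArray): FloatArray = state.copyOfRange(HIDDEN, STATE_FLOATS)
```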

Parity

The model was re-implemented as a plain PyTorch nn.Module that loads its weights directly from the upstream JIT checkpoint. Output was verified bit-exact against the upstream Silero VAD JIT model on random inputs (max diff = 0.0).
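
The max diff figure is the largest absolute element-wise difference between the two sets of outputs. The following is a minimal sketch of that metric, assuming both outputs have already been collected into equal-length arrays. `maxAbsDiff` is an illustrative name, and this is not the script used for the check above.

```kotlin
import kotlin.math.abs
import kotlin.math.max

// Largest absolute element-wise difference between two output arrays.
// A result of 0.0f means every pair of values compared equal; a NaN in
// either array propagates to the result.
fun maxAbsDiff(reference: FloatArray, candidate: FloatArray): Float {
    require(reference.size == candidate.size) { "length mismatch" }
    var worst = 0f
    for (i in reference.indices) {
        worst = max(worst, abs(reference[i] - candidate[i]))
    }
    return worst
}
```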

Usage

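import org.tensorflow.lite.Interpreter

// loadModelFile is an app-side helper (not shown), assumed to return a memory-mapped ByteBuffer.
val vad = Interpreter(loadModelFile("silero-vad.tflite"))

var state = Array(2) { Array(1) { FloatArray(128) } } // (h, c), zeros on first call
var context = FloatArray(64)                          // zeros on first call

fun classify(chunk512: FloatArray): Float {
    require(chunk512.size == 512) { "expected a 512-sample chunk" }
    val audio = arrayOf(context + chunk512)           // [1, 576]
    val prob = Array(1) { FloatArray(1) }             // [1, 1]
    val nextState = Array(2) { Array(1) { FloatArray(128) } }
    // Index-based call: assumes inputs are ordered (audio, state) and
    // outputs (probability, state_out), as listed under Signature.
    val inputs = arrayOf<Any>(audio, state)
    val outputs = mapOf<Int, Any>(0 to prob, 1 to nextState)
    vad.runForMultipleInputsOutputs(inputs, outputs)
    context = chunk512.copyOfRange(448, 512)          // last 64 samples
    state = nextState                                 // carry LSTM state to the next frame
    return prob[0][0]
}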
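
To run over a longer recording, the per-chunk function is called once per 512-sample chunk, in order, because context and LSTM state carry over between calls. The following is a minimal sketch, assuming the recording is already 16 kHz mono float PCM. `probabilities` is a hypothetical helper that takes the classifier as a parameter, so it can be exercised without a model.

```kotlin
// Hypothetical driver: feeds consecutive 512-sample chunks to `classify`
// and returns one probability per full chunk. A trailing partial chunk is skipped.
fun probabilities(pcm: FloatArray, classify: (FloatArray) -> Float): FloatArray {
    val chunk = 512
    val out = FloatArray(pcm.size / chunk)
    for (i in out.indices) {
        out[i] = classify(pcm.copyOfRange(i * chunk, (i + 1) * chunk))
    }
    return out
}
```

With the function above, the call would be `probabilities(pcm, ::classify)`. Each probability covers 32 ms of audio, so index `i` starts at `i * 32` ms.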

Source

Upstream: snakers4/silero-vad (MIT).

Ecosystem

soniqo.audio — use-case explorer (transcription, voice cloning, live ASR, voice agents).
speech-core — C++ orchestration library for voice agents. Abstract STTInterface / TTSInterface / VADInterface / EnhancerInterface; LiteRT implementations plug straight into the interfaces.
speech-swift — Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
speech-android — Android SDK consuming on-device LiteRT bundles.

Other LiteRT models in this collection

ASR / Transcription

VAD / Diarization

TTS / Voice Cloning

VoxCPM2 — LiteRT (INT8)

License

This bundle inherits the upstream model license (MIT). See the linked base_model repository for the full terms.
