Nemotron Speech Streaming 0.6B - LiteRT
Cache-aware FastConformer + RNN-T for sub-second streaming ASR. 80 ms chunks, in-pod state.
Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in speech-core (C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.
Use cases on soniqo.audio
Cache-aware streaming ASR exported to LiteRT for sub-second real-time transcription. Three split graphs (encoder, decoder, joint) are wired into a per-session loop that holds the FastConformer KV cache and the RNN-T LSTM state across 80 ms chunks. The host owns the loop and the cache state; LiteRT owns the static tensor programs.
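As a rough illustration of that per-session loop, here is a minimal Python sketch of greedy RNN-T decoding with host-owned state. It is not the speech-core implementation: `Session`, `step_chunk`, the blank id of 0, the `max_symbols` cap, and the three callables are all hypothetical stand-ins for the LiteRT graph invocations, and the joint is assumed to return an already-argmaxed token id.

```python
from dataclasses import dataclass, field
from typing import Any, List

BLANK = 0  # assumption: blank token id; the real id depends on the vocab


@dataclass
class Session:
    """Host-owned state for one audio stream (hypothetical layout)."""
    enc_cache: Any            # encoder cache carried across chunks
    dec_state: Any            # prediction-network (LSTM) state
    last_token: int = BLANK   # last non-blank token fed to the decoder
    tokens: List[int] = field(default_factory=list)


def step_chunk(session, mel_chunk, encoder, decoder, joint, max_symbols=10):
    """Run one chunk through encoder -> decoder -> joint with greedy decoding."""
    # The encoder consumes the chunk plus the cache and returns an updated cache.
    frames, session.enc_cache = encoder(mel_chunk, session.enc_cache)
    emitted = []
    for frame in frames:
        # Emit tokens for this frame until the joint predicts blank.
        for _ in range(max_symbols):
            dec_out, next_state = decoder(session.last_token, session.dec_state)
            token = joint(frame, dec_out)
            if token == BLANK:
                break  # move on to the next encoder frame
            # Commit decoder state only when a non-blank token is emitted.
            session.last_token = token
            session.dec_state = next_state
            emitted.append(token)
    session.tokens.extend(emitted)
    return emitted


# Toy stand-ins so the sketch runs without any model files.
def toy_encoder(mel_chunk, cache):
    return list(mel_chunk), cache + 1      # cache just counts chunks here


def toy_decoder(last_token, state):
    return last_token, state + 1           # state just counts emitted tokens


def toy_joint(frame, dec_out):
    return BLANK if frame == dec_out else frame


session = Session(enc_cache=0, dec_state=0)
first = step_chunk(session, [3, 3, 5], toy_encoder, toy_decoder, toy_joint)
second = step_chunk(session, [5, 7], toy_encoder, toy_decoder, toy_joint)
```

With real graphs, each callable would wrap one interpreter invocation, and the `Session` object would be the only thing that persists between chunks.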
Files
| File | Description |
|---|---|
| `nemotron-streaming-encoder.tflite` | INT8 cache-aware FastConformer encoder |
| `nemotron-streaming-decoder.tflite` | FP32 RNN-T prediction network (LSTM) |
| `nemotron-streaming-joint.tflite` | FP32 joint network |
| `vocab.json` | SentencePiece BPE vocab |
| `config.json` | Mel + chunk + cache shape spec |
| `nemotron-streaming-encoder_recipe.json` | Quantizer recipe |
Streaming contract

    audio chunk (80 ms, 16 kHz)
            │
            ▼
    mel fbank (80 bins) ──► encoder + cached K/V ──► encoded frame
                                                          │
                                                          ▼
                                                    decoder (LSTM)
                                                          │
                                                          ▼
                                                        joint ──► BPE token

The C++ worker owns the cache and LSTM state across chunks;
LiteRT owns the static tensor programs. Cache shapes are
published in config.json so the worker can pre-allocate
and reset state per session without inspecting the bundle.
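A small sketch of how a worker might size its chunk buffer and pre-allocate state from a shape spec. The key names (`sample_rate`, `chunk_ms`, `state_shapes`) and the tiny shapes below are invented for illustration; the real layout of config.json is not described here and should be read from the file itself.

```python
import json
from math import prod

# Hypothetical config fragment: key names and shapes are placeholders,
# not the actual contents of this bundle's config.json.
EXAMPLE_CONFIG = json.loads("""
{
  "sample_rate": 16000,
  "chunk_ms": 80,
  "state_shapes": {
    "encoder_cache": [2, 1, 4],
    "lstm_h": [1, 1, 3],
    "lstm_c": [1, 1, 3]
  }
}
""")


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int) -> int:
    """Number of PCM samples in one chunk."""
    return sample_rate_hz * chunk_ms // 1000


def alloc_state(shapes: dict) -> dict:
    """Allocate one zero-filled flat buffer per named state tensor."""
    return {name: [0.0] * prod(shape) for name, shape in shapes.items()}


def reset_state(state: dict) -> None:
    """Zero every buffer in place, e.g. at the start of a new session."""
    for buf in state.values():
        buf[:] = [0.0] * len(buf)


chunk_samples = samples_per_chunk(
    EXAMPLE_CONFIG["sample_rate"], EXAMPLE_CONFIG["chunk_ms"]
)
state = alloc_state(EXAMPLE_CONFIG["state_shapes"])
state["lstm_h"][0] = 1.5   # pretend a chunk updated the state
reset_state(state)         # new session: back to zeros, no reallocation
```

At 16 kHz, an 80 ms chunk works out to 1280 samples, so the audio buffer can be sized once alongside the state buffers.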
Validation
End-to-end verified against a 12.56 s reference utterance:
- First-partial latency 0.42 s
- p50 chunk compute 79.9 ms on CCX23 CPU (RTF ≈ 1.0× per session)
- Transcript matches the upstream PyTorch reference to within boundary-artifact noise (in fact, the LiteRT path's 'The quick brown fox jumps over the lazy dog' came out cleaner than the Python validator on this utterance)
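For readers unfamiliar with the RTF figure, the arithmetic behind it can be sketched as follows, assuming RTF is taken as median chunk compute time divided by chunk duration. The function name is illustrative.

```python
def real_time_factor(compute_ms: float, audio_ms: float) -> float:
    """Compute time divided by audio duration; below 1.0 keeps up with real time."""
    return compute_ms / audio_ms


CHUNK_MS = 80
rtf = real_time_factor(79.9, CHUNK_MS)   # just under 1.0

# Whole 80 ms chunks in the 12.56 s reference utterance.
num_chunks = 12_560 // CHUNK_MS
```

An RTF this close to 1.0 means a single session roughly saturates one stream's worth of compute on that CPU, leaving little headroom per chunk.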
Source
Exported from nvidia/nemotron-speech-streaming-en-0.6b. On macOS the conversion runs as a two-stage pipeline (trace + LiteRT conversion in separate processes) because NeMo and `litert_torch` fight over native thread pools when they share one interpreter.
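One way to keep two such stages in separate interpreters is to launch each as its own child process. The sketch below shows only that process-isolation pattern; the stage commands are placeholders, and the actual export scripts and their arguments are not part of this card.

```python
import subprocess
import sys
from typing import List


def run_stage(args: List[str]) -> str:
    """Run one stage in a fresh Python interpreter and return its stdout.

    check=True raises CalledProcessError if the stage exits non-zero.
    """
    result = subprocess.run(
        [sys.executable, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def run_pipeline(stages: List[List[str]]) -> List[str]:
    """Run stages one after another; a failing stage stops the pipeline."""
    return [run_stage(stage) for stage in stages]


# Placeholder stages: real ones would be e.g. a trace script followed by a
# conversion script, handing off through a file on disk.
outputs = run_pipeline([
    ["-c", "print('traced')"],
    ["-c", "print('converted')"],
])
```

Because each stage gets its own process, native thread pools loaded by one library are gone before the next library starts.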
Ecosystem
- soniqo.audio: use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core: C++ orchestration library for voice agents. Abstract `STTInterface`/`TTSInterface`/`VADInterface`/`EnhancerInterface`; LiteRT implementations plug straight into the interfaces.
- speech-swift: Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- speech-android: Android SDK consuming on-device LiteRT bundles.
Other LiteRT models in this collection
ASR / Transcription
- Parakeet TDT 0.6B v3 - LiteRT (INT8)
- Omnilingual ASR CTC 300M - LiteRT
- Omnilingual ASR CTC 300M - LiteRT (INT8)
- Qwen3 ASR 0.6B Encoder - LiteRT (INT8)
VAD / Diarization
TTS / Voice Cloning
License
This bundle inherits the upstream model license (cc-by-4.0). See the
linked base_model repository for the full terms.