The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System Paper • 2310.12378 • Published Oct 18, 2023
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context Paper • 2110.04410 • Published Oct 8, 2021
Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach Paper • 2309.05248 • Published Sep 11, 2023
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations Paper • 2407.03495 • Published Jul 3, 2024
Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis Paper • 2406.05298 • Published Jun 7, 2024
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens Paper • 2409.06656 • Published Sep 10, 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks Paper • 2408.13106 • Published Aug 23, 2024
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation Paper • 2310.12371 • Published Oct 18, 2023
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data Paper • 2406.19674 • Published Jun 28, 2024
Training and Inference Efficiency of Encoder-Decoder Speech Models Paper • 2503.05931 • Published 17 days ago
Parakeet Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 9 items • Updated 4 days ago
Canary Collection A collection of multilingual and multitask speech to text models from NVIDIA NeMo 🤗 • 4 items • Updated 4 days ago