Nithin Rao Koluguri

nithinraok

nithinraok's activity

authored 10 papers 4 days ago

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

Paper • 2310.12378 • Published Oct 18, 2023

TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

Paper • 2110.04410 • Published Oct 8, 2021

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

Paper • 2309.05248 • Published Sep 11, 2023

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Paper • 2407.03495 • Published Jul 3, 2024

Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

Paper • 2406.05298 • Published Jun 7, 2024

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Paper • 2409.06656 • Published Sep 10, 2024

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

Paper • 2408.13106 • Published Aug 23, 2024 • 1

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

Paper • 2310.12371 • Published Oct 18, 2023

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

Paper • 2406.19674 • Published Jun 28, 2024

Training and Inference Efficiency of Encoder-Decoder Speech Models

Paper • 2503.05931 • Published 17 days ago • 2

updated 2 collections 4 days ago

Parakeet

Collection

NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 9 items • Updated 4 days ago • 21

Canary

Collection

A collection of multilingual and multitask speech to text models from NVIDIA NeMo 🐤 • 4 items • Updated 4 days ago • 20

published a Space 4 days ago

Canary 1B Flash

🐤

Canary 1B Flash demo

published 2 models 12 days ago

nvidia/canary-1b-flash

Automatic Speech Recognition • Updated 6 days ago • 5.26k • 122

nvidia/canary-180m-flash

Automatic Speech Recognition • Updated 6 days ago • 2.8k • 43

updated a dataset about 1 month ago

mozilla-foundation/common_voice_3_0

Updated Jul 29, 2023 • 57

New activity in mozilla-foundation/common_voice_3_0 about 1 month ago

Update common_voice_3_0.py

#5 opened about 1 month ago by nithinraok

updated a model 4 months ago

nvidia/mel-codec-22khz

Updated Dec 7, 2024 • 189 • 2

updated 2 Spaces 4 months ago

NeMo Offline Speaker Diarization

👀

Titanet Speaker Verification

📊

Compare two audio samples to determine whether they come from the same speaker