SpeechColab

non-profit

AI & ML interests

Machine Learning for Audio/Speech

Recent Activity

yfyeung authored a paper about 21 hours ago

Exploring SSL Discrete Tokens for Multilingual ASR

yfyeung authored a paper about 21 hours ago

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

yfyeung authored a paper about 21 hours ago

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR

speechcolab's activity

yfyeung

authored 3 papers about 21 hours ago

yfyeung

authored a paper 3 days ago

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching

Paper • 2502.11128 • Published 27 days ago

yfyeung

in speechcolab/gigaspeech2 5 days ago

can i use this dataset to finetune tts model?

#5 opened about 1 month ago by adityaakmalazhari

yfyeung

authored 2 papers 2 months ago

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Paper • 2412.15649 • Published Dec 20, 2024

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Paper • 2412.16102 • Published Dec 20, 2024

yfyeung

authored a paper 4 months ago

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning

Paper • 2411.17100 • Published Nov 26, 2024

yfyeung

authored a paper 6 months ago

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Paper • 2409.00819 • Published Sep 1, 2024

yfyeung

authored 8 papers 7 months ago

Zipformer: A faster and better encoder for automatic speech recognition

Paper • 2310.11230 • Published Oct 17, 2023

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

Paper • 2401.14321 • Published Jan 25, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

Paper • 2402.08846 • Published Feb 13, 2024

PromptASR for contextualized ASR with controllable style

Paper • 2309.07414 • Published Sep 14, 2023

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

Paper • 2309.08105 • Published Sep 15, 2023

GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

Paper • 2406.11546 • Published Jun 17, 2024

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

Paper • 2309.07377 • Published Sep 14, 2023

Blank-regularized CTC for Frame Skipping in Neural Transducer

Paper • 2305.11558 • Published May 19, 2023

jimbozhang

authored 2 papers 9 months ago

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

Paper • 2106.06909 • Published Jun 13, 2021

speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

Paper • 2104.01378 • Published Apr 3, 2021

jimbozhang

authored a paper about 1 year ago

CED: Consistent ensemble distillation for audio tagging

Paper • 2308.11957 • Published Aug 23, 2023

Team members 11
