Xilin Jiang

xi-j

AI & ML interests

None yet

Recent Activity

authored a paper about 20 hours ago

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

authored a paper about 20 hours ago

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

authored a paper about 20 hours ago

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue

Organizations

None yet

xi-j's activity

authored 6 papers about 20 hours ago

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

Paper • 2407.09732 • Published Jul 13, 2024 • 8

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

Paper • 2408.11849 • Published Aug 13, 2024

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue

Paper • 2409.04927 • Published Sep 7, 2024

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

Paper • 2409.10058 • Published Sep 16, 2024 • 2

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Paper • 2309.09493 • Published Sep 18, 2023

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Paper • 2502.16794 • Published 3 days ago • 4

commented on a paper 1 day ago

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Paper • 2502.16794 • Published 3 days ago • 4

upvoted a paper 1 day ago

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Paper • 2502.16794 • Published 3 days ago • 4

commented on a paper 1 day ago

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Paper • 2502.16794 • Published 3 days ago • 4

upvoted a paper 1 day ago

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published 7 days ago • 51

upvoted 4 papers about 1 month ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 63

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 88

Enhancing Human-Like Responses in Large Language Models

Paper • 2501.05032 • Published Jan 9 • 50

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10 • 46

upvoted 3 papers 5 months ago

UniMuMo: Unified Text, Music and Motion Generation

Paper • 2410.04534 • Published Oct 6, 2024 • 19

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

Presto! Distilling Steps and Layers for Accelerating Music Generation

Paper • 2410.05167 • Published Oct 7, 2024 • 17

upvoted 3 papers 6 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 125

Foundation Models for Music: A Survey

Paper • 2408.14340 • Published Aug 26, 2024 • 44

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Paper • 2408.14176 • Published Aug 26, 2024 • 62