Xinfa Zhu

xfzhu

https://orcid.org/0000-0001-9275-523X

zxf-icpc

AI & ML interests

Speech Generation

Recent Activity

authored a paper 2 days ago

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

liked a Space 8 days ago

Mobvoi/Offical-Spark-TTS

liked a model 15 days ago

SparkAudio/Spark-TTS-0.5B

Organizations

None yet

xfzhu's activity

authored a paper 2 days ago

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Paper • 2503.01710 • Published 11 days ago • 3

liked a Space 8 days ago

167

Spark TTS

🌖

A text-to-speech model powered by SparkAudio and Mobvoi.

liked a model 15 days ago

SparkAudio/Spark-TTS-0.5B

Text-to-Speech • Updated 7 days ago • 8.91k • 398

liked a dataset 16 days ago

HKUSTAudio/Audio-FLAN-Dataset

Preview • Updated 8 days ago • 6.56k • 25

upvoted a collection 17 days ago

Llasa

Collection

TTS foundation model compatible with Llama framework (160k hours tokenized speech data released) • 11 items • Updated 21 days ago • 15

liked a Space 23 days ago

OSUM

💬

Demo of the OSUM project from the ASLP Lab at Northwestern Polytechnical University

upvoted an article about 1 month ago

Article

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

and 1 other • Feb 11 • 26

upvoted a paper about 1 month ago

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6 • 25

liked a model about 2 months ago

HKUSTAudio/Llasa-1B

Text-to-Speech • Updated 5 days ago • 5.5k • 94

liked a model 2 months ago

HKUSTAudio/Llasa-3B

Text-to-Speech • Updated 5 days ago • 3.48k • 470

authored a paper 2 months ago

Autoregressive Speech Synthesis with Next-Distribution Prediction

Paper • 2412.16846 • Published Dec 22, 2024

upvoted a paper 4 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 76

liked a model 10 months ago

meta-llama/Meta-Llama-3-8B-Instruct

Text Generation • Updated Sep 27, 2024 • 1.12M • 3.87k

liked a dataset 11 months ago

Wenetspeech4TTS/WenetSpeech4TTS

Updated Jul 25, 2024 • 2.74k • 70

liked a Space 12 months ago

2.09k

Whisper

📉

Transcribe audio from microphone, file, or YouTube link