Dr Wuz

drwuz

https://drwuz.com/

AI & ML interests

None yet

drwuz's activity

New activity in amphion/maskgct about 2 months ago

Falls apart over a few sentences. Can you add chunking?

#6 opened 2 months ago by

johnblues

New activity in amphion/MaskGCT about 2 months ago

Commercial License?

#5 opened 2 months ago by

hrdixon

liked a Space 3 months ago

Running on Zero

245

😻

MaskGCT TTS Demo

liked a dataset 4 months ago

amphion/Emilia-Dataset

Viewer • Updated Sep 6, 2024 • 52.9M • 39.9k • 178

updated a Space 4 months ago

Running

🦀

README

New activity in amphion/Emilia 6 months ago

There is no script for downloading the audio

#1 opened 6 months ago by

ZBW

Does the dataset only contain audio URLs, with no corresponding transcription text?

#2 opened 6 months ago by

xkzhang

authored 9 papers 6 months ago

Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder

Paper • 2311.14957 • Published Nov 25, 2023 • 2

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

Paper • 2304.00830 • Published Apr 3, 2023 • 2

Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion

Paper • 2310.11160 • Published Oct 17, 2023

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

Paper • 2401.12264 • Published Jan 22, 2024

SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion

Paper • 2402.12660 • Published Feb 20, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 34

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Paper • 2406.13340 • Published Jun 19, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

Paper • 2407.01494 • Published Jul 1, 2024 • 13

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3, 2024 • 18

upvoted a paper 6 months ago

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3, 2024 • 18

liked a model 10 months ago

amphion/naturalspeech3_facodec

Updated Mar 13, 2024 • 83

commented a paper about 1 year ago

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 53

New activity in amphion/text_to_audio about 1 year ago

License

#1 opened about 1 year ago by

mrfakename