Text-to-Audio - a gary109 Collection

Text-to-Audio

updated Dec 8, 2023

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 52

FoleyGen: Visually-Guided Audio Generation

Paper • 2309.10537 • Published Sep 19, 2023 • 6

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

Paper • 2310.11954 • Published Oct 18, 2023 • 24

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 16

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Paper • 2311.00945 • Published Nov 2, 2023 • 11

In-Context Prompt Editing For Conditional Audio Generation

Paper • 2311.00895 • Published Nov 1, 2023 • 8

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

Paper • 2312.03491 • Published Dec 6, 2023 • 34