first commit
README.md
CHANGED
@@ -10,7 +10,9 @@ tags:
 **Auffusion** is a latent diffusion model (LDM) for text-to-audio (TTA) generation. **Auffusion** can generate realistic audio, including human sounds, animal sounds, natural and artificial sounds, and sound effects, from textual prompts. Auffusion adapts text-to-image (T2I) model frameworks to the TTA task, leveraging their inherent generative strengths and precise cross-modal alignment. Our objective and subjective evaluations demonstrate that Auffusion surpasses previous TTA approaches while using limited data and computational resources. We release our model, inference code, and pre-trained checkpoints for the research community.
 
 📣 We are releasing **Auffusion-Full-no-adapter**, which was pre-trained on all datasets described in the paper and is designed for easy audio manipulation.
+
 📣 We are releasing **Auffusion-Full**, which was pre-trained on all datasets described in the paper.
+
 📣 We are releasing **Auffusion**, which was pre-trained on **AudioCaps**.
 
 ## Auffusion Model Family