Spaces: Running on Zero

Hecheng0625 committed · Commit 9730071 · 1 Parent(s): 7634b6c

Update app.py
app.py CHANGED
@@ -135,27 +135,54 @@ demo_outputs = [
     gr.Audio(label="Voice conversion result"),
 ]
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+with gr.Blocks() as demo:
+    gr.Interface(
+        fn=codec_voice_conversion,
+        inputs=demo_inputs,
+        outputs=demo_outputs,
+        title="NaturalSpeech3 FACodec",
+        description="""
+## FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
+
+[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/pdf/2403.03100.pdf)
+
+[![demo](https://img.shields.io/badge/FACodec-Demo-red)](https://speechresearch.github.io/naturalspeech3/)
+
+[![model](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-pink)](https://huggingface.co/amphion/naturalspeech3_facodec)
+
+## Overview
+
+FACodec is a core component of the advanced text-to-speech (TTS) model NaturalSpeech 3. FACodec converts complex speech waveforms into disentangled subspaces representing the speech attributes of content, prosody, timbre, and acoustic details, and reconstructs high-quality speech waveforms from these attributes. By decomposing speech into subspaces for the different attributes, FACodec simplifies the modeling of speech representations.
+
+Researchers can use FACodec to develop different kinds of TTS models, such as non-autoregressive discrete-diffusion models (NaturalSpeech 3) or autoregressive models (like VALL-E).
+""",
+    )
+
+    gr.Examples(
+        examples=[
+            [
+                "default/ref/1.wav",
+                "default/source/1.wav",
+            ],
+            [
+                "default/ref/2.wav",
+                "default/source/2.wav",
+            ],
+            [
+                "default/ref/3.wav",
+                "default/source/3.wav",
+            ],
+            [
+                "default/ref/4.wav",
+                "default/source/4.wav",
+            ],
+            [
+                "default/ref/5.wav",
+                "default/source/5.wav",
+            ],
+        ],
+        inputs=demo_inputs,
+    )
 
 if __name__ == "__main__":
     demo.launch()
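For readers who want to reproduce the layout this commit introduces, the sketch below shows the same pattern: a `gr.Interface` rendered inside a `gr.Blocks` context so that a `gr.Examples` gallery can sit beneath it. It is a minimal standalone sketch, not the Space's actual code: the `codec_voice_conversion` stub, the component labels, and the inline `demo_inputs`/`demo_outputs` definitions are placeholders standing in for the real model wiring defined earlier in app.py.

```python
import gradio as gr


def codec_voice_conversion(reference_wav, source_wav):
    # Placeholder for the Space's real conversion function: the actual app
    # encodes the source speech with FACodec, swaps in the reference
    # speaker's timbre, and decodes the result. Here the source audio is
    # returned unchanged so the UI wiring can be exercised on its own.
    return source_wav


# Hypothetical components; the real demo defines demo_inputs and
# demo_outputs earlier in app.py, so the labels here are assumptions.
demo_inputs = [
    gr.Audio(label="Reference speech", type="filepath"),
    gr.Audio(label="Source speech", type="filepath"),
]
demo_outputs = [gr.Audio(label="Voice conversion result")]

# Instantiating gr.Interface inside a gr.Blocks context renders the
# interface there, which lets extra components such as gr.Examples be
# placed below it on the same page.
with gr.Blocks() as demo:
    gr.Interface(
        fn=codec_voice_conversion,
        inputs=demo_inputs,
        outputs=demo_outputs,
        title="NaturalSpeech3 FACodec",
    )
    # Example rows must point at audio files that exist on disk
    # (in the Space these live under default/ref and default/source).
    gr.Examples(
        examples=[["default/ref/1.wav", "default/source/1.wav"]],
        inputs=demo_inputs,
    )

if __name__ == "__main__":
    demo.launch()
```

Wrapping the interface in `gr.Blocks` is what makes the separate examples gallery possible: a bare `gr.Interface` owns the whole page, whereas the Blocks context lets additional components be appended after it.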