Shamik88 committed
Commit 29a5392
Parent: bc2a990

Add large file to gitignore

Files changed (4)
  1. .DS_Store +0 -0
  2. .gitignore +3 -0
  3. app.py +35 -22
  4. demo_audio/notes.txt +33 -0
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
.gitignore CHANGED
@@ -4,3 +4,6 @@
 # Python cache files
 __pycache__/
 *.pyc
+
+
+echo "demo_audio/notebookllm_starhealth_demo.wav" >> .gitignore
app.py CHANGED
@@ -383,17 +383,15 @@ def update_speed(new_speed):
     speed = new_speed
     return f"Speed set to: {speed}"
 
-with gr.Blocks() as app_credits:
-    gr.Markdown("""
-# Credits
-
-* [mrfakename](https://github.com/fakerybakery) for the original [online demo](https://huggingface.co/spaces/mrfakename/E2-F5-TTS)
-* [RootingInLoad](https://github.com/RootingInLoad) for the podcast generation
-* [jpgallegoar](https://github.com/jpgallegoar) for multiple speech-type generation
-""")
+def process_audio(ref_audio_path):
+    return ref_audio_path
+
 with gr.Blocks(theme='gstaff/sketch') as app_tts:
     gr.Markdown("# Batched TTS")
     ref_audio_input = gr.Audio(label="Reference Audio", type="filepath")
+    download_button = gr.File(label="Download Your Recording")
+    ref_audio_input.change(process_audio, inputs=ref_audio_input, outputs=download_button)
+
     gen_text_input = gr.Textbox(label="Text to Generate", lines=10)
     model_choice = gr.Radio(
         choices=["F5-TTS", "E2-TTS"], label="Choose TTS Model", value="F5-TTS"
@@ -510,6 +508,9 @@ def parse_emotional_text(gen_text):
 
     return segments
 
+def get_audio_file(audio_path):
+    return audio_path
+
 with gr.Blocks() as app_emotional:
     # New section for emotional generation
     gr.Markdown(
@@ -520,7 +521,7 @@ with gr.Blocks() as app_emotional:
 
     **Example Input:**
 
-    (Regular) Hello, I'd like to order a sandwich please. (Surprised) What do you mean you're out of bread? (Sad) I really wanted a sandwich though... (Angry) You know what, darn you and your little shop, you suck! (Whisper) I'll just go back home and cry now. (Shouting) Why me?!
+    (Regular) Hello, I'd like to order a sandwich please. (Surprised) What do you mean you're out of bread? (Sad) I really wanted a sandwich though... (Angry) You know what, fuck you and your little shop, you suck! (Whisper) I'll just go back home and cry now. (Shouting) Why me?!
     """
     )
 
@@ -531,6 +532,13 @@ with gr.Blocks() as app_emotional:
         regular_name = gr.Textbox(value='Regular', label='Speech Type Name', interactive=False)
         regular_audio = gr.Audio(label='Regular Reference Audio', type='filepath')
         regular_ref_text = gr.Textbox(label='Reference Text (Regular)', lines=2)
+        download_regular_audio = gr.File(label="Download Regular Reference Audio")
+
+        regular_audio.change(
+            get_audio_file,
+            inputs=regular_audio,
+            outputs=download_regular_audio
+        )
 
     # Additional speech types (up to 99 more)
     max_speech_types = 100
@@ -538,6 +546,7 @@ with gr.Blocks() as app_emotional:
     speech_type_audios = []
     speech_type_ref_texts = []
    speech_type_delete_btns = []
+    download_speech_type_audios = []
 
    for i in range(max_speech_types - 1):
        with gr.Row():
@@ -545,10 +554,18 @@ with gr.Blocks() as app_emotional:
             audio_input = gr.Audio(label='Reference Audio', type='filepath', visible=False)
             ref_text_input = gr.Textbox(label='Reference Text', lines=2, visible=False)
             delete_btn = gr.Button("Delete", variant="secondary", visible=False)
+            download_audio_input = gr.File(label="Download Reference Audio", visible=False)
         speech_type_names.append(name_input)
         speech_type_audios.append(audio_input)
         speech_type_ref_texts.append(ref_text_input)
         speech_type_delete_btns.append(delete_btn)
+        download_speech_type_audios.append(download_audio_input)
+
+        audio_input.change(
+            get_audio_file,
+            inputs=audio_input,
+            outputs=download_audio_input
+        )
 
     # Button to add speech type
     add_speech_type_btn = gr.Button("Add Speech Type")
@@ -565,17 +582,20 @@ with gr.Blocks() as app_emotional:
             audio_updates = []
             ref_text_updates = []
             delete_btn_updates = []
+            download_btn_updates = []
             for i in range(max_speech_types - 1):
                 if i < speech_type_count:
                     name_updates.append(gr.update(visible=True))
                     audio_updates.append(gr.update(visible=True))
                     ref_text_updates.append(gr.update(visible=True))
                     delete_btn_updates.append(gr.update(visible=True))
+                    download_btn_updates.append(gr.update(visible=True))
                 else:
                     name_updates.append(gr.update())
                     audio_updates.append(gr.update())
                     ref_text_updates.append(gr.update())
                     delete_btn_updates.append(gr.update())
+                    download_btn_updates.append(gr.update())
         else:
             # Optionally, show a warning
             # gr.Warning("Maximum number of speech types reached.")
@@ -583,12 +603,13 @@ with gr.Blocks() as app_emotional:
             audio_updates = [gr.update() for _ in range(max_speech_types - 1)]
             ref_text_updates = [gr.update() for _ in range(max_speech_types - 1)]
             delete_btn_updates = [gr.update() for _ in range(max_speech_types - 1)]
-        return [speech_type_count] + name_updates + audio_updates + ref_text_updates + delete_btn_updates
+            download_btn_updates = [gr.update() for _ in range(max_speech_types - 1)]
+        return [speech_type_count] + name_updates + audio_updates + ref_text_updates + delete_btn_updates + download_btn_updates
 
     add_speech_type_btn.click(
         add_speech_type_fn,
         inputs=speech_type_count,
-        outputs=[speech_type_count] + speech_type_names + speech_type_audios + speech_type_ref_texts + speech_type_delete_btns
+        outputs=[speech_type_count] + speech_type_names + speech_type_audios + speech_type_ref_texts + speech_type_delete_btns + download_speech_type_audios
     )
 
     # Function to delete a speech type
@@ -749,21 +770,13 @@ with gr.Blocks() as app_emotional:
         inputs=[gen_text_input_emotional, regular_name] + speech_type_names,
         outputs=generate_emotional_btn
     )
+
 with gr.Blocks() as app:
     gr.Markdown(
         """
-# E2/F5 TTS
-
-This is a local web UI for F5 TTS with advanced batch processing support. This app supports the following TTS models:
-
-* [F5-TTS](https://arxiv.org/abs/2410.06885) (A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching)
-* [E2 TTS](https://arxiv.org/abs/2406.18009) (Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS)
-
-The checkpoints support English and Chinese.
-
-If you're having issues, try converting your reference audio to WAV or MP3, clipping it to 15s, and shortening your prompt.
+# TTS
 
-**NOTE: Reference text will be automatically transcribed with Whisper if not provided. For best results, keep your reference clips short (<15s). Ensure the audio is fully uploaded before generating.**
+This is a local web UI for TTS with advanced batch processing support. This app supports the following TTS models:
 """
     )
     gr.TabbedInterface([app_tts, app_podcast, app_emotional], ["TTS", "Podcast", "Multi-Style"])
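For reference, the pattern these app.py hunks introduce — passing an uploaded gr.Audio reference clip straight through to a gr.File component so the recording can be downloaded again — can be sketched in isolation roughly as follows. This is a minimal standalone sketch, not the app itself; the demo block, variable names other than get_audio_file, and labels are illustrative.

import gradio as gr

# Pass-through handler: with type="filepath", gr.Audio hands the handler a
# path on disk, and returning that path lets gr.File serve it for download.
def get_audio_file(audio_path):
    return audio_path

with gr.Blocks() as demo:  # "demo" is an illustrative name, not from app.py
    ref_audio = gr.Audio(label="Reference Audio", type="filepath")
    download = gr.File(label="Download Your Recording")
    # .change fires whenever the audio value changes (upload, record, clear),
    # mirroring the wiring added for ref_audio_input and regular_audio.
    ref_audio.change(get_audio_file, inputs=ref_audio, outputs=download)

if __name__ == "__main__":
    demo.launch()

In the multi-style tab the same handler is attached to regular_audio and to each dynamically created audio_input, and add_speech_type_fn returns an extra list of gr.update(visible=True) values so the matching gr.File download components appear alongside the other per-speech-type controls.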
demo_audio/notes.txt ADDED
@@ -0,0 +1,33 @@
+Bharat is a beautiful country filled with rich culture and history. From the colorful festivals to delicious food, there is so much to enjoy. People from different backgrounds live together, sharing their traditions and stories.
+
+
+Good evening. In a major development today, authorities have announced new measures to tackle the rising pollution levels in major cities across the country. The government plans to implement stricter regulations on industrial emissions and promote electric vehicles to improve air quality. Stay tuned for more updates on this developing story.
+
+
+
+/// Podcast script
+Shamik: Welcome back to Tech Talk! I’m your host, Shamik, and today we have a fascinating discussion lined up about the future of AI in our everyday lives. Joining me is my co-host, Ramesh. How are you today, Ramesh?
+
+Ramesh: I’m doing great, Shamik! Excited to dive into this topic. AI is transforming so many aspects of our lives, from how we work to how we interact with technology.
+
+Shamik: Absolutely! It’s incredible to see how AI has progressed. A few years ago, the idea of a smart assistant was just starting to gain traction. Now, we have AI integrated into everything, from our phones to our home appliances.
+
+Ramesh: Right! And it’s not just about convenience. AI is also enhancing productivity in various industries. For example, in healthcare, AI algorithms can analyze medical images faster than human doctors in some cases, helping to catch issues earlier.
+
+Shamik: That’s a great point! But do you think there are potential downsides to this rapid integration of AI?
+
+Ramesh: Definitely. While AI has the potential to improve efficiency, there are concerns about privacy and job displacement. We need to strike a balance between innovation and ethical considerations.
+
+Shamik: I completely agree. It’s crucial for companies to be transparent about how they use AI and to prioritize data privacy. What about the impact on education? AI tools are becoming more common in classrooms.
+
+Ramesh: That’s true. AI can personalize learning experiences, adapting to each student’s pace. However, there’s also the risk of over-reliance on technology, which could hinder critical thinking skills.
+
+Shamik: Great insight! As we move forward, it’s vital to keep discussing these challenges and benefits. Before we wrap up, any final thoughts on what the future holds for AI?
+
+Ramesh: I believe the future is bright! With responsible development and regulation, AI can be a powerful ally in solving some of our biggest challenges.
+
+Shamik: Well said, Ramesh! Thank you for sharing your thoughts today. And to our listeners, thank you for tuning in. We’ll catch you next time on Tech Talk!
+
+/// Podcast script
+
+