Edited README.md and removed flagging button
Browse files
- README.md +9 -11
- app.py +8 -4
- requirements.txt +2 -2
README.md
CHANGED
@@ -1,12 +1,9 @@
|
|
1 |
-
|
2 |
-
|
3 |
-
app_file: app.py
|
4 |
-
sdk: gradio
|
5 |
-
sdk_version: 4.36.1
|
6 |
-
---
|
7 |
-
# Speech Translation Synthesis: A Speech-To-Speech Translator
|
8 |
|
9 |
-
|
|
|
|
|
10 |
|
11 |
## Features
|
12 |
- Transcribe speech from an audio file or microphone input
|
@@ -55,8 +52,8 @@ This project was one of the projects for SDSU's Artificial Intelligence Club for
|
|
55 |
- **translate(text, language)**: Translates the transcribed text into the target language.
|
56 |
- **s2s(audio, language)**: Combines the transcription and translation functions, then synthesizes the translated text into speech using the input speaker's voice.
|
57 |
|
58 |
-
### Supported Languages
|
59 |
-
- Arabic
|
60 |
- Portuguese
|
61 |
- Chinese
|
62 |
- Czech
|
@@ -73,10 +70,11 @@ This project was one of the projects for SDSU's Artificial Intelligence Club for
|
|
73 |
- Hungarian
|
74 |
- Hindi
|
75 |
|
76 |
-
## License
|
77 |
This project is licensed under the MIT License. See the LICENSE file for more details.
|
78 |
|
79 |
## Acknowledgements
|
|
|
80 |
- [Gradio](https://www.gradio.app/) for providing the easy-to-use interface library.
|
81 |
- [Whisper](https://github.com/openai/whisper) for the speech-to-text model.
|
82 |
- [Coqui TTS](https://github.com/coqui-ai/TTS) for the text-to-speech synthesis model.
|
|
|
1 |
+
# Speech Translation Synthesis
|
2 |
+
### A Speech-To-Speech Translator
|
|
|
|
|
|
|
|
|
|
|
3 |
|
4 |
+
SDSU's Artificial Intelligence Club Group Project Spring 2024 semester
|
5 |
+
|
6 |
+
This is a Gradio-based demo that performs speech-to-speech translation. It uses the Whisper model for speech-to-text transcription, the `translate` library for translation, and the Coqui TTS model for text-to-speech synthesis.
|
7 |
|
8 |
## Features
|
9 |
- Transcribe speech from an audio file or microphone input
|
|
|
52 |
- **translate(text, language)**: Translates the transcribed text into the target language.
|
53 |
- **s2s(audio, language)**: Combines the transcription and translation functions, then synthesizes the translated text into speech using the input speaker's voice.
|
54 |
|
55 |
+
### Supported Languages 🗣️
|
56 |
+
- Arabic
|
57 |
- Portuguese
|
58 |
- Chinese
|
59 |
- Czech
|
|
|
70 |
- Hungarian
|
71 |
- Hindi
|
72 |
|
73 |
+
## License
|
74 |
This project is licensed under the MIT License. See the LICENSE file for more details.
|
75 |
|
76 |
## Acknowledgements
|
77 |
+
- SDSU's Artificial Intelligence Club for giving us the idea.
|
78 |
- [Gradio](https://www.gradio.app/) for providing the easy-to-use interface library.
|
79 |
- [Whisper](https://github.com/openai/whisper) for the speech-to-text model.
|
80 |
- [Coqui TTS](https://github.com/coqui-ai/TTS) for the text-to-speech synthesis model.
|
app.py
CHANGED
@@ -31,7 +31,6 @@ def translate(text, language):
|
|
31 |
translated_text = translator.translate(text)
|
32 |
return translated_text
|
33 |
|
34 |
-
|
35 |
# Initialize TTS model outside the function to avoid reinitialization on each call
|
36 |
from TTS.api import TTS
|
37 |
|
@@ -76,7 +75,7 @@ language_dropdown = gr.Dropdown(choices=zip(language_names, language_options),
|
|
76 |
translate_button = gr.Button(value="Synthesize and Translate my Voice!")
|
77 |
transcribed_text = gr.Textbox(label="Transcribed Text")
|
78 |
output_text = gr.Textbox(label="Translated Text")
|
79 |
-
output_speech = gr.Audio(label="
|
80 |
|
81 |
# Gradio interface with the transcribe function as the main function
|
82 |
demo = gr.Interface(
|
@@ -84,10 +83,11 @@ demo = gr.Interface(
|
|
84 |
inputs=[gr.Audio(sources=["upload", "microphone"],
|
85 |
type="filepath",
|
86 |
format='wav',
|
|
|
87 |
show_download_button=True,
|
88 |
waveform_options=gr.WaveformOptions(
|
89 |
waveform_color="#01C6FF",
|
90 |
-
waveform_progress_color="FF69B4",
|
91 |
skip_length=2,
|
92 |
show_controls=False,
|
93 |
)
|
@@ -95,7 +95,11 @@ demo = gr.Interface(
|
|
95 |
language_dropdown],
|
96 |
outputs=[transcribed_text, output_text, output_speech],
|
97 |
theme=gr.themes.Soft(),
|
98 |
-
title="Speech
|
|
|
|
|
|
|
|
|
99 |
)
|
100 |
|
101 |
demo.launch(debug=True, share=True)
|
|
|
31 |
translated_text = translator.translate(text)
|
32 |
return translated_text
|
33 |
|
|
|
34 |
# Initialize TTS model outside the function to avoid reinitialization on each call
|
35 |
from TTS.api import TTS
|
36 |
|
|
|
75 |
translate_button = gr.Button(value="Synthesize and Translate my Voice!")
|
76 |
transcribed_text = gr.Textbox(label="Transcribed Text")
|
77 |
output_text = gr.Textbox(label="Translated Text")
|
78 |
+
output_speech = gr.Audio(label="Synthesized Audio", type="filepath")
|
79 |
|
80 |
# Gradio interface with the transcribe function as the main function
|
81 |
demo = gr.Interface(
|
|
|
83 |
inputs=[gr.Audio(sources=["upload", "microphone"],
|
84 |
type="filepath",
|
85 |
format='wav',
|
86 |
+
# value="Original Audio",
|
87 |
show_download_button=True,
|
88 |
waveform_options=gr.WaveformOptions(
|
89 |
waveform_color="#01C6FF",
|
90 |
+
waveform_progress_color="#FF69B4",
|
91 |
skip_length=2,
|
92 |
show_controls=False,
|
93 |
)
|
|
|
95 |
language_dropdown],
|
96 |
outputs=[transcribed_text, output_text, output_speech],
|
97 |
theme=gr.themes.Soft(),
|
98 |
+
title="Speech Translation Synthesis",
|
99 |
+
description="This speech-to-speech translator uses the Whisper model for speech-to-text "
|
100 |
+
"transcription, the translate library for translation, and the Coqui TTS model for text-to-speech "
|
101 |
+
"synthesis.",
|
102 |
+
allow_flagging="never"
|
103 |
)
|
104 |
|
105 |
demo.launch(debug=True, share=True)
|
requirements.txt
CHANGED
@@ -1,6 +1,6 @@
|
|
1 |
-
numpy
|
2 |
gradio~=4.36.1
|
3 |
git+https://github.com/openai/whisper.git
|
4 |
translate~=3.6.1
|
5 |
-
TTS
|
6 |
ffprobe
|
|
|
1 |
+
numpy
|
2 |
gradio~=4.36.1
|
3 |
git+https://github.com/openai/whisper.git
|
4 |
translate~=3.6.1
|
5 |
+
TTS
|
6 |
ffprobe
|