Speech Translation Synthesis

A Speech-To-Speech Translator

SDSU's Artificial Intelligence Club Group Project Spring 2024 semester

This is a Gradio-based demo that performs speech-to-speech translation. It uses the Whisper model for speech-to-text transcription, the translate library for translation, and the Coqui TTS model for text-to-speech synthesis.

Features

Transcribe speech from an audio file or microphone input
Translate transcribed text into a selected target language
Synthesize translated text back into speech using the input speaker's voice

Requirements

Python 3.7 or higher
Required Python packages listed in requirements.txt

Setup

Clone the repository:

git clone https://github.com/irmtou/speechtranslationsynthesis.git
cd speechtranslationsynthesis

Create a virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Run the application:
```
python app.py
```
Access the demo interface:
- After running the application, Gradio will provide you with a local URL. Open this URL in your web browser to access the demo interface.

Code Explanation

app.py: The main application script that sets up the Gradio interface and defines the functions for speech-to-text transcription, translation, and text-to-speech synthesis.

Key Components

speech_to_text(audio): Uses the Whisper model to transcribe speech from an audio file.
translate(text, language): Translates the transcribed text into the target language.
s2s(audio, language): Combines the transcription and translation functions, then synthesizes the translated text into speech using the input speaker's voice.

Supported Languages 🗣️

Arabic
Portuguese
Chinese
Czech
Dutch
English
French
German
Italian
Polish
Russian
Spanish
Turkish
Korean
Hungarian
Hindi

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

SDSU's Artificial Intelligence Club for giving us the idea.
Gradio for providing the easy-to-use interface library.
Whisper for the speech-to-text model.
Coqui TTS for the text-to-speech synthesis model.
translate for the translation functionality.

Contributing

Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or suggestions.

Contact

For questions or support, please contact [elee200@hotmail.com].