Speech Translation Synthesis

A Speech-To-Speech Translator

A group project by SDSU's Artificial Intelligence Club, Spring 2024 semester.

This is a Gradio-based demo that performs speech-to-speech translation. It uses the Whisper model for speech-to-text transcription, the translate library for translation, and the Coqui TTS model for text-to-speech synthesis.

Features

  • Transcribe speech from an audio file or microphone input
  • Translate transcribed text into a selected target language
  • Synthesize translated text back into speech using the input speaker's voice

Requirements

  • Python 3.7 or higher
  • Required Python packages listed in requirements.txt

Setup

  1. Clone the repository:

    git clone https://github.com/irmtou/speechtranslationsynthesis.git
    cd speechtranslationsynthesis
    
  2. Create a virtual environment and activate it:

    python3 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    
  3. Install the required packages:

    pip install -r requirements.txt
    

Usage

  1. Run the application:

    python app.py
    
  2. Access the demo interface:

    • Once the application is running, Gradio prints a local URL in the terminal. Open this URL in your web browser to access the demo interface.

Code Explanation

  • app.py: The main application script that sets up the Gradio interface and defines the functions for speech-to-text transcription, translation, and text-to-speech synthesis.
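
As a rough illustration of how a script like app.py might wire a speech-to-speech function into Gradio, here is a minimal sketch. The names placeholder_s2s and build_demo, the component labels, and the short language list are hypothetical and are not taken from the actual app.py.

```python
def placeholder_s2s(audio_path, language):
    """Hypothetical stand-in for the app's s2s function: returns the input audio path unchanged."""
    return audio_path


def build_demo(s2s_fn=placeholder_s2s, languages=("English", "Spanish", "French")):
    """Build a Gradio interface around a speech-to-speech function (illustrative only)."""
    import gradio as gr  # imported lazily so this module loads even without Gradio installed

    return gr.Interface(
        fn=s2s_fn,
        inputs=[
            # type="filepath" hands the function a path to the recorded or uploaded audio
            gr.Audio(type="filepath", label="Input speech"),
            gr.Dropdown(choices=list(languages), label="Target language"),
        ],
        outputs=gr.Audio(label="Translated speech"),
        title="Speech Translation Synthesis",
    )


# To serve the demo locally (requires Gradio to be installed):
# build_demo().launch()
```

Passing the function in as a parameter keeps the interface definition separate from the models, so the layout can be tried out before Whisper or Coqui TTS are installed.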

Key Components

  • speech_to_text(audio): Uses the Whisper model to transcribe speech from an audio file.
  • translate(text, language): Translates the transcribed text into the target language.
  • s2s(audio, language): Combines the transcription and translation functions, then synthesizes the translated text into speech using the input speaker's voice.
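
The three components above could be chained roughly as follows. This is a sketch under assumptions, not the project's actual code: the Whisper model size ("base"), the XTTS v2 model name, the synthesize helper, the output path, and the use of language codes such as "es" are all illustrative choices.

```python
def speech_to_text(audio_path):
    """Transcribe an audio file with Whisper (assumes the openai-whisper package)."""
    import whisper  # lazy import: only needed when this function is called

    model = whisper.load_model("base")  # model size is an assumption
    return model.transcribe(audio_path)["text"].strip()


def translate(text, language):
    """Translate text with the translate package; language is a code such as "es"."""
    from translate import Translator

    return Translator(to_lang=language).translate(text)


def synthesize(text, language, speaker_wav, out_path="output.wav"):
    """Hypothetical helper: speak text with Coqui TTS, cloning the voice in speaker_wav."""
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")  # model choice is an assumption
    tts.tts_to_file(text=text, speaker_wav=speaker_wav, language=language, file_path=out_path)
    return out_path


def s2s(audio_path, language, stt=speech_to_text, mt=translate, tts=synthesize):
    """Transcribe, translate, then synthesize using the input audio as the voice reference."""
    text = stt(audio_path)
    translated = mt(text, language)
    return tts(translated, language, audio_path)
```

The stt, mt, and tts parameters are an addition for illustration: they let each stage be swapped for a stub, so the control flow can be checked without downloading any models.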

Supported Languages 🗣️

  • Arabic
  • Portuguese
  • Chinese
  • Czech
  • Dutch
  • English
  • French
  • German
  • Italian
  • Polish
  • Russian
  • Spanish
  • Turkish
  • Korean
  • Hungarian
  • Hindi
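
Translation and synthesis libraries usually expect language codes rather than display names. The mapping below is a sketch that pairs each language listed above with its ISO 639-1 code. LANGUAGE_CODES and language_code are hypothetical names, and the app may store its languages differently. Some models may also expect a regional variant for Chinese, such as "zh-cn".

```python
# Display name -> ISO 639-1 code for the languages listed in this README.
LANGUAGE_CODES = {
    "Arabic": "ar",
    "Portuguese": "pt",
    "Chinese": "zh",
    "Czech": "cs",
    "Dutch": "nl",
    "English": "en",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Polish": "pl",
    "Russian": "ru",
    "Spanish": "es",
    "Turkish": "tr",
    "Korean": "ko",
    "Hungarian": "hu",
    "Hindi": "hi",
}


def language_code(name):
    """Return the code for a display name, ignoring case and surrounding whitespace."""
    key = name.strip().title()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {name!r}")
    return LANGUAGE_CODES[key]
```

For example, language_code("spanish") returns "es", while an unlisted name raises ValueError.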

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

  • SDSU's Artificial Intelligence Club for giving us the idea.
  • Gradio for providing the easy-to-use interface library.
  • Whisper for the speech-to-text model.
  • Coqui TTS for the text-to-speech synthesis model.
  • translate for the translation functionality.

Contributing

Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or suggestions.

Contact

For questions or support, please contact elee200@hotmail.com.