File size: 3,020 Bytes
e51a0d0
 
 
 
 
 
 
 
 
 
 
 
 
64d3e15
 
93946cd
64d3e15
 
e51a0d0
93946cd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
64d3e15
 
93946cd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
64d3e15
93946cd
 
 
64d3e15
93946cd
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
---
title: Speech Translation Synthesis
emoji: 🗣️
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: true
models:
   - coqui/XTTS-v2
---

# Speech Translation Synthesis
### A Speech-To-Speech Translator

SDSU's Artificial Intelligence Club Group Project Spring 2024 semester

This is a Gradio-based demo that performs speech-to-speech translation. It uses the Whisper model for speech-to-text transcription, the `translate` library for translation, and the Coqui TTS model for text-to-speech synthesis. 

## Features
- Transcribe speech from an audio file or microphone input
- Translate transcribed text into a selected target language
- Synthesize translated text back into speech using the input speaker's voice

## Requirements
- Python 3.7 or higher
- Required Python packages listed in `requirements.txt`

## Setup

1. **Clone the repository:**
    ```sh
    git clone https://github.com/irmtou/speechtranslationsynthesis.git
    cd speechtranslationsynthesis
    ```

2. **Create a virtual environment and activate it:**
    ```sh
    python3 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    ```

3. **Install the required packages:**
    ```sh
    pip install -r requirements.txt
    ```

## Usage

1. **Run the application:**
    ```sh
    python app.py
    ```

2. **Access the demo interface:**
    - After running the application, Gradio will provide you with a local URL. Open this URL in your web browser to access the demo interface.

## Code Explanation

- **app.py**: The main application script that sets up the Gradio interface and defines the functions for speech-to-text transcription, translation, and text-to-speech synthesis.

### Key Components
- **speech_to_text(audio)**: Uses the Whisper model to transcribe speech from an audio file.
- **translate(text, language)**: Translates the transcribed text into the target language.
- **s2s(audio, language)**: Combines the transcription and translation functions, then synthesizes the translated text into speech using the input speaker's voice.

### Supported Languages 🗣️
- Arabic 
- Portuguese
- Chinese
- Czech
- Dutch
- English
- French
- German
- Italian
- Polish
- Russian
- Spanish
- Turkish
- Korean
- Hungarian
- Hindi

## License 
This project is licensed under the MIT License. See the LICENSE file for more details.

## Acknowledgements
- SDSU's Artificial Intelligence Club for giving us the idea.
- [Gradio](https://www.gradio.app/) for providing the easy-to-use interface library.
- [Whisper](https://github.com/openai/whisper) for the speech-to-text model.
- [Coqui TTS](https://github.com/coqui-ai/TTS) for the text-to-speech synthesis model.
- [translate](https://pypi.org/project/translate/) for the translation functionality.

## Contributing
Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or suggestions.

## Contact
For questions or support, please contact [elee200@hotmail.com].