Commit 93946cd by irmtou
1 Parent(s): cdd8f93

Added README.md

Files changed (2)
  1. README.md +85 -1
  2. requirements.txt +0 -1
README.md CHANGED
@@ -4,4 +4,88 @@ app_file: app.py
  sdk: gradio
  sdk_version: 4.36.1
  ---
- # SpeechTranslationSynthesis
+ # Speech Translation Synthesis: A Speech-To-Speech Translator
+
+ This was one of the SDSU Artificial Intelligence Club's projects for the Spring 2024 semester. It is a Gradio-based demo that performs speech-to-speech translation: the Whisper model handles speech-to-text transcription, the `translate` library handles translation, and the Coqui TTS model handles text-to-speech synthesis.
+
+ ## Features
+ - Transcribe speech from an audio file or microphone input
+ - Translate transcribed text into a selected target language
+ - Synthesize translated text back into speech using the input speaker's voice
+
+ ## Requirements
+ - Python 3.9 to 3.11 (the range supported by the pinned `TTS~=0.22.0` package)
+ - Required Python packages listed in `requirements.txt`
+
+ ## Setup
+
+ 1. **Clone the repository:**
+ ```sh
+ git clone https://github.com/irmtou/speechtranslationsynthesis.git
+ cd speechtranslationsynthesis
+ ```
+
+ 2. **Create a virtual environment and activate it:**
+ ```sh
+ python3 -m venv venv
+ source venv/bin/activate # On Windows use `venv\Scripts\activate`
+ ```
+
+ 3. **Install the required packages:**
+ ```sh
+ pip install -r requirements.txt
+ ```
+
+ ## Usage
+
+ 1. **Run the application:**
+ ```sh
+ python app.py
+ ```
+
+ 2. **Access the demo interface:**
+ - Once the application is running, Gradio prints a local URL in the terminal. Open this URL in your web browser to access the demo interface.
+
+ ## Code Explanation
+
+ - **app.py**: The main application script that sets up the Gradio interface and defines the functions for speech-to-text transcription, translation, and text-to-speech synthesis.
+
+ ### Key Components
+ - **speech_to_text(audio)**: Uses the Whisper model to transcribe speech from an audio file.
+ - **translate(text, language)**: Translates the transcribed text into the target language.
+ - **s2s(audio, language)**: Combines the transcription and translation functions, then synthesizes the translated text into speech using the input speaker's voice.
+
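The three functions listed under Key Components chain into a single pipeline. Since `app.py` itself is not part of this diff, the following is only a minimal sketch of that flow, not the project's actual code. The name `s2s_pipeline`, the type aliases, and the idea of passing each stage in as a callable are all assumptions made for illustration; in the real app the stages would be backed by Whisper, `translate`, and Coqui TTS.

```python
from typing import Callable, Tuple

# Hypothetical type aliases; the real app.py is not shown in this commit.
Transcriber = Callable[[str], str]            # audio path -> source text
Translator = Callable[[str, str], str]        # (text, target language) -> translated text
Synthesizer = Callable[[str, str, str], str]  # (text, language, speaker wav) -> output wav path


def s2s_pipeline(
    audio_path: str,
    language: str,
    transcribe: Transcriber,
    translate: Translator,
    synthesize: Synthesizer,
) -> Tuple[str, str, str]:
    """Chain the three stages and return (transcript, translation, output audio path)."""
    transcript = transcribe(audio_path)
    translation = translate(transcript, language)
    # The input recording doubles as the speaker reference for the synthesis stage.
    output_path = synthesize(translation, language, audio_path)
    return transcript, translation, output_path


if __name__ == "__main__":
    # Stub stages so the sketch runs without downloading any models.
    result = s2s_pipeline(
        "input.wav",
        "es",
        transcribe=lambda path: "hello",
        translate=lambda text, lang: f"[{lang}] {text}",
        synthesize=lambda text, lang, speaker: "output.wav",
    )
    print(result)
```

Passing the stages in as callables keeps the sketch runnable with stubs and makes each stage easy to swap or test in isolation.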
+ ### Supported Languages
+ - Arabic
+ - Portuguese
+ - Chinese
+ - Czech
+ - Dutch
+ - English
+ - French
+ - German
+ - Italian
+ - Polish
+ - Russian
+ - Spanish
+ - Turkish
+ - Korean
+ - Hungarian
+ - Hindi
+
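Translation and speech libraries generally identify languages by code, not by display name. A hypothetical lookup table for the list above could look like the sketch below. It assumes ISO 639-1 codes; the codes that `app.py` actually passes to each library are not shown in this diff, and some models expect variants (for example `zh-cn` in place of `zh`). The names `LANGUAGE_CODES` and `language_code` are made up for this example.

```python
# Hypothetical mapping from the display names above to ISO 639-1 codes.
LANGUAGE_CODES = {
    "Arabic": "ar",
    "Portuguese": "pt",
    "Chinese": "zh",
    "Czech": "cs",
    "Dutch": "nl",
    "English": "en",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Polish": "pl",
    "Russian": "ru",
    "Spanish": "es",
    "Turkish": "tr",
    "Korean": "ko",
    "Hungarian": "hu",
    "Hindi": "hi",
}


def language_code(name: str) -> str:
    """Return the code for a display name, ignoring case and surrounding spaces."""
    normalized = name.strip().title()
    try:
        return LANGUAGE_CODES[normalized]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None
```

A dropdown in the interface could then show the keys of `LANGUAGE_CODES` while the pipeline works with the corresponding codes.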
+ ## License
+ This project is licensed under the MIT License. See the LICENSE file for more details.
+
+ ## Acknowledgements
+ - [Gradio](https://www.gradio.app/) for providing the easy-to-use interface library.
+ - [Whisper](https://github.com/openai/whisper) for the speech-to-text model.
+ - [Coqui TTS](https://github.com/coqui-ai/TTS) for the text-to-speech synthesis model.
+ - [translate](https://pypi.org/project/translate/) for the translation functionality.
+
+ ## Contributing
+ Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or suggestions.
+
+ ## Contact
+ For questions or support, please contact [elee200@hotmail.com].
+
+
requirements.txt CHANGED
@@ -3,5 +3,4 @@ gradio~=4.36.1
  git+https://github.com/openai/whisper.git
  translate~=3.6.1
  TTS~=0.22.0
- ffmpeg
  ffprobe