PlotweaverModel committed
Commit bf774ca · verified · 1 parent: 27a4574

Files upload

Files changed (4):
  1. DEPLOY.md +94 -0
  2. README.md +22 -9
  3. app.py +346 -0
  4. requirements.txt +6 -0
DEPLOY.md ADDED
@@ -0,0 +1,94 @@
+ # Deployment Guide — HuggingFace Space
+
+ ## Quick Deploy (3 steps)
+
+ ### Step 1: Create a new HuggingFace Space
+
+ 1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
+ 2. Fill in:
+    - **Owner**: `PlotweaverAI` (or your account)
+    - **Space name**: `live-football-commentary-en-yo`
+    - **SDK**: Gradio
+    - **Hardware**: **T4 small** (GPU required — the free CPU tier won't work well)
+    - **Visibility**: Public
+ 3. Click **Create Space**
+
+ ### Step 2: Upload the files
+
+ Upload these 3 files to the Space repo (via the web UI or git):
+
+ ```
+ ├── README.md          ← Space metadata (hardware, tags, etc.)
+ ├── app.py             ← Main Gradio application
+ └── requirements.txt   ← Python dependencies
+ ```
+
+ **Option A — Web upload:**
+ - Go to your Space → Files → "Add file" → Upload each file
+
+ **Option B — Git (recommended):**
+ ```bash
+ # Clone the space
+ git clone https://huggingface.co/spaces/PlotweaverAI/live-football-commentary-en-yo
+ cd live-football-commentary-en-yo
+
+ # Copy the files
+ cp /path/to/hf_space/* .
+
+ # Push
+ git add .
+ git commit -m "Initial deploy: EN→YO commentary pipeline"
+ git push
+ ```
+
+ ### Step 3: Wait for build
+
+ The Space will automatically:
+ 1. Install dependencies from `requirements.txt`
+ 2. Download all 3 models from HuggingFace Hub
+ 3. Start the Gradio app
+
+ The first build takes ~5-10 minutes (model downloads). Subsequent restarts are faster thanks to caching.
+
+ ---
+
+ ## Hardware Notes
+
+ | Hardware | Cost | Performance |
+ |----------|------|-------------|
+ | T4 small | ~$0.60/hr | Good — full pipeline in ~6-10s |
+ | T4 medium | ~$1.00/hr | Better for concurrent users |
+ | A10G small | ~$1.05/hr | Fastest inference |
+ | CPU basic | Free | Very slow (~60s+), not recommended |
+
+ The Space will **sleep after 48 hours of inactivity** on paid hardware.
+ You can enable "persistent" mode in the Space settings to keep it running.
+
+ ---
+
+ ## Troubleshooting
+
+ **Space keeps crashing / OOM:**
+ - T4 small has 16GB VRAM — enough for all 3 models in float16
+ - If issues persist, try T4 medium
+
+ **Models fail to load:**
+ - Make sure all 3 model repos are **public** on HuggingFace
+ - If they are private, add an `HF_TOKEN` secret in the Space settings
+
+ **Audio recording doesn't work:**
+ - Browser mic access requires HTTPS (HuggingFace Spaces provides this)
+ - Make sure you've granted microphone permission in the browser
+
+ ---
+
+ ## Customization
+
+ **To add more source/target languages** (the MT model supports 6):
+ Edit `app.py` and add a language dropdown to the Gradio UI.
+ The NLLB model likely supports these codes:
+ - `eng_Latn` (English)
+ - `yor_Latn` (Yoruba)
+ - `ibo_Latn` (Igbo)
+ - `hau_Latn` (Hausa)
+ - Check your model card for the full list.
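As a minimal sketch of that customization (the `LANGUAGES` mapping and `resolve_lang` helper are illustrative, not part of the current `app.py`), the dropdown would map display names to NLLB codes, and the resolved code would replace the hard-coded `MT_TGT_LANG`:

```python
# Hypothetical helper for a target-language dropdown (names are illustrative).
# NLLB-200 identifies languages by codes such as "yor_Latn".
LANGUAGES = {
    "English": "eng_Latn",
    "Yoruba": "yor_Latn",
    "Igbo": "ibo_Latn",
    "Hausa": "hau_Latn",
    # ...check the model card for the remaining codes
}

def resolve_lang(display_name: str) -> str:
    """Map a dropdown display name to its NLLB language code."""
    try:
        return LANGUAGES[display_name]
    except KeyError:
        raise ValueError(f"Unsupported language: {display_name}")

# In the Gradio UI this would back a dropdown, e.g.:
#   tgt_dropdown = gr.Dropdown(choices=list(LANGUAGES), value="Yoruba", label="Target")
# and the translate call would force the chosen code as the BOS token:
#   tgt_lang_id = mt_tokenizer.convert_tokens_to_ids(resolve_lang(choice))
```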
README.md CHANGED
@@ -1,13 +1,26 @@
  ---
- title: Live Football Commentary
- emoji: 📉
- colorFrom: red
- colorTo: red
+ title: Live Football Commentary - English to Yoruba
+ emoji: 🏟️
+ colorFrom: green
+ colorTo: yellow
  sdk: gradio
- sdk_version: 6.10.0
+ sdk_version: "4.44.1"
  app_file: app.py
- pinned: false
- license: apache-2.0
+ pinned: true
+ license: mit
+ hardware: t4-small
+ models:
+   - PlotweaverAI/whisper-small-de-en
+   - PlotweaverAI/nllb-200-distilled-600M-african-6lang
+   - PlotweaverAI/yoruba-mms-tts-new
+ tags:
+   - speech-to-speech
+   - translation
+   - yoruba
+   - football
+   - commentary
+   - asr
+   - tts
+   - nllb
+ short_description: Translate live English football commentary to Yoruba speech
  ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,346 @@
+ """
+ Live Football Commentary Pipeline — English → Yoruba
+ =====================================================
+ Gradio app for HuggingFace Spaces.
+
+ Pipeline: ASR (Whisper) → MT (NLLB-200) → TTS (MMS-TTS Yoruba)
+ """
+
+ import torch
+ import numpy as np
+ import re
+ import time
+ import gradio as gr
+ from transformers import (
+     pipeline as hf_pipeline,
+     AutoTokenizer,
+     AutoModelForSeq2SeqLM,
+ )
+
+ # =============================================================================
+ # Configuration
+ # =============================================================================
+
+ ASR_MODEL_ID = "PlotweaverAI/whisper-small-de-en"
+ MT_MODEL_ID = "PlotweaverAI/nllb-200-distilled-600M-african-6lang"
+ TTS_MODEL_ID = "PlotweaverAI/yoruba-mms-tts-new"
+
+ MT_SRC_LANG = "eng_Latn"
+ MT_TGT_LANG = "yor_Latn"
+
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+ TORCH_DTYPE = torch.float16 if torch.cuda.is_available() else torch.float32
+
+
+ # =============================================================================
+ # Load models (runs once at startup)
+ # =============================================================================
+
+ print(f"Device: {DEVICE} | Dtype: {TORCH_DTYPE}")
+ print("Loading models...")
+
+ # ASR
+ print(f"  Loading ASR: {ASR_MODEL_ID}")
+ asr_pipe = hf_pipeline(
+     "automatic-speech-recognition",
+     model=ASR_MODEL_ID,
+     device=DEVICE,
+     torch_dtype=TORCH_DTYPE,
+ )
+ print("  ASR loaded ✓")
+
+ # MT
+ print(f"  Loading MT: {MT_MODEL_ID}")
+ mt_tokenizer = AutoTokenizer.from_pretrained(MT_MODEL_ID)
+ mt_model = AutoModelForSeq2SeqLM.from_pretrained(
+     MT_MODEL_ID,
+     torch_dtype=TORCH_DTYPE,
+ ).to(DEVICE)
+ mt_tokenizer.src_lang = MT_SRC_LANG
+ print("  MT loaded ✓")
+
+ # TTS
+ print(f"  Loading TTS: {TTS_MODEL_ID}")
+ tts_pipe = hf_pipeline(
+     "text-to-speech",
+     model=TTS_MODEL_ID,
+     device=DEVICE,
+     torch_dtype=TORCH_DTYPE,
+ )
+ print("  TTS loaded ✓")
+ print("All models loaded!")
+
+
+ # =============================================================================
+ # Pipeline functions (from working Colab notebook)
+ # =============================================================================
+
+ def split_into_sentences(text):
+     """Split raw ASR text into individual sentences for MT."""
+     text = text.strip()
+     if not text:
+         return []
+
+     # Capitalize the first letter of each sentence (str.capitalize would
+     # also lowercase the rest, mangling proper nouns)
+     parts = [s.strip() for s in text.split('. ') if s.strip()]
+     text = '. '.join(p[0].upper() + p[1:] if len(p) > 1 else p.upper() for p in parts)
+
+     # If text has punctuation, split on it
+     if re.search(r'[.!?]', text):
+         sentences = re.split(r'(?<=[.!?])\s+', text)
+         return [s.strip() for s in sentences if s.strip()]
+
+     # No punctuation — split into ~12-word chunks
+     words = text.split()
+     MAX_WORDS = 12
+     sentences = []
+     for i in range(0, len(words), MAX_WORDS):
+         chunk = ' '.join(words[i:i + MAX_WORDS])
+         if not chunk.endswith(('.', '!', '?')):
+             chunk += '.'
+         chunk = chunk[0].upper() + chunk[1:] if len(chunk) > 1 else chunk.upper()
+         sentences.append(chunk)
+     return sentences
+
+
+ def transcribe(audio_array, sample_rate=16000):
+     """ASR: English audio → English text."""
+     result = asr_pipe(
+         {"raw": audio_array, "sampling_rate": sample_rate},
+         chunk_length_s=15,
+         batch_size=1,
+         return_timestamps=False,
+     )
+     return result["text"].strip()
+
+
+ def translate_sentence(text, max_length=256):
+     """MT: Translate a single sentence from English to Yoruba."""
+     inputs = mt_tokenizer(text, return_tensors="pt", truncation=True).to(DEVICE)
+     tgt_lang_id = mt_tokenizer.convert_tokens_to_ids(MT_TGT_LANG)
+
+     with torch.no_grad():
+         output_ids = mt_model.generate(
+             **inputs,
+             max_length=max_length,
+             forced_bos_token_id=tgt_lang_id,
+             repetition_penalty=1.5,
+             no_repeat_ngram_size=3,
+             num_beams=4,
+             early_stopping=True,
+         )
+     return mt_tokenizer.decode(output_ids[0], skip_special_tokens=True)
+
+
+ def translate_long_text(text):
+     """Split into sentences and translate each individually."""
+     sentences = split_into_sentences(text)
+     translations = []
+     for sent in sentences:
+         yo = translate_sentence(sent)
+         translations.append(yo)
+     return ' '.join(translations), sentences, translations
+
+
+ def synthesize(text):
+     """TTS: Yoruba text → audio."""
+     result = tts_pipe(text)
+     audio = np.array(result["audio"]).squeeze()
+     sr = result["sampling_rate"]
+     return audio, sr
+
+
+ # =============================================================================
+ # Gradio interface functions
+ # =============================================================================
+
+ def process_audio(audio_input):
+     """
+     Full pipeline: English audio → Yoruba audio.
+     audio_input: tuple of (sample_rate, numpy_array) from Gradio.
+     """
+     if audio_input is None:
+         return None, "⚠️ No audio provided. Please upload or record audio."
+
+     sample_rate, audio_array = audio_input
+
+     # Scale integer PCM (e.g. int16 from the mic) to [-1, 1] by the dtype max;
+     # peak normalization would distort quiet clips
+     if np.issubdtype(audio_array.dtype, np.integer):
+         audio_array = audio_array.astype(np.float32) / np.iinfo(audio_array.dtype).max
+     else:
+         audio_array = audio_array.astype(np.float32)
+
+     # Downmix to mono if needed
+     if audio_array.ndim > 1:
+         audio_array = audio_array.mean(axis=1)
+
+     total_start = time.time()
+     log_lines = []
+
+     # Step 1: ASR
+     t0 = time.time()
+     english_text = transcribe(audio_array, sample_rate)
+     asr_time = time.time() - t0
+     log_lines.append(f"**🎤 ASR** ({asr_time:.2f}s)")
+     log_lines.append(f"English: {english_text}")
+     log_lines.append("")
+
+     if not english_text:
+         return None, "⚠️ ASR returned empty text. Please try with clearer audio."
+
+     # Step 2: MT (sentence by sentence)
+     t0 = time.time()
+     yoruba_text, en_sentences, yo_sentences = translate_long_text(english_text)
+     mt_time = time.time() - t0
+     log_lines.append(f"**🔄 Translation** ({mt_time:.2f}s)")
+     for en_s, yo_s in zip(en_sentences, yo_sentences):
+         log_lines.append(f"  EN: {en_s}")
+         log_lines.append(f"  YO: {yo_s}")
+     log_lines.append("")
+
+     if not yoruba_text:
+         return None, "⚠️ Translation returned empty text."
+
+     # Step 3: TTS
+     t0 = time.time()
+     yoruba_audio, output_sr = synthesize(yoruba_text)
+     tts_time = time.time() - t0
+     log_lines.append(f"**🔊 TTS** ({tts_time:.2f}s) → {len(yoruba_audio)/output_sr:.2f}s of audio")
+
+     total = time.time() - total_start
+     log_lines.append("")
+     log_lines.append(f"**Total: {total:.2f}s**")
+
+     log_output = "\n".join(log_lines)
+
+     return (output_sr, yoruba_audio), log_output
+
+
+ def process_text(english_text):
+     """
+     Text-only mode: English text → Yoruba text + audio.
+     Skips the ASR stage — useful for testing MT + TTS.
+     """
+     if not english_text or not english_text.strip():
+         return None, "⚠️ Please enter some English text."
+
+     total_start = time.time()
+     log_lines = []
+
+     # MT
+     t0 = time.time()
+     yoruba_text, en_sentences, yo_sentences = translate_long_text(english_text.strip())
+     mt_time = time.time() - t0
+     log_lines.append(f"**🔄 Translation** ({mt_time:.2f}s)")
+     for en_s, yo_s in zip(en_sentences, yo_sentences):
+         log_lines.append(f"  EN: {en_s}")
+         log_lines.append(f"  YO: {yo_s}")
+     log_lines.append("")
+
+     if not yoruba_text:
+         return None, "⚠️ Translation returned empty text."
+
+     # TTS
+     t0 = time.time()
+     yoruba_audio, output_sr = synthesize(yoruba_text)
+     tts_time = time.time() - t0
+     log_lines.append(f"**🔊 TTS** ({tts_time:.2f}s) → {len(yoruba_audio)/output_sr:.2f}s of audio")
+
+     total = time.time() - total_start
+     log_lines.append("")
+     log_lines.append(f"**Total: {total:.2f}s**")
+
+     return (output_sr, yoruba_audio), "\n".join(log_lines)
+
+
+ # =============================================================================
+ # Gradio UI
+ # =============================================================================
+
+ DESCRIPTION = """
+ # 🏟️ Live Football Commentary — English → Yoruba
+
+ Translate English football commentary into Yoruba speech in real time.
+
+ **Pipeline:** ASR (Whisper) → MT (NLLB-200) → TTS (MMS-TTS Yoruba)
+
+ Upload or record English commentary audio, and get back Yoruba audio plus a full transcript.
+ """
+
+ EXAMPLES_TEXT = [
+     "And it's a brilliant goal from the striker!",
+     "The referee has shown a yellow card. Corner kick for the home team.",
+     "What a save by the goalkeeper! The match is heading into injury time.",
+     "He dribbles past two defenders and shoots! The ball hits the back of the net!",
+ ]
+
+ with gr.Blocks(
+     title="Football Commentary EN→YO",
+     theme=gr.themes.Soft(),
+ ) as demo:
+
+     gr.Markdown(DESCRIPTION)
+
+     with gr.Tabs():
+
+         # ---- Tab 1: Audio → Audio (Full Pipeline) ----
+         with gr.TabItem("🎙️ Audio → Audio (Full Pipeline)"):
+             gr.Markdown("Upload or record English commentary. The pipeline will transcribe, translate, and synthesize Yoruba audio.")
+
+             with gr.Row():
+                 with gr.Column():
+                     audio_input = gr.Audio(
+                         label="English Commentary Audio",
+                         type="numpy",
+                         sources=["upload", "microphone"],
+                     )
+                     audio_submit_btn = gr.Button("Translate to Yoruba", variant="primary", size="lg")
+
+                 with gr.Column():
+                     audio_output = gr.Audio(label="Yoruba Commentary Audio", type="numpy")
+                     audio_log = gr.Markdown(label="Pipeline Log")
+
+             audio_submit_btn.click(
+                 fn=process_audio,
+                 inputs=[audio_input],
+                 outputs=[audio_output, audio_log],
+             )
+
+         # ---- Tab 2: Text → Audio (Skip ASR) ----
+         with gr.TabItem("📝 Text → Audio (Translation + TTS)"):
+             gr.Markdown("Type or paste English text to translate to Yoruba and hear the result. Useful for testing without audio.")
+
+             with gr.Row():
+                 with gr.Column():
+                     text_input = gr.Textbox(
+                         label="English Text",
+                         placeholder="Type English football commentary here...",
+                         lines=4,
+                     )
+                     text_submit_btn = gr.Button("Translate to Yoruba", variant="primary", size="lg")
+
+                     gr.Examples(
+                         examples=[[e] for e in EXAMPLES_TEXT],
+                         inputs=[text_input],
+                         label="Example Commentary",
+                     )
+
+                 with gr.Column():
+                     text_audio_output = gr.Audio(label="Yoruba Audio", type="numpy")
+                     text_log = gr.Markdown(label="Pipeline Log")
+
+             text_submit_btn.click(
+                 fn=process_text,
+                 inputs=[text_input],
+                 outputs=[text_audio_output, text_log],
+             )
+
+     gr.Markdown("""
+     ---
+     **Models used:**
+     [ASR: PlotweaverAI/whisper-small-de-en](https://huggingface.co/PlotweaverAI/whisper-small-de-en) |
+     [MT: PlotweaverAI/nllb-200-distilled-600M-african-6lang](https://huggingface.co/PlotweaverAI/nllb-200-distilled-600M-african-6lang) |
+     [TTS: PlotweaverAI/yoruba-mms-tts-new](https://huggingface.co/PlotweaverAI/yoruba-mms-tts-new)
+     """)
+
+ # Launch
+ if __name__ == "__main__":
+     demo.launch()
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ torch>=2.0.0
+ transformers>=4.36.0
+ accelerate>=0.25.0
+ soundfile>=0.12.0
+ numpy>=1.24.0
+ gradio>=4.0.0
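One step of `app.py` that is easy to get wrong is converting the `(sample_rate, numpy_array)` tuple Gradio hands over (often int16 PCM from the browser mic) into the float32 mono signal the ASR pipeline expects. A standalone sketch of that conversion, using dtype-max scaling as one reasonable convention (the `to_float_mono` helper name is illustrative, not part of the repo):

```python
import numpy as np

def to_float_mono(audio_array: np.ndarray) -> np.ndarray:
    """Convert Gradio PCM audio to float32 mono in [-1, 1]."""
    if np.issubdtype(audio_array.dtype, np.integer):
        # Integer PCM (e.g. int16 from the browser mic): scale by the dtype max
        audio = audio_array.astype(np.float32) / np.iinfo(audio_array.dtype).max
    else:
        audio = audio_array.astype(np.float32)
    if audio.ndim > 1:
        # (samples, channels) -> mono by averaging channels
        audio = audio.mean(axis=1)
    return audio

# A full-scale stereo int16 sample maps to ~1.0 in mono float
stereo = np.array([[32767, 32767], [0, 0]], dtype=np.int16)
mono = to_float_mono(stereo)
```

Scaling by `np.iinfo(dtype).max` keeps quiet recordings quiet; normalizing by the clip's own peak would instead amplify near-silent audio and its noise floor.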