PlotweaverModel committed
Commit bf774ca · verified · 1 parent: 27a4574

Files upload

Files changed (4):
  1. DEPLOY.md +94 -0
  2. README.md +22 -9
  3. app.py +346 -0
  4. requirements.txt +6 -0
DEPLOY.md ADDED
@@ -0,0 +1,94 @@
+ # Deployment Guide — HuggingFace Space
+
+ ## Quick Deploy (3 steps)
+
+ ### Step 1: Create a new HuggingFace Space
+
+ 1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
+ 2. Fill in:
+    - **Owner**: `PlotweaverAI` (or your account)
+    - **Space name**: `live-football-commentary-en-yo`
+    - **SDK**: Gradio
+    - **Hardware**: **T4 small** (GPU required — the free CPU tier won't work well)
+    - **Visibility**: Public
+ 3. Click **Create Space**
+
+ ### Step 2: Upload the files
+
+ Upload these 3 files to the Space repo (via the web UI or git):
+
+ ```
+ ├── README.md          ← Space metadata (hardware, tags, etc.)
+ ├── app.py             ← Main Gradio application
+ └── requirements.txt   ← Python dependencies
+ ```
+
+ **Option A — Web upload:**
+ - Go to your Space → Files → "Add file" → Upload each file
+
+ **Option B — Git (recommended):**
+ ```bash
+ # Clone the space
+ git clone https://huggingface.co/spaces/PlotweaverAI/live-football-commentary-en-yo
+ cd live-football-commentary-en-yo
+
+ # Copy the files
+ cp /path/to/hf_space/* .
+
+ # Push
+ git add .
+ git commit -m "Initial deploy: EN→YO commentary pipeline"
+ git push
+ ```
+
+ ### Step 3: Wait for build
+
+ The Space will automatically:
+ 1. Install dependencies from `requirements.txt`
+ 2. Download all 3 models from HuggingFace Hub
+ 3. Start the Gradio app
+
+ The first build takes ~5-10 minutes (model downloads). Subsequent restarts are faster thanks to caching.
+
+ ---
+
+ ## Hardware Notes
+
+ | Hardware | Cost | Performance |
+ |----------|------|-------------|
+ | T4 small | ~$0.60/hr | Good — full pipeline in ~6-10s |
+ | T4 medium | ~$1.00/hr | Better for concurrent users |
+ | A10G small | ~$1.05/hr | Fastest inference |
+ | CPU basic | Free | Very slow (~60s+), not recommended |
+
+ The Space will **sleep after 48 hours of inactivity** on paid hardware.
+ You can enable "persistent" mode in the Space settings to keep it running.
+
+ ---
+
+ ## Troubleshooting
+
+ **Space keeps crashing / OOM:**
+ - T4 small has 16GB VRAM — enough for all 3 models in float16
+ - If issues persist, try T4 medium
+
+ **Models fail to load:**
+ - Make sure all 3 model repos are **public** on HuggingFace
+ - If they are private, add an `HF_TOKEN` secret in the Space settings
+
+ **Audio recording doesn't work:**
+ - Browser mic access requires HTTPS (HuggingFace Spaces provides this)
+ - Make sure you've granted microphone permission in the browser
+
+ ---
+
+ ## Customization
+
+ **To add more source/target languages** (the MT model supports 6):
+ Edit `app.py` and add a language dropdown to the Gradio UI.
+ The NLLB model likely supports these codes:
+ - `eng_Latn` (English)
+ - `yor_Latn` (Yoruba)
+ - `ibo_Latn` (Igbo)
+ - `hau_Latn` (Hausa)
+ - Check your model card for the full list.
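As a minimal sketch of that customization (the `LANGUAGES` mapping and `resolve_lang` helper are illustrative, not part of the current `app.py`), the dropdown would map display names to NLLB codes, and the resolved code would replace the hard-coded `MT_TGT_LANG`:

```python
# Hypothetical helper for a target-language dropdown (names are illustrative).
# NLLB-200 identifies languages by codes such as "yor_Latn".
LANGUAGES = {
    "English": "eng_Latn",
    "Yoruba": "yor_Latn",
    "Igbo": "ibo_Latn",
    "Hausa": "hau_Latn",
    # ...check the model card for the remaining codes
}

def resolve_lang(display_name: str) -> str:
    """Map a dropdown display name to its NLLB language code."""
    try:
        return LANGUAGES[display_name]
    except KeyError:
        raise ValueError(f"Unsupported language: {display_name}")

# In the Gradio UI this would back a dropdown, e.g.:
#   tgt_dropdown = gr.Dropdown(choices=list(LANGUAGES), value="Yoruba", label="Target")
# and the translate call would force the chosen code as the BOS token:
#   tgt_lang_id = mt_tokenizer.convert_tokens_to_ids(resolve_lang(choice))
```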
README.md CHANGED
@@ -1,13 +1,26 @@
  ---
- title: Live Football Commentary
- emoji: 📉
- colorFrom: red
- colorTo: red
+ title: Live Football Commentary - English to Yoruba
+ emoji: 🏟️
+ colorFrom: green
+ colorTo: yellow
  sdk: gradio
- sdk_version: 6.10.0
+ sdk_version: "4.44.1"
  app_file: app.py
- pinned: false
- license: apache-2.0
+ pinned: true
+ license: mit
+ hardware: t4-small
+ models:
+   - PlotweaverAI/whisper-small-de-en
+   - PlotweaverAI/nllb-200-distilled-600M-african-6lang
+   - PlotweaverAI/yoruba-mms-tts-new
+ tags:
+   - speech-to-speech
+   - translation
+   - yoruba
+   - football
+   - commentary
+   - asr
+   - tts
+   - nllb
+ short_description: Translate live English football commentary to Yoruba speech
  ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,346 @@
+ """
+ Live Football Commentary Pipeline — English → Yoruba
+ =====================================================
+ Gradio app for HuggingFace Spaces.
+
+ Pipeline: ASR (Whisper) → MT (NLLB-200) → TTS (MMS-TTS Yoruba)
+ """
+
+ import torch
+ import numpy as np
+ import re
+ import time
+ import gradio as gr
+ from transformers import (
+     pipeline as hf_pipeline,
+     AutoTokenizer,
+     AutoModelForSeq2SeqLM,
+ )
+
+ # =============================================================================
+ # Configuration
+ # =============================================================================
+
+ ASR_MODEL_ID = "PlotweaverAI/whisper-small-de-en"
+ MT_MODEL_ID = "PlotweaverAI/nllb-200-distilled-600M-african-6lang"
+ TTS_MODEL_ID = "PlotweaverAI/yoruba-mms-tts-new"
+
+ MT_SRC_LANG = "eng_Latn"
+ MT_TGT_LANG = "yor_Latn"
+
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+ TORCH_DTYPE = torch.float16 if torch.cuda.is_available() else torch.float32
+
+
+ # =============================================================================
+ # Load models (runs once at startup)
+ # =============================================================================
+
+ print(f"Device: {DEVICE} | Dtype: {TORCH_DTYPE}")
+ print("Loading models...")
+
+ # ASR
+ print(f"  Loading ASR: {ASR_MODEL_ID}")
+ asr_pipe = hf_pipeline(
+     "automatic-speech-recognition",
+     model=ASR_MODEL_ID,
+     device=DEVICE,
+     torch_dtype=TORCH_DTYPE,
+ )
+ print("  ASR loaded ✓")
+
+ # MT
+ print(f"  Loading MT: {MT_MODEL_ID}")
+ mt_tokenizer = AutoTokenizer.from_pretrained(MT_MODEL_ID)
+ mt_model = AutoModelForSeq2SeqLM.from_pretrained(
+     MT_MODEL_ID,
+     torch_dtype=TORCH_DTYPE,
+ ).to(DEVICE)
+ mt_tokenizer.src_lang = MT_SRC_LANG
+ print("  MT loaded ✓")
+
+ # TTS
+ print(f"  Loading TTS: {TTS_MODEL_ID}")
+ tts_pipe = hf_pipeline(
+     "text-to-speech",
+     model=TTS_MODEL_ID,
+     device=DEVICE,
+     torch_dtype=TORCH_DTYPE,
+ )
+ print("  TTS loaded ✓")
+ print("All models loaded!")
+
+
+ # =============================================================================
+ # Pipeline functions (from working Colab notebook)
+ # =============================================================================
+
+ def split_into_sentences(text):
+     """Split raw ASR text into individual sentences for MT."""
+     text = text.strip()
+     if not text:
+         return []
+
+     # Capitalize the first letter of each sentence (str.capitalize would
+     # also lowercase the rest, mangling proper nouns)
+     parts = [s.strip() for s in text.split('. ') if s.strip()]
+     text = '. '.join(p[0].upper() + p[1:] if len(p) > 1 else p.upper() for p in parts)
+
+     # If text has punctuation, split on it
+     if re.search(r'[.!?]', text):
+         sentences = re.split(r'(?<=[.!?])\s+', text)
+         return [s.strip() for s in sentences if s.strip()]
+
+     # No punctuation — split into ~12-word chunks
+     words = text.split()
+     MAX_WORDS = 12
+     sentences = []
+     for i in range(0, len(words), MAX_WORDS):
+         chunk = ' '.join(words[i:i + MAX_WORDS])
+         if not chunk.endswith(('.', '!', '?')):
+             chunk += '.'
+         chunk = chunk[0].upper() + chunk[1:] if len(chunk) > 1 else chunk.upper()
+         sentences.append(chunk)
+     return sentences
+
+
+ def transcribe(audio_array, sample_rate=16000):
+     """ASR: English audio → English text."""
+     result = asr_pipe(
+         {"raw": audio_array, "sampling_rate": sample_rate},
+         chunk_length_s=15,
+         batch_size=1,
+         return_timestamps=False,
+     )
+     return result["text"].strip()
+
+
+ def translate_sentence(text, max_length=256):
+     """MT: Translate a single sentence from English to Yoruba."""
+     inputs = mt_tokenizer(text, return_tensors="pt", truncation=True).to(DEVICE)
+     tgt_lang_id = mt_tokenizer.convert_tokens_to_ids(MT_TGT_LANG)
+
+     with torch.no_grad():
+         output_ids = mt_model.generate(
+             **inputs,
+             max_length=max_length,
+             forced_bos_token_id=tgt_lang_id,
+             repetition_penalty=1.5,
+             no_repeat_ngram_size=3,
+             num_beams=4,
+             early_stopping=True,
+         )
+     return mt_tokenizer.decode(output_ids[0], skip_special_tokens=True)
+
+
+ def translate_long_text(text):
+     """Split into sentences and translate each individually."""
+     sentences = split_into_sentences(text)
+     translations = []
+     for sent in sentences:
+         yo = translate_sentence(sent)
+         translations.append(yo)
+     return ' '.join(translations), sentences, translations
+
+
+ def synthesize(text):
+     """TTS: Yoruba text → audio."""
+     result = tts_pipe(text)
+     audio = np.array(result["audio"]).squeeze()
+     sr = result["sampling_rate"]
+     return audio, sr
+
+
+ # =============================================================================
+ # Gradio interface functions
+ # =============================================================================
+
+ def process_audio(audio_input):
+     """
+     Full pipeline: English audio → Yoruba audio.
+     audio_input: tuple of (sample_rate, numpy_array) from Gradio.
+     """
+     if audio_input is None:
+         return None, "⚠️ No audio provided. Please upload or record audio."
+
+     sample_rate, audio_array = audio_input
+
+     # Scale integer PCM (e.g. int16 from the mic) to [-1, 1] by the dtype max;
+     # peak normalization would distort quiet clips
+     if np.issubdtype(audio_array.dtype, np.integer):
+         audio_array = audio_array.astype(np.float32) / np.iinfo(audio_array.dtype).max
+     else:
+         audio_array = audio_array.astype(np.float32)
+
+     # Downmix to mono if needed
+     if audio_array.ndim > 1:
+         audio_array = audio_array.mean(axis=1)
+
+     total_start = time.time()
+     log_lines = []
+
+     # Step 1: ASR
+     t0 = time.time()
+     english_text = transcribe(audio_array, sample_rate)
+     asr_time = time.time() - t0
+     log_lines.append(f"**🎤 ASR** ({asr_time:.2f}s)")
+     log_lines.append(f"English: {english_text}")
+     log_lines.append("")
+
+     if not english_text:
+         return None, "⚠️ ASR returned empty text. Please try with clearer audio."
+
+     # Step 2: MT (sentence by sentence)
+     t0 = time.time()
+     yoruba_text, en_sentences, yo_sentences = translate_long_text(english_text)
+     mt_time = time.time() - t0
+     log_lines.append(f"**🔄 Translation** ({mt_time:.2f}s)")
+     for en_s, yo_s in zip(en_sentences, yo_sentences):
+         log_lines.append(f"  EN: {en_s}")
+         log_lines.append(f"  YO: {yo_s}")
+     log_lines.append("")
+
+     if not yoruba_text:
+         return None, "⚠️ Translation returned empty text."
+
+     # Step 3: TTS
+     t0 = time.time()
+     yoruba_audio, output_sr = synthesize(yoruba_text)
+     tts_time = time.time() - t0
+     log_lines.append(f"**🔊 TTS** ({tts_time:.2f}s) → {len(yoruba_audio)/output_sr:.2f}s of audio")
+
+     total = time.time() - total_start
+     log_lines.append("")
+     log_lines.append(f"**Total: {total:.2f}s**")
+
+     log_output = "\n".join(log_lines)
+
+     return (output_sr, yoruba_audio), log_output
+
+
+ def process_text(english_text):
+     """
+     Text-only mode: English text → Yoruba text + audio.
+     Skips the ASR stage — useful for testing MT + TTS.
+     """
+     if not english_text or not english_text.strip():
+         return None, "⚠️ Please enter some English text."
+
+     total_start = time.time()
+     log_lines = []
+
+     # MT
+     t0 = time.time()
+     yoruba_text, en_sentences, yo_sentences = translate_long_text(english_text.strip())
+     mt_time = time.time() - t0
+     log_lines.append(f"**🔄 Translation** ({mt_time:.2f}s)")
+     for en_s, yo_s in zip(en_sentences, yo_sentences):
+         log_lines.append(f"  EN: {en_s}")
+         log_lines.append(f"  YO: {yo_s}")
+     log_lines.append("")
+
+     if not yoruba_text:
+         return None, "⚠️ Translation returned empty text."
+
+     # TTS
+     t0 = time.time()
+     yoruba_audio, output_sr = synthesize(yoruba_text)
+     tts_time = time.time() - t0
+     log_lines.append(f"**🔊 TTS** ({tts_time:.2f}s) → {len(yoruba_audio)/output_sr:.2f}s of audio")
+
+     total = time.time() - total_start
+     log_lines.append("")
+     log_lines.append(f"**Total: {total:.2f}s**")
+
+     return (output_sr, yoruba_audio), "\n".join(log_lines)
+
+
+ # =============================================================================
+ # Gradio UI
+ # =============================================================================
+
+ DESCRIPTION = """
+ # 🏟️ Live Football Commentary — English → Yoruba
+
+ Translate English football commentary into Yoruba speech in real time.
+
+ **Pipeline:** ASR (Whisper) → MT (NLLB-200) → TTS (MMS-TTS Yoruba)
+
+ Upload or record English commentary audio, and get back Yoruba audio plus a full transcript.
+ """
+
+ EXAMPLES_TEXT = [
+     "And it's a brilliant goal from the striker!",
+     "The referee has shown a yellow card. Corner kick for the home team.",
+     "What a save by the goalkeeper! The match is heading into injury time.",
+     "He dribbles past two defenders and shoots! The ball hits the back of the net!",
+ ]
+
+ with gr.Blocks(
+     title="Football Commentary EN→YO",
+     theme=gr.themes.Soft(),
+ ) as demo:
+
+     gr.Markdown(DESCRIPTION)
+
+     with gr.Tabs():
+
+         # ---- Tab 1: Audio → Audio (Full Pipeline) ----
+         with gr.TabItem("🎙️ Audio → Audio (Full Pipeline)"):
+             gr.Markdown("Upload or record English commentary. The pipeline will transcribe, translate, and synthesize Yoruba audio.")
+
+             with gr.Row():
+                 with gr.Column():
+                     audio_input = gr.Audio(
+                         label="English Commentary Audio",
+                         type="numpy",
+                         sources=["upload", "microphone"],
+                     )
+                     audio_submit_btn = gr.Button("Translate to Yoruba", variant="primary", size="lg")
+
+                 with gr.Column():
+                     audio_output = gr.Audio(label="Yoruba Commentary Audio", type="numpy")
+                     audio_log = gr.Markdown(label="Pipeline Log")
+
+             audio_submit_btn.click(
+                 fn=process_audio,
+                 inputs=[audio_input],
+                 outputs=[audio_output, audio_log],
+             )
+
+         # ---- Tab 2: Text → Audio (Skip ASR) ----
+         with gr.TabItem("📝 Text → Audio (Translation + TTS)"):
+             gr.Markdown("Type or paste English text to translate to Yoruba and hear the result. Useful for testing without audio.")
+
+             with gr.Row():
+                 with gr.Column():
+                     text_input = gr.Textbox(
+                         label="English Text",
+                         placeholder="Type English football commentary here...",
+                         lines=4,
+                     )
+                     text_submit_btn = gr.Button("Translate to Yoruba", variant="primary", size="lg")
+
+                     gr.Examples(
+                         examples=[[e] for e in EXAMPLES_TEXT],
+                         inputs=[text_input],
+                         label="Example Commentary",
+                     )
+
+                 with gr.Column():
+                     text_audio_output = gr.Audio(label="Yoruba Audio", type="numpy")
+                     text_log = gr.Markdown(label="Pipeline Log")
+
+             text_submit_btn.click(
+                 fn=process_text,
+                 inputs=[text_input],
+                 outputs=[text_audio_output, text_log],
+             )
+
+     gr.Markdown("""
+     ---
+     **Models used:**
+     [ASR: PlotweaverAI/whisper-small-de-en](https://huggingface.co/PlotweaverAI/whisper-small-de-en) |
+     [MT: PlotweaverAI/nllb-200-distilled-600M-african-6lang](https://huggingface.co/PlotweaverAI/nllb-200-distilled-600M-african-6lang) |
+     [TTS: PlotweaverAI/yoruba-mms-tts-new](https://huggingface.co/PlotweaverAI/yoruba-mms-tts-new)
+     """)
+
+ # Launch
+ if __name__ == "__main__":
+     demo.launch()
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ torch>=2.0.0
+ transformers>=4.36.0
+ accelerate>=0.25.0
+ soundfile>=0.12.0
+ numpy>=1.24.0
+ gradio>=4.0.0
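One step of `app.py` that is easy to get wrong is converting the `(sample_rate, numpy_array)` tuple Gradio hands over (often int16 PCM from the browser mic) into the float32 mono signal the ASR pipeline expects. A standalone sketch of that conversion, using dtype-max scaling as one reasonable convention (the `to_float_mono` helper name is illustrative, not part of the repo):

```python
import numpy as np

def to_float_mono(audio_array: np.ndarray) -> np.ndarray:
    """Convert Gradio PCM audio to float32 mono in [-1, 1]."""
    if np.issubdtype(audio_array.dtype, np.integer):
        # Integer PCM (e.g. int16 from the browser mic): scale by the dtype max
        audio = audio_array.astype(np.float32) / np.iinfo(audio_array.dtype).max
    else:
        audio = audio_array.astype(np.float32)
    if audio.ndim > 1:
        # (samples, channels) -> mono by averaging channels
        audio = audio.mean(axis=1)
    return audio

# A full-scale stereo int16 sample maps to ~1.0 in mono float
stereo = np.array([[32767, 32767], [0, 0]], dtype=np.int16)
mono = to_float_mono(stereo)
```

Scaling by `np.iinfo(dtype).max` keeps quiet recordings quiet; normalizing by the clip's own peak would instead amplify near-silent audio and its noise floor.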