fariedalfarizi committed on
Commit 4aa23ee · 1 Parent(s): ef08a8e

Deploy Vocal Articulation Assessment v2.0
.env.example ADDED
@@ -0,0 +1,10 @@
+ # Model configuration
+ WHISPER_MODEL=openai/whisper-small
+
+ # Server configuration
+ HOST=0.0.0.0
+ PORT=7860
+
+ # Gradio configuration
+ GRADIO_SERVER_NAME=0.0.0.0
+ GRADIO_SERVER_PORT=7860
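A minimal sketch of how an application might read these variables at startup (the variable names and defaults come from the file above; the `get_settings` helper itself is hypothetical):

```python
import os

def get_settings() -> dict:
    """Read configuration from the environment, falling back to the
    defaults shipped in .env.example."""
    return {
        "whisper_model": os.getenv("WHISPER_MODEL", "openai/whisper-small"),
        "host": os.getenv("HOST", "0.0.0.0"),
        "port": int(os.getenv("PORT", "7860")),
        "gradio_server_name": os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"),
        "gradio_server_port": int(os.getenv("GRADIO_SERVER_PORT", "7860")),
    }
```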
.gitignore ADDED
@@ -0,0 +1,74 @@
+ # .gitignore for the Vocal Articulation project
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environment
+ venv/
+ env/
+ ENV/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Model files (if large)
+ # model_vokal/*.bin
+ # model_vokal/*.safetensors
+
+ # Audio files
+ *.wav
+ *.mp3
+ *.m4a
+ *.flac
+ *.ogg
+ !examples/*.wav
+
+ # Temporary files
+ *.tmp
+ *.temp
+ tmp/
+ temp/
+
+ # Logs
+ *.log
+ logs/
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Testing
+ .pytest_cache/
+ .coverage
+ htmlcov/
+
+ # Environment variables
+ .env
+ .env.local
Dockerfile ADDED
@@ -0,0 +1,33 @@
+ # =======================================
+ # DOCKERFILE - For Space Docker SDK
+ # =======================================
+
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     libsndfile1 \
+     ffmpeg \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Expose port
+ EXPOSE 7860
+
+ # Set environment variables
+ ENV PYTHONUNBUFFERED=1
+ ENV GRADIO_SERVER_NAME="0.0.0.0"
+ ENV GRADIO_SERVER_PORT=7860
+
+ # Run application
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,13 +1,158 @@
  ---
- title: Latihan Artikulasi
- emoji: 📉
- colorFrom: pink
- colorTo: gray
+ title: Vocal Articulation Assessment
+ emoji: 🎤
+ colorFrom: purple
+ colorTo: pink
  sdk: gradio
- sdk_version: 5.49.1
+ sdk_version: 4.0.0
  app_file: app.py
  pinned: false
  license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🎤 Indonesian Vowel Assessment System
+
+ An assessment system for Indonesian vowel articulation, built on deep learning and audio signal processing.
+
+ ## 🌟 Features
+
+ ### Multi-Metric Assessment
+
+ 1. **Clarity Score (40%)**: Clarity of pronunciation, based on model confidence
+ 2. **Energy Score (25%)**: Volume and vocal energy quality
+ 3. **Duration Score (15%)**: How well the pronunciation duration matches the expected duration
+ 4. **Pitch Score (20%)**: Stability of the vocal pitch
+
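The weighted combination above can be sketched as follows (the `overall_score` helper is illustrative only; the actual combination lives in `scoring_system.py` and may differ in details such as rounding):

```python
def overall_score(clarity: float, energy: float, duration: float, pitch: float) -> float:
    """Combine the four component scores (each 0-100) using the weights
    listed above: clarity 40%, energy 25%, duration 15%, pitch 20%."""
    score = (0.40 * clarity
             + 0.25 * energy
             + 0.15 * duration
             + 0.20 * pitch)
    return round(score, 1)
```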
+ ### Supported Vowels
+
+ - **A** - open front vowel
+ - **I** - close front vowel
+ - **U** - close back vowel
+ - **E** - mid front vowel
+ - **O** - mid back vowel
+
+ ## 🚀 How to Use
+
+ ### On HuggingFace Spaces
+
+ 1. Upload or record your audio
+ 2. Choose the target vowel (A, I, U, E, O)
+ 3. (Optional) Set the expected duration
+ 4. Click "Nilai Pengucapan" (Score Pronunciation)
+ 5. Review the resulting grade and feedback
+
+ ### Local Development
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the Gradio app
+ python app.py
+
+ # Or run the FastAPI server
+ python api.py
+ ```
+
+ ## 📊 Scoring System
+
+ | Grade | Score Range | Description                                         |
+ | ----- | ----------- | --------------------------------------------------- |
+ | A     | 90-100      | Excellent - very clear and accurate pronunciation   |
+ | B     | 80-89       | Good - fairly clear pronunciation with minor errors |
+ | C     | 70-79       | Fair - several mistakes                             |
+ | D     | 60-69       | Poor - many mistakes                                |
+ | E     | <60         | Needs more practice                                 |
+
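The grade boundaries in the table map to a simple threshold function; this sketch is illustrative and not the repository's actual implementation:

```python
def grade_for(score: float) -> str:
    """Map an overall score (0-100) to a letter grade per the table above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "E"
```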
+ ## 🔧 Technology
+
+ - **Model**: HuBERT/Wav2Vec2 fine-tuned for Indonesian vowel classification
+ - **Backend**: FastAPI
+ - **Frontend**: Gradio
+ - **Audio Processing**: librosa, torchaudio
+ - **Deployment**: HuggingFace Spaces with ZeroGPU
+
+ ## 📁 Project Structure
+
+ ```
+ .
+ ├── app.py                    # Gradio interface (HF Spaces)
+ ├── api.py                    # FastAPI server
+ ├── scoring_system.py         # Core scoring logic
+ ├── latihan_dasar.py          # Advanced articulation system
+ ├── model_vokal/              # Model checkpoint
+ │   ├── config.json
+ │   ├── model.safetensors
+ │   └── preprocessor_config.json
+ ├── requirements.txt          # Dependencies
+ └── README.md                 # Documentation
+ ```
+
+ ## 🎯 Roadmap
+
+ ### Level 1: Vowel Recognition ✅
+
+ - A, I, U, E, O (current)
+
+ ### Levels 2-5: Expansion (Coming Soon)
+
+ - Level 2: Basic consonants (BA, PA, DA, TA, etc.)
+ - Level 3: Syllable combinations (BA-BE-BI-BO-BU, etc.)
+ - Level 4: Difficult words (PSIKOLOGI, STRATEGI, etc.)
+ - Level 5: Complex sentences
+
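For illustration, the level table above could be held in a dict like the `ARTICULATION_LEVELS` constant that `api/routes.py` imports from `core/constants.py`. The real constant's contents are not shown in this commit, so the values below are assumptions taken from the roadmap; the validation mirrors the check in the `/score` endpoint:

```python
# Hypothetical shape of core.constants.ARTICULATION_LEVELS; names taken
# from the roadmap above, not from the actual repository file.
ARTICULATION_LEVELS = {
    1: "Vowels (A, I, U, E, O)",
    2: "Basic consonants (BA, PA, DA, TA, ...)",
    3: "Syllable combinations (BA-BE-BI-BO-BU, ...)",
    4: "Difficult words (PSIKOLOGI, STRATEGI, ...)",
    5: "Complex sentences",
}

def validate_level(level: int) -> int:
    """Reject levels outside 1-5, as the /score endpoint does."""
    if level not in ARTICULATION_LEVELS:
        raise ValueError(
            f"Invalid level {level}; must be one of {sorted(ARTICULATION_LEVELS)}"
        )
    return level
```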
104
+ ## 📝 API Documentation
105
+
106
+ ### FastAPI Endpoints
107
+
108
+ ```bash
109
+ # Health check
110
+ GET /health
111
+
112
+ # Get supported labels
113
+ GET /labels
114
+
115
+ # Score single audio
116
+ POST /score
117
+ - audio: file (required)
118
+ - target_label: string (optional)
119
+ - expected_duration: float (optional)
120
+
121
+ # Batch scoring
122
+ POST /batch_score
123
+ - audios: files (required)
124
+ - target_labels: string (optional, comma-separated)
125
+ ```
126
+
127
+ ### Example cURL
128
+
129
+ ```bash
130
+ curl -X POST "http://localhost:8000/score" \
131
+ -F "audio=@test.wav" \
132
+ -F "target_label=a" \
133
+ -F "expected_duration=0.8"
134
+ ```
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Especially for:
+
+ - Expanding the vowel dataset
+ - Implementing Levels 2-5
+ - Model optimization
+ - UI/UX improvements
+
+ ## 📄 License
+
+ MIT License
+
+ ## 👥 Author
+
+ Built for basic Indonesian vowel articulation practice
+
+ ## 🙏 Acknowledgments
+
+ - Base model: HuBERT/Wav2Vec2
+ - Audio processing: librosa
+ - Framework: FastAPI & Gradio
+ - Deployment: HuggingFace Spaces
api/__init__.py ADDED
@@ -0,0 +1,4 @@
+ # API module
+ from .routes import app
+
+ __all__ = ['app']
api/routes.py ADDED
@@ -0,0 +1,346 @@
+ # =======================================
+ # FASTAPI BACKEND - VOCAL ARTICULATION API V2
+ # Updated for Whisper ASR + multi-level support
+ # =======================================
+
+ from fastapi import FastAPI, File, UploadFile, Form, HTTPException
+ from fastapi.responses import JSONResponse
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+ from typing import Optional, List
+ import tempfile
+ import os
+ from pathlib import Path
+
+ from core.scoring_engine import AdvancedVocalScoringSystem, ScoreResult
+ from core.constants import ARTICULATION_LEVELS
+
+ # =======================================
+ # FASTAPI APP INITIALIZATION
+ # =======================================
+
+ app = FastAPI(
+     title="Vocal Articulation Assessment API v2",
+     description="API for Indonesian vocal articulation assessment - multi-level with Whisper ASR",
+     version="2.0.0"
+ )
+
+ # CORS middleware
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # =======================================
+ # PYDANTIC MODELS
+ # =======================================
+
+ class ScoreResponse(BaseModel):
+     """Response model for scoring"""
+     success: bool
+     overall_score: float
+     grade: str
+
+     # Component scores
+     clarity_score: float
+     energy_score: float
+     speech_rate_score: float
+     pitch_consistency_score: float
+     snr_score: float
+     articulation_score: float
+
+     # ASR results
+     transcription: str
+     target: str
+     similarity: float
+     wer: float
+
+     # Feedback
+     feedback: str
+     suggestions: List[str]
+
+     # Audio features
+     audio_features: dict
+     level: int
+
+ class HealthResponse(BaseModel):
+     """Response for the health check"""
+     status: str
+     model_loaded: bool
+     device: str
+     whisper_model: str
+
+ class LevelsResponse(BaseModel):
+     """Response for supported levels"""
+     levels: dict
+     total_levels: int
+
+ # =======================================
+ # GLOBAL VARIABLES
+ # =======================================
+
+ scorer: Optional[AdvancedVocalScoringSystem] = None
+
+ # =======================================
+ # STARTUP & SHUTDOWN
+ # =======================================
+
+ @app.on_event("startup")
+ async def startup_event():
+     """Load the model at startup"""
+     global scorer
+
+     print("🚀 Starting Vocal Articulation API v2...")
+
+     # Whisper model from the environment, or the default
+     whisper_model = os.getenv("WHISPER_MODEL", "openai/whisper-small")
+
+     try:
+         scorer = AdvancedVocalScoringSystem(whisper_model=whisper_model)
+         print("✅ Whisper model loaded successfully!")
+     except Exception as e:
+         print(f"❌ Error loading model: {e}")
+         raise
+
+ @app.on_event("shutdown")
+ async def shutdown_event():
+     """Clean up on shutdown"""
+     print("👋 Shutting down Vocal Articulation API v2...")
+
+ # =======================================
+ # API ENDPOINTS
+ # =======================================
+
+ @app.get("/", response_model=dict)
+ async def root():
+     """Root endpoint"""
+     return {
+         "message": "Vocal Articulation Assessment API v2",
+         "version": "2.0.0",
+         "features": [
+             "Whisper ASR-based clarity scoring",
+             "Multi-level support (Level 1-5)",
+             "6 scoring metrics",
+             "Comprehensive audio analysis"
+         ],
+         "endpoints": {
+             "health": "/health",
+             "levels": "/levels",
+             "score": "/score",
+             "batch_score": "/batch_score",
+             "docs": "/docs"
+         }
+     }
+
+ @app.get("/health", response_model=HealthResponse)
+ async def health_check():
+     """Health check endpoint"""
+     return HealthResponse(
+         status="healthy" if scorer is not None else "unhealthy",
+         model_loaded=scorer is not None,
+         device=scorer.device if scorer else "unknown",
+         whisper_model="openai/whisper-small" if scorer else "not loaded"
+     )
+
+ @app.get("/levels", response_model=LevelsResponse)
+ async def get_levels():
+     """Get all articulation levels and their targets"""
+     return LevelsResponse(
+         levels=ARTICULATION_LEVELS,
+         total_levels=len(ARTICULATION_LEVELS)
+     )
+
+ @app.post("/score", response_model=ScoreResponse)
+ async def score_audio(
+     audio: UploadFile = File(..., description="Audio file (WAV, MP3, M4A, etc.)"),
+     target_text: str = Form(..., description="Target text that should have been spoken"),
+     level: int = Form(1, description="Articulation level (1-5)")
+ ):
+     """
+     Score an audio file for vocal articulation assessment
+
+     Args:
+         audio: Audio file to score
+         target_text: Target text that should have been spoken
+         level: Articulation level (1=Vowels, 2=Consonants, 3=Syllables, 4=Words, 5=Sentences)
+
+     Returns:
+         ScoreResponse with the full assessment result
+     """
+     if scorer is None:
+         raise HTTPException(status_code=503, detail="Model not loaded")
+
+     # Validate level
+     if level not in ARTICULATION_LEVELS:
+         raise HTTPException(
+             status_code=400,
+             detail=f"Invalid level. Must be 1-5. Available levels: {list(ARTICULATION_LEVELS.keys())}"
+         )
+
+     # Validate target text
+     if not target_text or not target_text.strip():
+         raise HTTPException(
+             status_code=400,
+             detail="target_text cannot be empty"
+         )
+
+     # Save uploaded file to a temporary location
+     try:
+         with tempfile.NamedTemporaryFile(delete=False, suffix=Path(audio.filename).suffix) as tmp_file:
+             content = await audio.read()
+             tmp_file.write(content)
+             tmp_path = tmp_file.name
+
+         # Score audio
+         result = scorer.score_audio(
+             audio_path=tmp_path,
+             target_text=target_text,
+             level=level
+         )
+
+         # Clean up temp file
+         os.unlink(tmp_path)
+
+         # Return response
+         return ScoreResponse(
+             success=True,
+             overall_score=result.overall_score,
+             grade=result.grade,
+             clarity_score=result.clarity_score,
+             energy_score=result.energy_score,
+             speech_rate_score=result.speech_rate_score,
+             pitch_consistency_score=result.pitch_consistency_score,
+             snr_score=result.snr_score,
+             articulation_score=result.articulation_score,
+             transcription=result.transcription,
+             target=result.target,
+             similarity=result.similarity,
+             wer=result.wer,
+             feedback=result.feedback,
+             suggestions=result.suggestions,
+             audio_features=result.audio_features,
+             level=result.level
+         )
+
+     except Exception as e:
+         # Clean up temp file if it exists
+         if 'tmp_path' in locals() and os.path.exists(tmp_path):
+             os.unlink(tmp_path)
+
+         raise HTTPException(status_code=500, detail=f"Error processing audio: {str(e)}")
+
+ @app.post("/batch_score")
+ async def batch_score_audio(
+     audios: List[UploadFile] = File(..., description="Multiple audio files"),
+     target_texts: str = Form(..., description="Comma-separated target texts"),
+     levels: str = Form("1", description="Comma-separated levels (default: 1 for all)")
+ ):
+     """
+     Score multiple audio files in a single request
+
+     Args:
+         audios: List of audio files
+         target_texts: Comma-separated target texts
+         levels: Comma-separated levels (optional, default 1 for all)
+
+     Returns:
+         List of score results
+     """
+     if scorer is None:
+         raise HTTPException(status_code=503, detail="Model not loaded")
+
+     # Parse target texts
+     targets = [t.strip() for t in target_texts.split(",")]
+     if len(targets) != len(audios):
+         raise HTTPException(
+             status_code=400,
+             detail="Number of target_texts must match number of audio files"
+         )
+
+     # Parse levels
+     level_list = [int(l.strip()) for l in levels.split(",")]
+     if len(level_list) == 1:
+         level_list = level_list * len(audios)
+     elif len(level_list) != len(audios):
+         raise HTTPException(
+             status_code=400,
+             detail="Number of levels must be 1 or match number of audio files"
+         )
+
+     results = []
+
+     for idx, (audio, target, level) in enumerate(zip(audios, targets, level_list)):
+         try:
+             # Save to a temp file
+             with tempfile.NamedTemporaryFile(delete=False, suffix=Path(audio.filename).suffix) as tmp_file:
+                 content = await audio.read()
+                 tmp_file.write(content)
+                 tmp_path = tmp_file.name
+
+             # Score
+             result = scorer.score_audio(
+                 audio_path=tmp_path,
+                 target_text=target,
+                 level=level
+             )
+
+             # Clean up
+             os.unlink(tmp_path)
+
+             results.append({
+                 "filename": audio.filename,
+                 "success": True,
+                 "overall_score": result.overall_score,
+                 "grade": result.grade,
+                 "clarity_score": result.clarity_score,
+                 "energy_score": result.energy_score,
+                 "speech_rate_score": result.speech_rate_score,
+                 "pitch_consistency_score": result.pitch_consistency_score,
+                 "snr_score": result.snr_score,
+                 "articulation_score": result.articulation_score,
+                 "transcription": result.transcription,
+                 "target": result.target,
+                 "similarity": result.similarity,
+                 "wer": result.wer,
+                 "feedback": result.feedback,
+                 "suggestions": result.suggestions,
+                 "audio_features": result.audio_features,
+                 "level": result.level
+             })
+
+         except Exception as e:
+             if 'tmp_path' in locals() and os.path.exists(tmp_path):
+                 os.unlink(tmp_path)
+
+             results.append({
+                 "filename": audio.filename,
+                 "success": False,
+                 "error": str(e)
+             })
+
+     return {"results": results, "total": len(results)}
+
+ # =======================================
+ # RUN SERVER
+ # =======================================
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     # Configuration
+     host = os.getenv("HOST", "0.0.0.0")
+     port = int(os.getenv("PORT", 8000))
+
+     print(f"🚀 Starting server on {host}:{port}")
+     print("📖 API Documentation: http://localhost:8000/docs")
+
+     uvicorn.run(
+         "api.routes:app",  # import string must match this module's path (api/routes.py)
+         host=host,
+         port=port,
+         reload=True,
+         log_level="info"
+     )
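The comma-separated `levels` field in `/batch_score` is broadcast across all uploaded files when a single value is given; that parsing step can be isolated as a small helper (`parse_levels` is a hypothetical name for illustration):

```python
def parse_levels(levels: str, n_audios: int) -> list:
    """Parse the comma-separated levels form field, broadcasting a single
    value across all uploaded files, as /batch_score does."""
    level_list = [int(l.strip()) for l in levels.split(",")]
    if len(level_list) == 1:
        # One value given: apply it to every file
        return level_list * n_audios
    if len(level_list) != n_audios:
        raise ValueError("Number of levels must be 1 or match number of audio files")
    return level_list
```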
app.py ADDED
@@ -0,0 +1,319 @@
+ # =======================================
+ # GRADIO INTERFACE - HUGGINGFACE SPACES
+ # UI for Vocal Articulation Assessment
+ # Supports ZeroGPU on HuggingFace Spaces
+ # =======================================
+
+ import gradio as gr
+ import torch
+ import os
+ from pathlib import Path
+ from typing import Dict, Tuple
+
+ from scoring_system import VocalScoringSystem
+
+ # =======================================
+ # ZEROGPU DECORATOR (for HuggingFace Spaces)
+ # =======================================
+
+ try:
+     import spaces
+     ZEROGPU_AVAILABLE = True
+     print("✅ ZeroGPU available")
+ except ImportError:
+     ZEROGPU_AVAILABLE = False
+     print("⚠️ ZeroGPU not available (running locally)")
+     # Dummy stand-in so both @spaces.GPU and @spaces.GPU(duration=...) work locally
+     class spaces:
+         @staticmethod
+         def GPU(*args, **kwargs):
+             if args and callable(args[0]):
+                 return args[0]
+             def wrapper(func):
+                 return func
+             return wrapper
+
+ # =======================================
+ # GLOBAL VARIABLES
+ # =======================================
+
+ scorer = None
+
+ # =======================================
+ # INITIALIZATION
+ # =======================================
+
+ def initialize_model():
+     """Initialize the scoring system"""
+     global scorer
+
+     if scorer is None:
+         model_path = os.getenv("MODEL_PATH", "./model_vokal")
+         print(f"🔄 Loading model from {model_path}...")
+         scorer = VocalScoringSystem(model_path=model_path)
+         print("✅ Model loaded!")
+
+     return scorer
+
+ # =======================================
+ # GRADIO INFERENCE FUNCTION
+ # =======================================
+
+ @spaces.GPU(duration=60)  # Reserve the GPU for 60 seconds (on HF Spaces)
+ def score_vocal(
+     audio_file: str,
+     target_label: str,
+     expected_duration: float
+ ) -> Tuple[str, str, Dict, str]:
+     """
+     Score vocal audio through the Gradio interface
+
+     Args:
+         audio_file: Path to uploaded audio
+         target_label: Target vowel (a, i, u, e, o)
+         expected_duration: Expected duration in seconds
+
+     Returns:
+         Tuple of (score_display, feedback, details_dict, grade_display)
+     """
+     try:
+         # Initialize model
+         scorer = initialize_model()
+
+         # Validate input
+         if audio_file is None:
+             return "❌ Error", "Silakan upload file audio terlebih dahulu!", {}, ""
+
+         # Process target label
+         target = target_label.lower().strip() if target_label else None
+         exp_dur = expected_duration if expected_duration > 0 else None
+
+         # Score audio
+         result = scorer.score_audio(
+             audio_path=audio_file,
+             target_label=target,
+             expected_duration=exp_dur
+         )
+
+         # Format score display
+         score_display = f"""
+ ## 📊 Hasil Penilaian
+
+ ### Overall Score: {result.overall_score}/100
+ ### Grade: {result.grade}
+
+ ---
+
+ ### 🎯 Prediksi
+ - **Target**: {result.target_label.upper() if result.target_label else 'Tidak ada'}
+ - **Terdeteksi**: {result.predicted_label.upper()}
+ - **Confidence**: {result.confidence}%
+
+ ---
+
+ ### 📈 Component Scores
+
+ | Komponen | Score | Bobot |
+ |----------|-------|-------|
+ | 🔊 **Clarity** | {result.clarity_score}/100 | 40% |
+ | ⚡ **Energy** | {result.energy_score}/100 | 25% |
+ | ⏱️ **Duration** | {result.duration_score}/100 | 15% |
+ | 🎵 **Pitch** | {result.pitch_score}/100 | 20% |
+ """
+
+         # Format feedback
+         feedback_display = f"""
+ ## 💬 Feedback
+
+ {result.feedback}
+
+ ### 💡 Saran Perbaikan:
+ """
+         if result.suggestions:
+             for i, suggestion in enumerate(result.suggestions, 1):
+                 feedback_display += f"\n{i}. {suggestion}"
+         else:
+             feedback_display += "\n✅ Tidak ada saran - pengucapan sudah sangat baik!"
+
+         # Details dictionary
+         details = {
+             "Overall Score": result.overall_score,
+             "Grade": result.grade,
+             "Predicted": result.predicted_label.upper(),
+             "Confidence": f"{result.confidence}%",
+             "Clarity Score": result.clarity_score,
+             "Energy Score": result.energy_score,
+             "Duration Score": result.duration_score,
+             "Pitch Score": result.pitch_score,
+             **result.audio_features
+         }
+
+         # Grade display with emoji
+         grade_emoji = {
+             'A': '🌟',
+             'B': '👍',
+             'C': '😊',
+             'D': '🤔',
+             'E': '💪'
+         }
+         grade_display = f"{grade_emoji.get(result.grade, '📊')} Grade {result.grade}"
+
+         return score_display, feedback_display, details, grade_display
+
+     except Exception as e:
+         error_msg = f"❌ Error: {str(e)}"
+         return error_msg, error_msg, {}, "Error"
+
+ # =======================================
+ # GRADIO UI
+ # =======================================
+
+ def create_interface():
+     """Create the Gradio interface"""
+
+     # Custom CSS
+     custom_css = """
+     .gradio-container {
+         font-family: 'Arial', sans-serif;
+     }
+     .score-display {
+         background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+         color: white;
+         padding: 20px;
+         border-radius: 10px;
+     }
+     """
+
+     # Create interface
+     with gr.Blocks(
+         title="Vocal Articulation Assessment",
+         theme=gr.themes.Soft(),
+         css=custom_css
+     ) as demo:
+
+         gr.Markdown("""
+         # 🎤 Sistem Penilaian Vokal Indonesia
+
+         Sistem ini menilai pengucapan vokal bahasa Indonesia (A, I, U, E, O) menggunakan multiple metrics:
+         - **Clarity**: Kejelasan pengucapan dari model confidence
+         - **Energy**: Kualitas volume dan energi suara
+         - **Duration**: Kesesuaian durasi pengucapan
+         - **Pitch**: Stabilitas pitch/nada suara
+
+         ### 📝 Cara Penggunaan:
+         1. Upload atau record audio Anda
+         2. Pilih target vokal yang diucapkan
+         3. (Opsional) Set expected duration
+         4. Klik "🎯 Nilai Pengucapan"
+         """)
+
+         with gr.Row():
+             with gr.Column(scale=1):
+                 gr.Markdown("## 🎙️ Input")
+
+                 # Audio input
+                 audio_input = gr.Audio(
+                     label="Upload atau Record Audio",
+                     type="filepath",
+                     sources=["upload", "microphone"]
+                 )
+
+                 # Target label dropdown
+                 target_input = gr.Dropdown(
+                     label="Target Vokal",
+                     choices=["a", "i", "u", "e", "o"],
+                     value="a",
+                     info="Pilih vokal yang Anda ucapkan"
+                 )
+
+                 # Expected duration slider
+                 duration_input = gr.Slider(
+                     label="Expected Duration (detik)",
+                     minimum=0,
+                     maximum=3.0,
+                     value=0.8,
+                     step=0.1,
+                     info="0 = auto (tidak diperhitungkan)"
+                 )
+
+                 # Submit button
+                 submit_btn = gr.Button(
+                     "🎯 Nilai Pengucapan",
+                     variant="primary",
+                     size="lg"
+                 )
+
+             with gr.Column(scale=1):
+                 gr.Markdown("## 📊 Hasil Penilaian")
+
+                 # Grade display (large)
+                 grade_output = gr.Markdown(
+                     "### Belum ada penilaian",
+                     elem_classes=["score-display"]
+                 )
+
+                 # Score display
+                 score_output = gr.Markdown()
+
+         # Feedback row
+         with gr.Row():
+             feedback_output = gr.Markdown()
+
+         # Details accordion
+         with gr.Accordion("🔍 Detail Lengkap", open=False):
+             details_output = gr.JSON(label="Audio Features & Scores")
+
+         # Examples
+         gr.Markdown("## 📚 Contoh")
+         gr.Examples(
+             examples=[
+                 ["examples/a.wav", "a", 0.8],
+                 ["examples/i.wav", "i", 0.8],
+                 ["examples/u.wav", "u", 0.8],
+                 ["examples/e.wav", "e", 0.8],
+                 ["examples/o.wav", "o", 0.8],
+             ],
+             inputs=[audio_input, target_input, duration_input],
+             label="Klik untuk mencoba contoh"
+         )
+
+         # Connect button to function
+         submit_btn.click(
+             fn=score_vocal,
+             inputs=[audio_input, target_input, duration_input],
+             outputs=[score_output, feedback_output, details_output, grade_output]
+         )
+
+         # Footer
+         gr.Markdown("""
+         ---
+         ### ℹ️ Informasi
+
+         **Tentang Penilaian:**
+         - **Grade A** (90-100): Sempurna - pengucapan sangat jelas dan akurat
+         - **Grade B** (80-89): Bagus - pengucapan cukup jelas dengan minor errors
+         - **Grade C** (70-79): Cukup - ada beberapa kesalahan
+         - **Grade D** (60-69): Kurang - banyak kesalahan
+         - **Grade E** (<60): Perlu latihan lebih banyak
+
+         **Model**: HuBERT/Wav2Vec2 untuk klasifikasi vokal Indonesia
+
+         **Dibuat untuk**: Latihan Dasar Artikulasi Vokal Indonesia
+         """)
+
+     return demo
+
+ # =======================================
+ # MAIN
+ # =======================================
+
+ if __name__ == "__main__":
+     # Initialize model at startup
+     initialize_model()
+
+     # Create and launch interface
+     demo = create_interface()
+
+     # Launch configuration
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False,  # Set True for a public URL
+         show_error=True
+     )
app/__init__.py ADDED
@@ -0,0 +1,4 @@
+ # App module
+ from .interface import create_interface, initialize_model
+
+ __all__ = ['create_interface', 'initialize_model']
app/interface.py ADDED
@@ -0,0 +1,351 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # =======================================
2
+ # GRADIO INTERFACE V2 - HUGGINGFACE SPACES
3
+ # Updated untuk Whisper ASR + Multi-Level Support
4
+ # =======================================
5
+
6
+ import gradio as gr
7
+ import os
8
+ from typing import Dict, Tuple
9
+
10
+ from core.scoring_engine import AdvancedVocalScoringSystem
11
+ from core.constants import ARTICULATION_LEVELS
12
+
13
+ # =======================================
14
+ # ZEROGPU DECORATOR
15
+ # =======================================
16
+
17
+ try:
18
+ import spaces
19
+ ZEROGPU_AVAILABLE = True
20
+ print("✅ ZeroGPU available")
21
+ except ImportError:
22
+ ZEROGPU_AVAILABLE = False
23
+ print("⚠️ ZeroGPU not available (running locally)")
24
+ class spaces:
25
+ @staticmethod
26
+ def GPU(func):
27
+ return func
28
+
29
+ # =======================================
30
+ # GLOBAL VARIABLES
31
+ # =======================================
32
+
33
+ scorer = None
34
+
35
+ # =======================================
36
+ # INITIALIZATION
37
+ # =======================================
38
+
39
+ def initialize_model():
40
+ """Initialize scoring system"""
41
+ global scorer
42
+
43
+ if scorer is None:
44
+ whisper_model = os.getenv("WHISPER_MODEL", "openai/whisper-small")
45
+ print(f"🔄 Loading Whisper model: {whisper_model}...")
46
+ scorer = AdvancedVocalScoringSystem(whisper_model=whisper_model)
47
+ print("✅ Model loaded!")
48
+
49
+ return scorer
50
+
51
+ # =======================================
+ # GRADIO INFERENCE FUNCTION
+ # =======================================
+
+ @spaces.GPU(duration=120)
+ def score_vocal(
+     audio_file: str,
+     target_text: str,
+     level: int
+ ) -> Tuple[str, str, Dict, str]:
+     """
+     Score vocal audio through the Gradio interface
+
+     Args:
+         audio_file: Path to the uploaded audio
+         target_text: Text the speaker was supposed to say
+         level: Articulation level (1-5)
+
+     Returns:
+         Tuple of (score_display, feedback, details_dict, grade_display)
+     """
+     try:
+         # Initialize model
+         scorer = initialize_model()
+
+         # Validate input
+         if audio_file is None:
+             return "❌ Error", "Silakan upload atau record audio terlebih dahulu!", {}, "❌"
+
+         if not target_text or not target_text.strip():
+             return "❌ Error", "Silakan masukkan target text!", {}, "❌"
+
+         # Score audio
+         result = scorer.score_audio(
+             audio_path=audio_file,
+             target_text=target_text,
+             level=level
+         )
+
+         # Format score display
+         score_display = f"""
+ ## 📊 Hasil Penilaian - Level {level}
+
+ ### Overall Score: **{result.overall_score}/100**
+ ### Grade: **{result.grade}**
+
+ ---
+
+ ### 🎯 ASR Transcription
+ - **Target**: {result.target}
+ - **Terdeteksi**: {result.transcription}
+ - **Similarity**: {result.similarity*100:.2f}%
+ - **WER**: {result.wer*100:.2f}%
+
+ ---
+
+ ### 📈 Component Scores
+
+ | Komponen | Score | Status |
+ |----------|-------|--------|
+ | 🔊 **Clarity** (ASR Accuracy) | **{result.clarity_score:.1f}/100** | {'✅' if result.clarity_score >= 80 else '⚠️' if result.clarity_score >= 60 else '❌'} |
+ | ⚡ **Energy** (Volume) | **{result.energy_score:.1f}/100** | {'✅' if result.energy_score >= 80 else '⚠️' if result.energy_score >= 60 else '❌'} |
+ | 🗣️ **Speech Rate** | **{result.speech_rate_score:.1f}/100** | {'✅' if result.speech_rate_score >= 80 else '⚠️' if result.speech_rate_score >= 60 else '❌'} |
+ | 🎵 **Pitch Consistency** | **{result.pitch_consistency_score:.1f}/100** | {'✅' if result.pitch_consistency_score >= 80 else '⚠️' if result.pitch_consistency_score >= 60 else '❌'} |
+ | 📡 **SNR** (Noise Quality) | **{result.snr_score:.1f}/100** | {'✅' if result.snr_score >= 80 else '⚠️' if result.snr_score >= 60 else '❌'} |
+ | 🎤 **Articulation** | **{result.articulation_score:.1f}/100** | {'✅' if result.articulation_score >= 80 else '⚠️' if result.articulation_score >= 60 else '❌'} |
+ """
+
+         # Format feedback
+         feedback_display = f"""
+ ## 💬 Feedback
+
+ {result.feedback}
+
+ ---
+
+ ### 💡 Saran Perbaikan:
+ """
+         if result.suggestions:
+             for i, suggestion in enumerate(result.suggestions, 1):
+                 feedback_display += f"\n{i}. {suggestion}"
+         else:
+             feedback_display += "\n✨ **Sempurna!** Tidak ada saran - pengucapan Anda sudah sangat baik!"
+
+         # Details dictionary
+         details = {
+             "📊 Overall": {
+                 "Score": result.overall_score,
+                 "Grade": result.grade,
+                 "Level": level
+             },
+             "🎯 ASR Results": {
+                 "Target": result.target,
+                 "Transcription": result.transcription,
+                 "Similarity": f"{result.similarity*100:.2f}%",
+                 "WER": f"{result.wer*100:.2f}%"
+             },
+             "📈 Component Scores": {
+                 "Clarity": result.clarity_score,
+                 "Energy": result.energy_score,
+                 "Speech Rate": result.speech_rate_score,
+                 "Pitch Consistency": result.pitch_consistency_score,
+                 "SNR": result.snr_score,
+                 "Articulation": result.articulation_score
+             },
+             "🔊 Audio Features": result.audio_features
+         }
+
+         # Grade display with emoji
+         grade_emoji = {
+             'A': '🌟 Grade A - Sempurna!',
+             'B': '👍 Grade B - Bagus!',
+             'C': '😊 Grade C - Cukup Baik',
+             'D': '🤔 Grade D - Perlu Latihan',
+             'E': '💪 Grade E - Terus Berlatih!'
+         }
+         grade_display = f"# {grade_emoji.get(result.grade, '📊 Grade ' + result.grade)}\n## Score: {result.overall_score}/100"
+
+         return score_display, feedback_display, details, grade_display
+
+     except Exception as e:
+         error_msg = f"❌ Error: {str(e)}"
+         return error_msg, error_msg, {"error": str(e)}, "❌ Error"
+
+ # =======================================
+ # GRADIO UI
+ # =======================================
+
+ def create_interface():
+     """Create Gradio interface"""
+
+     # Custom CSS
+     custom_css = """
+     .gradio-container {
+         font-family: 'Segoe UI', Arial, sans-serif;
+     }
+     .grade-display {
+         text-align: center;
+         padding: 20px;
+         border-radius: 10px;
+         background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+         color: white;
+     }
+     """
+
+     # Create interface
+     with gr.Blocks(
+         title="Vocal Articulation Assessment v2",
+         theme=gr.themes.Soft(primary_hue="purple"),
+         css=custom_css
+     ) as demo:
+
+         gr.Markdown("""
+ # 🎤 Sistem Penilaian Vokal Indonesia v2.0
+ ### Powered by Whisper ASR + Advanced Audio Analysis
+
+ Sistem ini menilai pengucapan vokal dan artikulasi bahasa Indonesia dengan **6 metrik komprehensif**:
+
+ | Metrik | Deskripsi |
+ |--------|-----------|
+ | 🔊 **Clarity** | Kejelasan pengucapan dari ASR accuracy (Whisper) |
+ | ⚡ **Energy** | Kualitas volume dan energi suara |
+ | 🗣️ **Speech Rate** | Kecepatan bicara (suku kata per detik) |
+ | 🎵 **Pitch Consistency** | Stabilitas nada suara |
+ | 📡 **SNR** | Signal-to-Noise Ratio (kualitas rekaman) |
+ | 🎤 **Articulation** | Kejernihan artikulasi dari analisis spektral |
+
+ ---
+
+ ### 📚 5 Level Latihan Artikulasi:
+ """)
+
+         # Display levels
+         for level_num, level_data in ARTICULATION_LEVELS.items():
+             with gr.Accordion(f"Level {level_num}: {level_data['name']} - {level_data['difficulty']}", open=False):
+                 targets_display = ", ".join(level_data['targets'][:10])
+                 if len(level_data['targets']) > 10:
+                     targets_display += f"... dan {len(level_data['targets']) - 10} lainnya"
+                 gr.Markdown(f"**Contoh target**: {targets_display}")
+
+         gr.Markdown("---")
+
+         with gr.Row():
+             with gr.Column(scale=1):
+                 gr.Markdown("## 🎙️ Input Audio & Target")
+
+                 # Audio input
+                 audio_input = gr.Audio(
+                     label="Upload atau Record Audio",
+                     type="filepath",
+                     sources=["upload", "microphone"]
+                 )
+
+                 # Target text input
+                 target_input = gr.Textbox(
+                     label="Target Text",
+                     placeholder="Masukkan text yang Anda ucapkan (misal: A, BA, PSIKOLOGI, dll)",
+                     info="Masukkan text sesuai level yang dipilih"
+                 )
+
+                 # Level selector
+                 level_input = gr.Slider(
+                     label="Level Artikulasi",
+                     minimum=1,
+                     maximum=5,
+                     value=1,
+                     step=1,
+                     info="1=Vokal, 2=Konsonan, 3=Suku Kata, 4=Kata, 5=Kalimat"
+                 )
+
+                 # Submit button
+                 submit_btn = gr.Button(
+                     "🎯 Nilai Pengucapan",
+                     variant="primary",
+                     size="lg"
+                 )
+
+                 # Examples
+                 gr.Markdown("### 📝 Contoh Quick Test")
+                 gr.Examples(
+                     examples=[
+                         [None, "A", 1],
+                         [None, "I", 1],
+                         [None, "U", 1],
+                         [None, "BA", 2],
+                         [None, "STRATEGI", 4],
+                     ],
+                     inputs=[audio_input, target_input, level_input],
+                     label="Klik untuk auto-fill (masih perlu audio)"
+                 )
+
+             with gr.Column(scale=1):
+                 gr.Markdown("## 📊 Hasil & Grade")
+
+                 # Grade display (large)
+                 grade_output = gr.Markdown(
+                     "### 🎯 Upload audio untuk mulai penilaian",
+                     elem_classes=["grade-display"]
+                 )
+
+                 # Score display
+                 score_output = gr.Markdown()
+
+         # Feedback row
+         gr.Markdown("---")
+         with gr.Row():
+             feedback_output = gr.Markdown()
+
+         # Details accordion
+         with gr.Accordion("🔍 Detail Lengkap & Audio Features", open=False):
+             details_output = gr.JSON(label="Detailed Metrics & Features")
+
+         # Connect button to function
+         submit_btn.click(
+             fn=score_vocal,
+             inputs=[audio_input, target_input, level_input],
+             outputs=[score_output, feedback_output, details_output, grade_output]
+         )
+
+         # Footer
+         gr.Markdown("""
+ ---
+
+ ### ℹ️ Informasi Sistem
+
+ **Grading System:**
+ - **Grade A** (90-100): 🌟 Sempurna - pengucapan sangat jelas dan akurat
+ - **Grade B** (80-89): 👍 Bagus - pengucapan cukup jelas dengan minor errors
+ - **Grade C** (70-79): 😊 Cukup - ada beberapa kesalahan
+ - **Grade D** (60-69): 🤔 Kurang - perlu latihan lebih
+ - **Grade E** (<60): 💪 Terus berlatih!
+
+ **Model**: OpenAI Whisper (multilingual ASR) + Advanced Audio Signal Processing
+
+ **Dibuat untuk**: Latihan Dasar Artikulasi Vokal Indonesia (Level 1-5)
+
+ **Version**: 2.0.0 | **Updated**: November 2025
+ """)
+
+     return demo
+
+ # =======================================
+ # MAIN
+ # =======================================
+
+ if __name__ == "__main__":
+     # Initialize model at startup
+     print("🔄 Initializing system...")
+     initialize_model()
+     print("✅ System ready!")
+
+     # Create and launch interface
+     demo = create_interface()
+
+     # Launch configuration
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False,  # Set True for a public URL
+         show_error=True
+     )
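`app.py` decorates `score_vocal` with `@spaces.GPU(duration=120)` and falls back to a stub `spaces` class when ZeroGPU is not installed. For that fallback to work, the stub has to behave as a dual-mode no-op decorator, usable both bare and with arguments. A minimal standalone sketch of the pattern (illustrative only, not the real `spaces` package):

```python
def GPU(*args, **kwargs):
    """No-op stand-in that works as @GPU and as @GPU(duration=...)."""
    if args and callable(args[0]):
        # Used bare as @GPU: the decorated function is the first argument
        return args[0]

    def decorator(func):
        # Used as @GPU(duration=...): ignore the arguments, return func unchanged
        return func
    return decorator


@GPU
def plain():
    return "plain"


@GPU(duration=120)
def parameterized():
    return "parameterized"
```

Both decorated functions behave exactly as if undecorated, which is the desired local behavior.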
config/__init__.py ADDED
@@ -0,0 +1,4 @@
+ # Config module
+ from .settings import get_settings
+
+ __all__ = ['get_settings']
config/settings.py ADDED
@@ -0,0 +1,30 @@
+ """
+ Application settings and configuration
+ """
+ import os
+ from functools import lru_cache
+
+ class Settings:
+     """Application configuration"""
+
+     # Model settings
+     WHISPER_MODEL: str = os.getenv("WHISPER_MODEL", "openai/whisper-small")
+
+     # Server settings
+     HOST: str = os.getenv("HOST", "0.0.0.0")
+     PORT: int = int(os.getenv("PORT", "7860"))
+
+     # Gradio settings
+     GRADIO_SERVER_NAME: str = os.getenv("GRADIO_SERVER_NAME", "0.0.0.0")
+     GRADIO_SERVER_PORT: int = int(os.getenv("GRADIO_SERVER_PORT", "7860"))
+     GRADIO_SHARE: bool = os.getenv("GRADIO_SHARE", "False").lower() == "true"
+
+     # Application settings
+     APP_NAME: str = "Vocal Articulation Assessment"
+     VERSION: str = "2.0.0"
+     DEBUG: bool = os.getenv("DEBUG", "False").lower() == "true"
+
+ @lru_cache()
+ def get_settings() -> Settings:
+     """Get cached settings instance"""
+     return Settings()
core/__init__.py ADDED
@@ -0,0 +1,9 @@
+ # Core module for vocal articulation scoring
+ from .scoring_engine import AdvancedVocalScoringSystem, ScoreResult
+ from .constants import ARTICULATION_LEVELS
+
+ __all__ = [
+     'AdvancedVocalScoringSystem',
+     'ScoreResult',
+     'ARTICULATION_LEVELS'
+ ]
core/constants.py ADDED
@@ -0,0 +1,94 @@
+ # =======================================
+ # CONSTANTS - Articulation Levels
+ # =======================================
+
+ ARTICULATION_LEVELS = {
+     1: {
+         "name": "Pengenalan Vokal",
+         "targets": ["A", "I", "U", "E", "O"],
+         "difficulty": "Sangat Mudah",
+         "speech_rate_range": (0.5, 2.0)
+     },
+     2: {
+         "name": "Konsonan Dasar",
+         "targets": ["BA", "PA", "DA", "TA", "GA", "KA", "FA", "VA", "SA", "ZA",
+                     "MA", "NA", "NGA", "NYA", "RA", "LA"],
+         "difficulty": "Mudah",
+         "speech_rate_range": (2.0, 4.0)
+     },
+     3: {
+         "name": "Kombinasi Suku Kata",
+         "targets": ["BA", "BE", "BI", "BO", "BU", "TA", "TI", "TU", "TE", "TO",
+                     "KA", "KI", "KU", "KE", "KO", "RA", "RI", "RU", "RE", "RO",
+                     "LA", "LI", "LU", "LE", "LO", "CHA", "CHI", "CHU", "CHE", "CHO",
+                     "STRA", "STRI", "STRU", "STRE", "STRO", "AK", "IK", "UK", "EK", "OK"],
+         "difficulty": "Sedang",
+         "speech_rate_range": (2.5, 5.0)
+     },
+     4: {
+         "name": "Kata Sulit",
+         "targets": ["PSIKOLOGI", "STRATEGI", "IMPLEMENTASI", "INFRASTRUKTUR",
+                     "KHARISMATIK", "TRANSKRIPSI", "OTORITER", "PROBABILITAS",
+                     "KUALITAS", "SPESIFIKASI"],
+         "difficulty": "Sulit",
+         "speech_rate_range": (2.0, 4.5)
+     },
+     5: {
+         "name": "Kalimat Kompleks",
+         "targets": [
+             "ULAR LARI LURUS DI ATAS REL LURUS",
+             "KUKU KAKI KAKEK KAKAKKU KAKU DAN KOTOR",
+             "SATU SATE TUJUH TUSUK DUA SATE EMPAT BELAS TUSUK",
+             "KEPALA DIPARUT KELAPA DIGARUK JANGAN SAMPAI TERTUKAR",
+             "PSIKOLOGI MEMPELAJARI PROSES PROSES PSIKIS SECARA SPESIFIK",
+             "STRATEGI IMPLEMENTASI INFRASTRUKTUR TRANSISIONAL HARUS JELAS",
+             "KLAIM KLAIM KLIMAKS KLASIK KELOMPOK KITA KIAN KRITIS"
+         ],
+         "difficulty": "Sangat Sulit",
+         "speech_rate_range": (2.5, 4.5)
+     }
+ }
+
+ # Scoring weights per level
+ LEVEL_WEIGHTS = {
+     1: {  # Single vowels
+         'clarity': 0.45,
+         'energy': 0.25,
+         'speech_rate': 0.0,
+         'pitch_consistency': 0.15,
+         'snr': 0.10,
+         'articulation': 0.05
+     },
+     2: {  # Basic consonants
+         'clarity': 0.40,
+         'energy': 0.20,
+         'speech_rate': 0.15,
+         'pitch_consistency': 0.10,
+         'snr': 0.10,
+         'articulation': 0.05
+     },
+     3: {  # Syllable combinations
+         'clarity': 0.40,
+         'energy': 0.15,
+         'speech_rate': 0.20,
+         'pitch_consistency': 0.10,
+         'snr': 0.10,
+         'articulation': 0.05
+     },
+     4: {  # Difficult words
+         'clarity': 0.45,
+         'energy': 0.15,
+         'speech_rate': 0.15,
+         'pitch_consistency': 0.10,
+         'snr': 0.10,
+         'articulation': 0.05
+     },
+     5: {  # Complex sentences
+         'clarity': 0.45,
+         'energy': 0.10,
+         'speech_rate': 0.20,
+         'pitch_consistency': 0.10,
+         'snr': 0.10,
+         'articulation': 0.05
+     }
+ }
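Since the overall score is a weighted sum of the six component scores, each level's weights must sum to 1.0 for the result to stay on a 0-100 scale. A quick standalone sanity check over a copy of the table (the dict literal below simply mirrors `LEVEL_WEIGHTS` from `core/constants.py`):

```python
# Mirror of LEVEL_WEIGHTS from core/constants.py, copied so the check is standalone
LEVEL_WEIGHTS = {
    1: {'clarity': 0.45, 'energy': 0.25, 'speech_rate': 0.0,
        'pitch_consistency': 0.15, 'snr': 0.10, 'articulation': 0.05},
    2: {'clarity': 0.40, 'energy': 0.20, 'speech_rate': 0.15,
        'pitch_consistency': 0.10, 'snr': 0.10, 'articulation': 0.05},
    3: {'clarity': 0.40, 'energy': 0.15, 'speech_rate': 0.20,
        'pitch_consistency': 0.10, 'snr': 0.10, 'articulation': 0.05},
    4: {'clarity': 0.45, 'energy': 0.15, 'speech_rate': 0.15,
        'pitch_consistency': 0.10, 'snr': 0.10, 'articulation': 0.05},
    5: {'clarity': 0.45, 'energy': 0.10, 'speech_rate': 0.20,
        'pitch_consistency': 0.10, 'snr': 0.10, 'articulation': 0.05},
}

for level, weights in LEVEL_WEIGHTS.items():
    total = sum(weights.values())
    # Allow a tiny tolerance for floating-point addition
    assert abs(total - 1.0) < 1e-9, f"Level {level} weights sum to {total}"
```

All five levels pass, so weighted component scores in [0, 100] always yield an overall score in [0, 100].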
core/scoring_engine.py ADDED
@@ -0,0 +1,638 @@
+ # =======================================
+ # ADVANCED VOCAL SCORING SYSTEM
+ # ASR-based with Whisper + audio analysis
+ # Supports articulation levels 1-5
+ # =======================================
+
+ import torch
+ import torchaudio
+ import numpy as np
+ import librosa
+ from transformers import (
+     WhisperProcessor,
+     WhisperForConditionalGeneration,
+     pipeline
+ )
+ from typing import Dict, List, Tuple, Optional
+ from dataclasses import dataclass
+ import difflib
+ import re
+
+ from .constants import ARTICULATION_LEVELS, LEVEL_WEIGHTS
+
+ # =======================================
+ # SCORE RESULT DATACLASS
+ # =======================================
+
+ @dataclass
+ class ScoreResult:
+     """Comprehensive scoring result"""
+     # Overall
+     overall_score: float  # 0-100
+     grade: str            # A-E
+
+     # Component scores
+     clarity_score: float            # ASR accuracy (0-100)
+     energy_score: float             # Volume quality (0-100)
+     speech_rate_score: float        # Speech rate (0-100)
+     pitch_consistency_score: float  # Pitch stability (0-100)
+     snr_score: float                # Signal-to-noise ratio (0-100)
+     articulation_score: float       # Articulation clarity (0-100)
+
+     # ASR results
+     transcription: str
+     target: str
+     similarity: float  # 0-1
+     wer: float         # Word Error Rate
+
+     # Audio features
+     audio_features: Dict
+
+     # Feedback
+     feedback: str
+     suggestions: List[str]
+     level: int
+
+ # =======================================
+ # ADVANCED SCORING SYSTEM
+ # =======================================
+
+ class AdvancedVocalScoringSystem:
+     """
+     Vocal scoring system combining ASR (Whisper) with audio analysis.
+     Supports articulation levels 1-5.
+     """
+
+     def __init__(
+         self,
+         whisper_model: str = "openai/whisper-small",  # or "openai/whisper-large-v3"
+         device: str = None
+     ):
+         """
+         Initialize the system with Whisper ASR
+
+         Args:
+             whisper_model: Whisper model to use
+             device: 'cuda' or 'cpu'
+         """
+         self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
+
+         print(f"🔄 Loading Whisper model: {whisper_model}...")
+
+         # Load Whisper model
+         self.processor = WhisperProcessor.from_pretrained(whisper_model)
+         self.model = WhisperForConditionalGeneration.from_pretrained(whisper_model)
+         self.model.to(self.device)
+         self.model.eval()
+
+         # Whisper pipeline for transcription
+         self.pipe = pipeline(
+             "automatic-speech-recognition",
+             model=whisper_model,
+             device=0 if self.device == "cuda" else -1
+         )
+
+         print(f"✅ Whisper model loaded on {self.device}")
+
+         # Scoring weights for each level
+         self.level_weights = LEVEL_WEIGHTS
+
+     def score_audio(
+         self,
+         audio_path: str,
+         target_text: str,
+         level: int = 1
+     ) -> ScoreResult:
+         """
+         Score an audio file with comprehensive metrics
+
+         Args:
+             audio_path: Path to the audio file
+             target_text: Text the speaker was supposed to say
+             level: Articulation level (1-5)
+
+         Returns:
+             ScoreResult with all metrics
+         """
+         # Load audio
+         waveform, sr = torchaudio.load(audio_path)
+
+         # Convert to numpy for librosa
+         audio_np = waveform.numpy()
+         if audio_np.ndim > 1:
+             audio_np = audio_np[0]
+
+         # 1. CLARITY SCORE (ASR-based)
+         clarity_score, transcription, similarity, wer = self._score_clarity(
+             audio_path, target_text
+         )
+
+         # 2. ENERGY SCORE
+         energy_score = self._score_energy(audio_np, sr)
+
+         # 3. SPEECH RATE SCORE
+         speech_rate_score = self._score_speech_rate(
+             audio_np, sr, target_text, level
+         )
+
+         # 4. PITCH CONSISTENCY SCORE
+         pitch_consistency_score = self._score_pitch_consistency(audio_np, sr)
+
+         # 5. SNR SCORE (Signal-to-Noise Ratio)
+         snr_score = self._score_snr(audio_np, sr)
+
+         # 6. ARTICULATION SCORE
+         articulation_score = self._score_articulation(audio_np, sr)
+
+         # Extract audio features
+         audio_features = self._extract_audio_features(audio_np, sr, transcription)
+
+         # Calculate overall score with level-specific weights
+         weights = self.level_weights.get(level, self.level_weights[1])
+         overall_score = (
+             clarity_score * weights['clarity'] +
+             energy_score * weights['energy'] +
+             speech_rate_score * weights['speech_rate'] +
+             pitch_consistency_score * weights['pitch_consistency'] +
+             snr_score * weights['snr'] +
+             articulation_score * weights['articulation']
+         )
+
+         # Determine grade
+         grade = self._get_grade(overall_score)
+
+         # Generate feedback
+         feedback, suggestions = self._generate_feedback(
+             overall_score=overall_score,
+             clarity_score=clarity_score,
+             energy_score=energy_score,
+             speech_rate_score=speech_rate_score,
+             pitch_consistency_score=pitch_consistency_score,
+             snr_score=snr_score,
+             articulation_score=articulation_score,
+             transcription=transcription,
+             target_text=target_text,
+             similarity=similarity,
+             level=level,
+             audio_features=audio_features
+         )
+
+         return ScoreResult(
+             overall_score=round(overall_score, 2),
+             grade=grade,
+             clarity_score=round(clarity_score, 2),
+             energy_score=round(energy_score, 2),
+             speech_rate_score=round(speech_rate_score, 2),
+             pitch_consistency_score=round(pitch_consistency_score, 2),
+             snr_score=round(snr_score, 2),
+             articulation_score=round(articulation_score, 2),
+             transcription=transcription,
+             target=target_text.upper(),
+             similarity=round(similarity, 4),
+             wer=round(wer, 4),
+             audio_features=audio_features,
+             feedback=feedback,
+             suggestions=suggestions,
+             level=level
+         )
+
+     # =======================================
+     # SCORING COMPONENTS
+     # =======================================
+
+     def _score_clarity(
+         self,
+         audio_path: str,
+         target_text: str
+     ) -> Tuple[float, str, float, float]:
+         """
+         Score clarity using Whisper ASR
+
+         Returns:
+             (clarity_score, transcription, similarity, wer)
+         """
+         try:
+             # Transcribe with Whisper
+             result = self.pipe(
+                 audio_path,
+                 return_timestamps=False,
+                 generate_kwargs={"language": "indonesian"}
+             )
+             transcription = result["text"].upper().strip()
+
+         except Exception as e:
+             print(f"⚠️ ASR Error: {e}")
+             transcription = ""
+
+         target_text = target_text.upper().strip()
+
+         # Calculate similarity
+         similarity = difflib.SequenceMatcher(None, transcription, target_text).ratio()
+
+         # Calculate WER
+         wer = self._calculate_wer(transcription, target_text)
+
+         # Clarity score based on similarity and WER
+         clarity_score = (similarity * 0.7 + (1 - wer) * 0.3) * 100
+
+         return clarity_score, transcription, similarity, wer
+
+     def _score_energy(self, audio: np.ndarray, sr: int) -> float:
+         """
+         Score energy/volume quality
+
+         Returns:
+             energy_score (0-100)
+         """
+         # RMS energy
+         rms = np.sqrt(np.mean(audio**2))
+         rms_db = 20 * np.log10(rms + 1e-10)
+
+         # Optimal range: -30 to -10 dB
+         if -30 <= rms_db <= -10:
+             energy_score = 100
+         elif -40 <= rms_db < -30:
+             energy_score = 60 + (rms_db + 40) * 4
+         elif -10 < rms_db <= -5:
+             energy_score = 100 - (rms_db + 10) * 8
+         elif rms_db < -40:
+             energy_score = max(0, 60 + (rms_db + 40) * 4)
+         else:
+             energy_score = max(0, 60 - (rms_db + 5) * 5)
+
+         return min(100, max(0, energy_score))
+
+     def _score_speech_rate(
+         self,
+         audio: np.ndarray,
+         sr: int,
+         target_text: str,
+         level: int
+     ) -> float:
+         """
+         Score speech rate (syllables per second)
+
+         Returns:
+             speech_rate_score (0-100)
+         """
+         # Duration
+         duration = len(audio) / sr
+
+         # Count syllables in target
+         syllable_count = self._count_syllables(target_text)
+
+         if duration <= 0 or syllable_count == 0:
+             return 50  # neutral score
+
+         # Calculate speech rate
+         speech_rate = syllable_count / duration
+
+         # Get optimal speech rate from level configuration
+         level_config = ARTICULATION_LEVELS.get(level, ARTICULATION_LEVELS[1])
+         min_rate, max_rate = level_config.get('speech_rate_range', (2.0, 4.0))
+         optimal_mid = (min_rate + max_rate) / 2
+
+         # Score based on deviation from optimal
+         if min_rate <= speech_rate <= max_rate:
+             # Within optimal range
+             deviation = abs(speech_rate - optimal_mid) / (max_rate - min_rate)
+             speech_rate_score = 100 - (deviation * 20)
+         else:
+             # Outside optimal range
+             if speech_rate < min_rate:
+                 deviation = (min_rate - speech_rate) / min_rate
+             else:
+                 deviation = (speech_rate - max_rate) / max_rate
+
+             speech_rate_score = max(0, 80 - (deviation * 80))
+
+         return min(100, max(0, speech_rate_score))
+
+     def _score_pitch_consistency(self, audio: np.ndarray, sr: int) -> float:
+         """
+         Score pitch consistency/stability
+
+         Returns:
+             pitch_score (0-100)
+         """
+         try:
+             # Extract pitch using librosa
+             pitches, magnitudes = librosa.piptrack(
+                 y=audio,
+                 sr=sr,
+                 fmin=80,
+                 fmax=400
+             )
+
+             # Get pitch values
+             pitch_values = []
+             for t in range(pitches.shape[1]):
+                 index = magnitudes[:, t].argmax()
+                 pitch = pitches[index, t]
+                 if pitch > 0:
+                     pitch_values.append(pitch)
+
+             if len(pitch_values) < 5:
+                 return 50  # not enough data
+
+             # Calculate coefficient of variation
+             pitch_std = np.std(pitch_values)
+             pitch_mean = np.mean(pitch_values)
+             cv = pitch_std / pitch_mean if pitch_mean > 0 else 1
+
+             # Score based on CV
+             if cv < 0.1:
+                 pitch_score = 95 + (0.1 - cv) * 50
+             elif cv < 0.2:
+                 pitch_score = 80 + (0.2 - cv) * 150
+             elif cv < 0.3:
+                 pitch_score = 60 + (0.3 - cv) * 200
+             else:
+                 pitch_score = max(0, 60 - (cv - 0.3) * 100)
+
+             return min(100, max(0, pitch_score))
+
+         except Exception:
+             return 50  # neutral on error
+
+     def _score_snr(self, audio: np.ndarray, sr: int) -> float:
+         """
+         Score Signal-to-Noise Ratio
+
+         Returns:
+             snr_score (0-100)
+         """
+         try:
+             # Simple SNR estimation:
+             # assume the first and last 10% are potential noise
+             noise_samples = int(len(audio) * 0.1)
+
+             if len(audio) < noise_samples * 3:
+                 return 50  # audio too short
+
+             noise = np.concatenate([audio[:noise_samples], audio[-noise_samples:]])
+             signal = audio[noise_samples:-noise_samples]
+
+             # Calculate power
+             signal_power = np.mean(signal**2)
+             noise_power = np.mean(noise**2)
+
+             if noise_power == 0:
+                 return 100  # perfect
+
+             snr = 10 * np.log10(signal_power / noise_power)
+
+             # Score based on SNR
+             # Good SNR: > 20 dB
+             # Acceptable: 10-20 dB
+             # Poor: < 10 dB
+             if snr >= 25:
+                 snr_score = 100
+             elif snr >= 15:
+                 snr_score = 80 + (snr - 15) * 2
+             elif snr >= 10:
+                 snr_score = 60 + (snr - 10) * 4
+             elif snr >= 5:
+                 snr_score = 40 + (snr - 5) * 4
+             else:
+                 snr_score = max(0, snr * 8)
+
+             return min(100, max(0, snr_score))
+
+         except Exception:
+             return 50  # neutral on error
+
+     def _score_articulation(self, audio: np.ndarray, sr: int) -> float:
+         """
+         Score articulation clarity using spectral features
+
+         Returns:
+             articulation_score (0-100)
+         """
+         try:
+             # Zero Crossing Rate (higher = more clarity)
+             zcr = librosa.zero_crossings(audio).sum() / len(audio)
+
+             # Spectral centroid (brightness)
+             spectral_centroid = librosa.feature.spectral_centroid(y=audio, sr=sr)[0]
+             spectral_centroid_mean = spectral_centroid.mean()
+
+             # Spectral rolloff
+             spectral_rolloff = librosa.feature.spectral_rolloff(y=audio, sr=sr)[0]
+             spectral_rolloff_mean = spectral_rolloff.mean()
+
+             # Normalize and score
+             # Good articulation: ZCR 0.1-0.3, centroid 1000-3000 Hz
+
+             # ZCR score
+             if 0.1 <= zcr <= 0.3:
+                 zcr_score = 100
+             elif zcr < 0.1:
+                 zcr_score = zcr * 1000
+             else:
+                 zcr_score = max(0, 100 - (zcr - 0.3) * 200)
+
+             # Centroid score
+             if 1000 <= spectral_centroid_mean <= 3000:
+                 centroid_score = 100
+             elif spectral_centroid_mean < 1000:
+                 centroid_score = (spectral_centroid_mean / 1000) * 100
+             else:
+                 centroid_score = max(0, 100 - ((spectral_centroid_mean - 3000) / 3000) * 100)
+
+             # Combined score
+             articulation_score = (zcr_score * 0.4 + centroid_score * 0.6)
+
+             return min(100, max(0, articulation_score))
+
+         except Exception:
+             return 50  # neutral on error
+
+     # =======================================
+ # =======================================
453
+ # HELPER FUNCTIONS
454
+ # =======================================
455
+
456
+ def _calculate_wer(self, predicted: str, target: str) -> float:
457
+ """Calculate Word Error Rate"""
458
+ pred_words = predicted.split()
459
+ target_words = target.split()
460
+
461
+ if not target_words:
462
+ return 1.0 if pred_words else 0.0
463
+
464
+ # Levenshtein distance
465
+ d = np.zeros((len(pred_words) + 1, len(target_words) + 1))
466
+
467
+ for i in range(len(pred_words) + 1):
468
+ d[i][0] = i
469
+ for j in range(len(target_words) + 1):
470
+ d[0][j] = j
471
+
472
+ for i in range(1, len(pred_words) + 1):
473
+ for j in range(1, len(target_words) + 1):
474
+ if pred_words[i-1] == target_words[j-1]:
475
+ d[i][j] = d[i-1][j-1]
476
+ else:
477
+ d[i][j] = min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1
478
+
479
+ return d[len(pred_words)][len(target_words)] / len(target_words)
480
+
481
+ def _count_syllables(self, text: str) -> int:
482
+ """
483
+ Count syllables in Indonesian text
484
+ Simplified: count vowels
485
+ """
486
+ text = text.upper()
487
+ vowels = "AIUEO"
488
+ count = 0
489
+ prev_was_vowel = False
490
+
491
+ for char in text:
492
+ is_vowel = char in vowels
493
+ if is_vowel and not prev_was_vowel:
494
+ count += 1
495
+ prev_was_vowel = is_vowel
496
+
497
+ return max(1, count)
498
+
499
+ def _extract_audio_features(
500
+ self,
501
+ audio: np.ndarray,
502
+ sr: int,
503
+ transcription: str
504
+ ) -> Dict:
505
+ """Extract comprehensive audio features"""
506
+ try:
507
+ duration = len(audio) / sr
508
+ rms = np.sqrt(np.mean(audio**2))
509
+ rms_db = 20 * np.log10(rms + 1e-10)
510
+ zcr = librosa.zero_crossings(audio).sum() / len(audio)
511
+
512
+ # Spectral features
513
+ spectral_centroid = librosa.feature.spectral_centroid(y=audio, sr=sr)[0].mean()
514
+ spectral_rolloff = librosa.feature.spectral_rolloff(y=audio, sr=sr)[0].mean()
515
+ spectral_bandwidth = librosa.feature.spectral_bandwidth(y=audio, sr=sr)[0].mean()
516
+
517
+ # Tempo
518
+ tempo, _ = librosa.beat.beat_track(y=audio, sr=sr)
519
+
520
+ return {
521
+ 'duration': round(duration, 3),
522
+ 'rms_db': round(rms_db, 2),
523
+ 'zero_crossing_rate': round(zcr, 4),
524
+ 'spectral_centroid': round(float(spectral_centroid), 2),
525
+ 'spectral_rolloff': round(float(spectral_rolloff), 2),
526
+ 'spectral_bandwidth': round(float(spectral_bandwidth), 2),
527
+ 'tempo': round(float(tempo), 2),
528
+ 'transcription': transcription
529
+ }
530
+ except Exception as e:
531
+ return {
532
+ 'duration': len(audio) / sr,
533
+ 'error': str(e)
534
+        }
+
+    def _get_grade(self, score: float) -> str:
+        """Convert score to grade"""
+        if score >= 90:
+            return 'A'
+        elif score >= 80:
+            return 'B'
+        elif score >= 70:
+            return 'C'
+        elif score >= 60:
+            return 'D'
+        else:
+            return 'E'
+
+    def _generate_feedback(
+        self,
+        overall_score: float,
+        clarity_score: float,
+        energy_score: float,
+        speech_rate_score: float,
+        pitch_consistency_score: float,
+        snr_score: float,
+        articulation_score: float,
+        transcription: str,
+        target_text: str,
+        similarity: float,
+        level: int,
+        audio_features: Dict
+    ) -> Tuple[str, List[str]]:
+        """Generate detailed feedback"""
+
+        feedback_parts = []
+        suggestions = []
+
+        # Overall feedback
+        if overall_score >= 90:
+            feedback_parts.append("🌟 Sempurna! Pengucapan Anda sangat baik.")
+        elif overall_score >= 80:
+            feedback_parts.append("👍 Bagus! Pengucapan sudah cukup jelas.")
+        elif overall_score >= 70:
+            feedback_parts.append("😊 Cukup baik, masih bisa ditingkatkan.")
+        elif overall_score >= 60:
+            feedback_parts.append("🤔 Perlu latihan lebih.")
+        else:
+            feedback_parts.append("💪 Terus berlatih!")
+
+        # Transcription match
+        if similarity < 0.8:
+            feedback_parts.append(f"\n❌ Target: '{target_text}', Terdeteksi: '{transcription}'")
+            suggestions.append(f"Fokus pada pengucapan '{target_text}' yang lebih jelas")
+        elif similarity < 1.0:
+            feedback_parts.append(f"\n⚠️ Hampir benar! Target: '{target_text}', Terdeteksi: '{transcription}'")
+        else:
+            feedback_parts.append(f"\n✅ Pengucapan '{target_text}' terdeteksi dengan sempurna!")
+
+        # Component-specific feedback
+        if clarity_score < 70:
+            suggestions.append("Ucapkan setiap huruf/kata dengan lebih jelas dan artikulasi yang baik")
+
+        if energy_score < 70:
+            if audio_features.get('rms_db', 0) < -35:
+                suggestions.append("Volume terlalu rendah - bicaralah lebih keras")
+            elif audio_features.get('rms_db', 0) > -8:
+                suggestions.append("Volume terlalu tinggi - bicaralah lebih lembut")
+
+        if speech_rate_score < 70 and level > 1:
+            if audio_features.get('duration', 0) > 2.0:
+                suggestions.append("Terlalu lambat - ucapkan dengan kecepatan yang lebih natural")
+            else:
+                suggestions.append("Terlalu cepat - ucapkan lebih pelan dan jelas")
+
+        if pitch_consistency_score < 70:
+            suggestions.append("Pertahankan nada suara yang lebih stabil")
+
+        if snr_score < 70:
+            suggestions.append("Rekam di tempat yang lebih tenang (kurangi noise latar belakang)")
+
+        if articulation_score < 70:
+            suggestions.append("Perbaiki artikulasi dengan membuka mulut lebih lebar")
+
+        feedback = " ".join(feedback_parts)
+
+        return feedback, suggestions
+
+
+ # =======================================
+ # USAGE EXAMPLE
+ # =======================================
+
+ if __name__ == "__main__":
+     print("="*70)
+     print("🎯 ADVANCED VOCAL SCORING SYSTEM")
+     print("   ASR-based (Whisper) + Comprehensive Audio Analysis")
+     print("="*70)
+
+     # Initialize
+     scorer = AdvancedVocalScoringSystem(
+         whisper_model="openai/whisper-small"  # or "openai/whisper-large-v3"
+     )
+
+     print("\n✅ System ready!")
+     print("\nSupported levels:")
+     for level_num, level_data in ARTICULATION_LEVELS.items():
+         print(f"  Level {level_num}: {level_data['name']} ({level_data['difficulty']})")
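Read in isolation, `_get_grade` maps the 0–100 overall score onto letter bands with cutoffs at 90/80/70/60. A minimal standalone sketch of the same mapping (`to_grade` is a hypothetical free-function rename, not part of the repo's API):

```python
def to_grade(score: float) -> str:
    # Same thresholds as _get_grade above: A >= 90, B >= 80, C >= 70, D >= 60, else E
    for cutoff, grade in ((90, 'A'), (80, 'B'), (70, 'C'), (60, 'D')):
        if score >= cutoff:
            return grade
    return 'E'

print(to_grade(85))  # B
```

Boundary values fall into the higher band (a score of exactly 90 is an 'A'), matching the `>=` comparisons in the diff.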
requirements.txt ADDED
@@ -0,0 +1,38 @@
+ # Vocal Articulation Assessment System - Updated
+ # Requirements for Whisper ASR + Audio Analysis
+
+ # Core ML Libraries
+ torch>=2.0.0
+ torchaudio>=2.0.0
+ transformers>=4.35.0
+
+ # Audio Processing
+ librosa>=0.10.0
+ soundfile>=0.12.0
+ audioread>=3.0.0
+
+ # Whisper dependencies
+ openai-whisper>=20231117  # Optional: for standalone Whisper
+ accelerate>=0.20.0  # for faster inference
+
+ # Web Framework & API
+ fastapi>=0.104.0
+ uvicorn[standard]>=0.24.0
+ python-multipart>=0.0.6
+
+ # Gradio for UI
+ gradio>=4.0.0
+
+ # HuggingFace Spaces ZeroGPU (optional, only for HF Spaces)
+ # spaces>=0.1.0
+
+ # Utilities
+ numpy>=1.24.0
+ scipy>=1.11.0
+ pydantic>=2.0.0
+ python-Levenshtein>=0.21.0  # faster string similarity
+
+ # Development & Testing
+ pytest>=7.4.0
+ black>=23.0.0
+ flake8>=6.0.0
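`python-Levenshtein` is pinned here for faster string similarity, and the feedback logic in this commit branches on a similarity score in [0, 1] at the 0.8 and 1.0 marks. A stdlib sketch of an equivalent ratio using `difflib` (the repo may use `Levenshtein.ratio` instead; the example words are illustrative):

```python
from difflib import SequenceMatcher

def similarity(target: str, transcription: str) -> float:
    # Ratio in [0, 1]; 1.0 means the transcription matches the target exactly
    return SequenceMatcher(None, target.lower(), transcription.lower()).ratio()

print(similarity("selamat pagi", "selamat pagi"))  # 1.0
```

With this ratio, a value below 0.8 would trigger the "wrong word" feedback branch, values in [0.8, 1.0) the "almost correct" branch, and exactly 1.0 the success branch.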
start.sh ADDED
@@ -0,0 +1,26 @@
+ #!/bin/bash
+
+ # =======================================
+ # START SCRIPT - Vocal Articulation System
+ # For Docker / HuggingFace Spaces deployment
+ # =======================================
+
+ echo "🚀 Starting Vocal Articulation Assessment System..."
+
+ # Check Python version
+ python --version
+
+ # Install dependencies if needed
+ if [ ! -d ".venv" ]; then
+     echo "📦 Installing dependencies..."
+     pip install -r requirements.txt
+ fi
+
+ # Set environment variables
+ export PYTHONUNBUFFERED=1
+ export GRADIO_SERVER_NAME="0.0.0.0"
+ export GRADIO_SERVER_PORT=7860
+
+ # Start application
+ echo "✅ Starting Gradio interface..."
+ python app.py
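start.sh hard-codes the same host/port values that `.env.example` in this commit declares. A small sketch of reading them with shell parameter-expansion defaults, so the script and the env file cannot drift (variable names taken from `.env.example`; this is a suggested pattern, not what the script currently does):

```shell
# Fall back to the .env.example defaults when the variables are unset or empty
GRADIO_SERVER_NAME="${GRADIO_SERVER_NAME:-0.0.0.0}"
GRADIO_SERVER_PORT="${GRADIO_SERVER_PORT:-7860}"
echo "$GRADIO_SERVER_NAME:$GRADIO_SERVER_PORT"
```

With this pattern, `docker run -e GRADIO_SERVER_PORT=8080 …` overrides the port without editing the script.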