
Phase 2: Voice Implementation - Quick Start Guide

What is Phase 2?

Phase 2 adds live two-way voice conversation to the ScamShield AI honeypot:

  • You speak (as scammer) → AI transcribes → processes → AI speaks back
  • Completely isolated from Phase 1 (text honeypot)
  • Optional feature (enabled via PHASE_2_ENABLED=true)

Architecture

Voice Input (You) → ASR (Whisper) → Text
                                      ↓
                              Phase 1 Honeypot (Unchanged)
                                      ↓
Voice Output (AI) ← TTS (gTTS) ← Text Reply

Key Point: the Phase 1 text honeypot is not modified; voice is just an input/output wrapper around it.
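The wrapper pattern above can be sketched in a few lines of Python. This is illustrative only: `transcribe`, `honeypot_reply`, `synthesize`, and `voice_engage` are placeholder names, not the project's actual module API.

```python
# Illustrative sketch of the Phase 2 voice wrapper around the Phase 1 honeypot.
# The three stage functions are stubs standing in for Whisper ASR, the
# unchanged Phase 1 text pipeline, and gTTS respectively.

def transcribe(audio_bytes: bytes) -> str:
    """Placeholder for Whisper ASR (audio -> text)."""
    return audio_bytes.decode("utf-8")  # stub: treat the "audio" as text

def honeypot_reply(text: str) -> str:
    """Placeholder for the unchanged Phase 1 text honeypot."""
    return f"Oh no! What should I do about: {text}"

def synthesize(text: str) -> bytes:
    """Placeholder for gTTS (text -> audio)."""
    return text.encode("utf-8")  # stub: treat the text as "audio"

def voice_engage(audio_in: bytes) -> bytes:
    """Voice is only an I/O wrapper: ASR in front, TTS behind, honeypot untouched."""
    text_in = transcribe(audio_in)
    text_out = honeypot_reply(text_in)
    return synthesize(text_out)

print(voice_engage(b"Your account is blocked."))
```

Because the honeypot sits in the middle as a plain text-to-text function, Phase 2 can be added or removed without touching Phase 1 code.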

Quick Setup

1. Install Dependencies

# Install Phase 2 dependencies
pip install -r requirements-phase2.txt

# Note: PyAudio may need system packages
# Windows: pip install pipwin && pipwin install pyaudio
# Linux: sudo apt-get install portaudio19-dev
# Mac: brew install portaudio

2. Configure Environment

# Add to your .env file
PHASE_2_ENABLED=true
WHISPER_MODEL=base
TTS_ENGINE=gtts
VOICE_FRAUD_DETECTION=false
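A minimal sketch of how the server might read these flags at startup. The variable names come from the example above; the `env_flag` helper and the defaults are hypothetical, not the project's actual config code.

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings from .env-style variables (hypothetical helper)."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes", "on"}

PHASE_2_ENABLED = env_flag("PHASE_2_ENABLED")            # voice feature opt-in
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")       # e.g. tiny, base, small
TTS_ENGINE = os.getenv("TTS_ENGINE", "gtts")
VOICE_FRAUD_DETECTION = env_flag("VOICE_FRAUD_DETECTION")
```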

3. Start Server

# Start FastAPI server (same as Phase 1)
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

4. Open Voice UI

Open in browser: http://localhost:8000/ui/voice.html

Testing the Voice Feature

Option 1: Record Live

  1. Click "Start Recording"
  2. Speak as a scammer (e.g., "Your account is blocked. Send OTP immediately.")
  3. Click "Stop Recording"
  4. Wait for AI to:
    • Transcribe your voice
    • Process through honeypot
    • Reply with voice

Option 2: Upload Audio File

  1. Click "Upload Audio File"
  2. Select a .wav, .mp3, or .m4a file
  3. AI processes and replies

API Endpoint

POST /api/v1/voice/engage

Request:

curl -X POST "http://localhost:8000/api/v1/voice/engage" \
  -H "x-api-key: dev-key-12345" \
  -F "audio_file=@recording.wav" \
  -F "session_id=voice-test-001" \
  -F "language=auto"

Response:

{
  "session_id": "voice-test-001",
  "scam_detected": true,
  "scam_confidence": 0.92,
  "scam_type": "financial_fraud",
  "turn_count": 1,
  "ai_reply_text": "Oh no! What should I do? Can you help me?",
  "ai_reply_audio_url": "/api/v1/voice/audio/reply_xyz.mp3",
  "transcription": {
    "text": "Your account is blocked. Send OTP immediately.",
    "language": "en",
    "confidence": 0.95
  },
  "voice_fraud": null,
  "extracted_intelligence": {
    "upi_ids": [],
    "bank_accounts": [],
    "phone_numbers": [],
    "urls": []
  },
  "processing_time_ms": 3450
}
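Client-side, a response like the one above can be consumed with only the standard library. The response shape follows the example; the `summarize_engagement` helper is made up for illustration.

```python
import json

def summarize_engagement(raw: str) -> str:
    """Turn a /voice/engage JSON response into a one-line summary (illustrative)."""
    r = json.loads(raw)
    verdict = "SCAM" if r["scam_detected"] else "clean"
    return (f"[{r['session_id']}] {verdict} ({r['scam_confidence']:.0%}, "
            f"{r['scam_type']}) heard: {r['transcription']['text']!r}")

raw = '''{"session_id": "voice-test-001", "scam_detected": true,
          "scam_confidence": 0.92, "scam_type": "financial_fraud",
          "transcription": {"text": "Send OTP immediately.", "language": "en"}}'''
print(summarize_engagement(raw))
```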

File Structure

app/
├── voice/                    # NEW: Phase 2 voice modules
│   ├── __init__.py
│   ├── asr.py               # Whisper ASR
│   ├── tts.py               # gTTS text-to-speech
│   └── fraud_detector.py    # Optional voice fraud detection
├── api/
│   ├── voice_endpoints.py   # NEW: Voice API endpoints
│   └── voice_schemas.py     # NEW: Voice API schemas
└── ... (Phase 1 unchanged)

ui/
├── voice.html               # NEW: Voice UI
├── voice.js                 # NEW: Voice UI logic
├── voice.css                # NEW: Voice UI styles
└── ... (Phase 1 unchanged)

PHASE_2_VOICE_IMPLEMENTATION_PLAN.md  # Full implementation plan
requirements-phase2.txt                # Phase 2 dependencies
.env.phase2.example                    # Phase 2 config example

Impact on Phase 1

ZERO IMPACT:

  • ✅ Phase 1 text honeypot unchanged
  • ✅ All existing tests pass
  • ✅ Existing API endpoints unchanged
  • ✅ Existing UI unchanged
  • ✅ Phase 2 is opt-in (disabled by default)

Performance

| Metric      | Target | Notes                              |
|-------------|--------|------------------------------------|
| ASR latency | <2s    | Whisper base model                 |
| TTS latency | <1s    | gTTS                               |
| Total loop  | <5s    | Voice in → voice out               |
| Accuracy    | >85%   | Transcription accuracy (1 − WER)   |
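The component targets can be checked against the overall budget with simple arithmetic; the remaining slack is what's left for honeypot processing and network overhead (the split is an assumption, not a measured figure).

```python
# Latency budget from the table above (milliseconds).
asr_ms, tts_ms, total_budget_ms = 2000, 1000, 5000

slack_ms = total_budget_ms - (asr_ms + tts_ms)
print(f"Slack left for honeypot + network: {slack_ms} ms")
```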

Troubleshooting

"Voice API unavailable"

  • Check PHASE_2_ENABLED=true in .env
  • Verify dependencies installed: pip list | grep whisper
  • Check logs: tail -f logs/app.log

"Microphone access denied"

  • Browser needs microphone permission
  • Check browser settings → Privacy → Microphone
  • Use HTTPS or localhost (required for getUserMedia)

"PyAudio installation failed"

# Windows
pip install pipwin
pipwin install pyaudio

# Linux
sudo apt-get install portaudio19-dev python3-pyaudio
pip install pyaudio

# Mac
brew install portaudio
pip install pyaudio

"Whisper model download slow"

  • First run downloads model (~150MB for base)
  • Models cached in ~/.cache/whisper/
  • Use smaller model: WHISPER_MODEL=tiny

Advanced Features

Voice Fraud Detection (Optional)

Detect synthetic/deepfake voices:

# Enable in .env
VOICE_FRAUD_DETECTION=true

# Install additional dependency
pip install resemblyzer

Response includes:

"voice_fraud": {
  "is_synthetic": false,
  "confidence": 0.85,
  "risk_level": "low"
}
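A client might act on the `voice_fraud` field with a simple thresholding rule like the one below. The thresholds and the helper name are assumptions for illustration; the actual mapping to `risk_level` lives server-side.

```python
def voice_fraud_risk(is_synthetic: bool, confidence: float) -> str:
    """Map detector output to a coarse risk label (assumed thresholds)."""
    if not is_synthetic:
        return "low"
    # Detector flagged the voice as synthetic: weigh by its confidence.
    return "high" if confidence >= 0.8 else "medium"

print(voice_fraud_risk(False, 0.85))  # matches the "low" example response above
print(voice_fraud_risk(True, 0.9))
```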

Custom TTS Voice

Future: Replace gTTS with IndicTTS for better Indic language support.

Streaming Audio

Future: Real-time audio streaming instead of record-then-send.

Testing Checklist

  • Install Phase 2 dependencies
  • Set PHASE_2_ENABLED=true
  • Start server
  • Open voice UI
  • Record voice message
  • Verify transcription
  • Verify AI reply (text)
  • Verify AI reply (audio)
  • Check metadata (language, confidence)
  • Verify Phase 1 tests still pass

Next Steps

  1. Review: Read PHASE_2_VOICE_IMPLEMENTATION_PLAN.md for full details
  2. Install: Run pip install -r requirements-phase2.txt
  3. Configure: Copy settings from .env.phase2.example to .env
  4. Test: Open ui/voice.html and try recording
  5. Deploy: Set PHASE_2_ENABLED=true in production

Support

  • Full plan: PHASE_2_VOICE_IMPLEMENTATION_PLAN.md
  • Issues: Check logs in logs/app.log
  • Questions: Review implementation plan sections

Phase 2 Status: ✅ Planned, 🚧 Ready to Implement

Estimated Implementation Time: 17-21 hours

Priority: Optional (Phase 1 is complete and sufficient for competition)