title: PlotWeaver Voice Agent
emoji: 🗣️
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: true
short_description: Hausa voice AI for African banks, telecoms, and delivery
license: apache-2.0
PlotWeaver Voice Agent
Hausa-first conversational AI demo. Product 7 of the PlotWeaver suite: voice bots for WhatsApp, phone, and customer support across African banks, telecoms, and delivery services.
What it does
- ASR: Whisper-small transcribes your Hausa audio
- NLU: Hybrid three-tier system: rule-based keyword fast path → Qwen2.5-1.5B-Instruct zero-shot classifier for paraphrases → rule-based safety fallback. The pipeline trace shows which tier answered each turn.
- Dialogue manager: deterministic FSM across 3 verticals (Bank, Telecom, Delivery)
- TTS: facebook/mms-tts-hau synthesizes the bot's Hausa response
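The three-tier NLU routing described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the intent names, the `KEYWORDS` table, and the `llm_classifier` callable (standing in for the Qwen2.5-1.5B-Instruct zero-shot call) are all assumptions, and only the Hausa phrases are taken from the demo flows in this README.

```python
from typing import Callable, Optional, Tuple

# Hypothetical keyword table; the phrases come from the demo flows,
# the intent names are invented for this sketch.
KEYWORDS = {
    "check_balance": ("duba ma'auni",),
    "buy_airtime": ("saya airtime",),
    "track_order": ("bincika oda",),
    "escalate": ("mutum", "wakili"),
}


def keyword_intent(text: str) -> Optional[str]:
    """Tier 1: return the first intent whose keyword appears in the text."""
    lowered = text.lower()
    for intent, phrases in KEYWORDS.items():
        if any(phrase in lowered for phrase in phrases):
            return intent
    return None


def classify(
    text: str,
    llm_classifier: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Return (intent, tier) so a pipeline trace can show which tier answered."""
    intent = keyword_intent(text)
    if intent is not None:
        return intent, "rules"
    # Tier 2: only consulted when the keyword fast path misses.
    if llm_classifier is not None:
        guess = llm_classifier(text)
        if guess in KEYWORDS:  # accept only known intent labels
            return guess, "llm"
    # Tier 3: safety fallback (assumed here to be a fixed "unknown" intent).
    return "unknown", "fallback"
```

Returning the tier alongside the intent is one simple way to feed a trace like the one the demo displays; rejecting LLM outputs that are not known labels keeps the dialogue manager's input set closed.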
How to use
- Pick a vertical (Bank / Telecom / Delivery)
- Three ways to talk to the agent:
  - Type a Hausa phrase in the text box
  - Record via browser microphone
  - Upload a pre-recorded Hausa audio file (.wav, .mp3, .ogg; up to 30s)
- For audio, click "Transcribe & send" after recording/uploading
- Watch the pipeline trace on the left: session load, ASR, NLU, dialogue manager, TTS
- The bot's audio response autoplays; full multi-turn flows work (balance check, transfers, complaints, rescheduling, etc.)
Demo flows
Bank: "duba ma'auni" → "1234" → bot returns your balance.
Telecom: "saya airtime" → "1000" → airtime loaded.
Delivery: "bincika oda" → "10234" → order status.
Escalation: say "mutum" or "wakili" at any time to flag a human handoff.
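As an illustration of how a deterministic FSM could drive the Bank flow above, here is a minimal sketch. The state names, reply tags, and the treatment of "1234" as a 4-digit code are assumptions made for the example; the demo's real states and validation rules are not documented here.

```python
from dataclasses import dataclass

# Escalation words taken from the demo flows; everything else is hypothetical.
ESCALATION_WORDS = ("mutum", "wakili")


@dataclass
class BankSession:
    state: str = "idle"
    escalated: bool = False


def bank_step(session: BankSession, text: str) -> str:
    """Advance the balance-check flow by one user turn and return a reply tag."""
    lowered = text.lower().strip()

    # Escalation is checked first so it works from any state.
    if any(word in lowered for word in ESCALATION_WORDS):
        session.escalated = True
        session.state = "idle"
        return "handoff_flagged"

    if session.state == "idle" and "duba ma'auni" in lowered:
        session.state = "awaiting_code"
        return "ask_code"

    if session.state == "awaiting_code":
        # Assumption: the follow-up is a 4-digit numeric code such as "1234".
        if lowered.isdigit() and len(lowered) == 4:
            session.state = "idle"
            return "say_balance"
        return "ask_code_again"

    return "not_understood"
```

Because every transition is an explicit branch on `(state, input)`, the same input sequence always yields the same replies, which is the property that makes the flow easy to trace and test.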
Architecture
User (WhatsApp/Phone/Web)
        ↓
ASR (Whisper) → NLU (XLM-R) → Dialogue FSM → Response Gen → TTS (MMS)
                                    ↓                          ↓
                    Session state (Redis, 10min TTL)        Bot audio
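The session-state box in the diagram implies that a conversation is forgotten after 10 minutes. The sketch below shows that expiry behaviour with an in-memory stand-in rather than a real Redis client; the `SessionStore` class, its method names, and the injectable `clock` are assumptions for illustration only. With Redis itself, the equivalent would be writing the session under a key with a 600-second expiry.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SESSION_TTL_SECONDS = 10 * 60  # 10 min TTL, as shown in the diagram


class SessionStore:
    """In-memory stand-in for a TTL-backed session store (not a Redis client)."""

    def __init__(
        self,
        ttl: float = SESSION_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._clock = clock
        self._data: Dict[str, Tuple[float, dict]] = {}

    def save(self, session_id: str, state: dict) -> None:
        """Store a copy of the state and restart its expiry window."""
        self._data[session_id] = (self._clock() + self._ttl, dict(state))

    def load(self, session_id: str) -> Optional[dict]:
        """Return the saved state, or None if it is missing or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self._clock() >= expires_at:
            del self._data[session_id]  # drop the stale session
            return None
        return dict(state)
```

In this sketch only `save` restarts the expiry window, so a session that is re-saved at the end of each turn stays alive as long as the user keeps talking.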
Notes
The first turn takes 30-60s to cold-start the ASR and TTS models (640MB total). The Qwen2.5-1.5B NLU model (~3GB) loads only when a user utterance doesn't match the rule-based keyword set, so common phrases stay fast, while the first novel phrasing triggers a one-time 30-40s LLM load (then ~5-8s per subsequent LLM call on CPU).
For production, a GPU Space or dedicated endpoint brings full-turn latency under 1s.
This is a POC demo. The production plan covers a fine-tuned Hausa Whisper, a fine-tuned XLM-R or AfroXLMR NLU classifier (replacing the LLM for consistent sub-100ms NLU), live WhatsApp Business Cloud integration, and Twilio Voice.
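The on-demand NLU load mentioned in these notes (the LLM is loaded only when the keyword path misses, and only once) can be approximated with a small lazy-loading wrapper. This is a sketch under assumptions: `LazyModel` and its `loader` argument are hypothetical names, and the loader here is any zero-argument callable rather than the demo's actual model-loading code.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Defer an expensive load until the first call, then reuse the result."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        # Double-checked locking: concurrent first requests load only once.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

A request handler would call `get()` only on the fallback path, so turns answered by the keyword rules never pay the load cost, and later LLM turns reuse the already loaded model.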