---
title: Conversational Assessment for Responsive Engagement (CARE) Notes
emoji: 🐢
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: AI-driven conversational module for depression-triage
---
# PHQ-9 Clinician Agent (Voice-first)
A lightweight research demo that simulates a clinician conducting a brief conversational PHQ-9 screening. The app is voice-first: you tap a circular mic bubble to talk; the model replies and can speak back via TTS. A separate Advanced tab exposes scoring and configuration.
## What it does
- Conversational assessment to infer PHQ‑9 items from natural dialogue (no explicit questionnaire).
- Live inference of PHQ‑9 item scores, confidences, total score, and severity.
- Iterative light explainability after each turn to guide the next question (strong/weak/missing evidence by item).
- Final explainability at session end aggregating linguistic quotes and acoustic prosody.
- Self‑reflection step that checks consistency and may adjust low‑confidence item scores.
- Automatic stop when minimum confidence across items reaches a threshold or risk is detected (see the sketch after this list).
- Optional TTS playback for clinician responses.
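
The stop rule is simple enough to sketch. A minimal illustration with hypothetical names (the actual `app.py` logic may differ); the defaults mirror the documented `CONFIDENCE_THRESHOLD` and `MAX_TURNS` values:

```python
# Sketch of the stop rule: end on inferred risk, on the MAX_TURNS cap,
# or once every PHQ-9 item is scored with confidence >= tau.
# `should_stop` and its signature are illustrative, not the app's real API.
def should_stop(confidences: dict[str, float], turn: int,
                risk_detected: bool, tau: float = 0.8,
                max_turns: int = 12) -> bool:
    if risk_detected:          # safety takes priority over everything else
        return True
    if turn >= max_turns:      # hard cap (MAX_TURNS)
        return True
    return min(confidences.values()) >= tau  # min item confidence ≥ τ
```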
## UI overview

- Main tab: large circular mic “Record” bubble
  - Tap to start, tap again to stop (processing runs on stop)
  - While speaking back (TTS), the bubble shows a speaking state
- Chat tab: plain chat transcript (for reviewing turns)
- Advanced tab:
  - PHQ‑9 Assessment JSON (live)
  - Severity label
  - Confidence threshold slider (τ)
  - Toggle: Speak clinician responses (TTS)
  - Model ID textbox and “Apply model” button
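
For orientation, a rough Gradio Blocks skeleton of this layout (a sketch only; event wiring is omitted and component names are illustrative, not the actual `app.py`):

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Main"):
        mic = gr.Audio(sources=["microphone"], type="filepath", label="Record")
    with gr.Tab("Chat"):
        transcript = gr.Chatbot(label="Transcript")
    with gr.Tab("Advanced"):
        assessment = gr.JSON(label="PHQ-9 Assessment")
        severity = gr.Label(label="Severity")
        tau = gr.Slider(0.0, 1.0, value=0.8, label="Confidence threshold (τ)")
        speak = gr.Checkbox(value=True, label="Speak clinician responses (TTS)")
        model_id = gr.Textbox(label="Model ID")
        apply_model = gr.Button("Apply model")
    # handlers (transcribe → score → respond → optional TTS) would be wired here

demo.launch(server_name="0.0.0.0", server_port=7860)
```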
## Quick start (local)

- Python 3.10+ recommended.
- Install deps: `pip install -r requirements.txt`
- Run the app: `python app.py`
- Open the URL shown in the console (defaults to `http://0.0.0.0:7860`). Allow microphone access in your browser.
## Configuration

Environment variables (all optional):

- `LLM_MODEL_ID` (default `google/gemma-2-2b-it`): chat model id
- `ASR_MODEL_ID` (default `openai/whisper-tiny.en`): speech‑to‑text model id
- `CONFIDENCE_THRESHOLD` (default `0.8`): stop when min item confidence ≥ τ
- `MAX_TURNS` (default `12`): hard stop cap
- `USE_TTS` (default `true`): enable TTS playback
- `MODEL_CONFIG_PATH` (default `model_config.json`): persisted model id
- `PORT` (default `7860`): server port
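
These might be read at startup roughly like this (a sketch; the actual `app.py` may parse them differently):

```python
import os

# Defaults match the table above; all variables are optional.
LLM_MODEL_ID = os.environ.get("LLM_MODEL_ID", "google/gemma-2-2b-it")
ASR_MODEL_ID = os.environ.get("ASR_MODEL_ID", "openai/whisper-tiny.en")
CONFIDENCE_THRESHOLD = float(os.environ.get("CONFIDENCE_THRESHOLD", "0.8"))
MAX_TURNS = int(os.environ.get("MAX_TURNS", "12"))
USE_TTS = os.environ.get("USE_TTS", "true").lower() == "true"
MODEL_CONFIG_PATH = os.environ.get("MODEL_CONFIG_PATH", "model_config.json")
PORT = int(os.environ.get("PORT", "7860"))
```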
Notes:
- If a GPU is available, the app will use it automatically for Transformers pipelines.
- Changing the model in Advanced will reload the text-generation pipeline on the next turn.
## How to use
- Go to Main and tap the mic bubble. Speak naturally.
- Tap again to finish your turn. The model replies; if TTS is enabled, you’ll hear it.
- The Advanced tab updates live with PHQ‑9 scores and severity. Adjust the confidence threshold if you want the assessment to stop earlier/later.
## Troubleshooting

- No mic input detected:
  - Ensure the site has microphone permission in your browser settings.
  - Try refreshing the page after granting permission.
- Can’t hear TTS:
  - Enable the “Speak clinician responses (TTS)” toggle in Advanced.
  - Ensure your system audio output is correct. Some browsers block auto‑play without prior interaction; use the mic once, then it should work.
- Model download slow or fails:
  - Check internet connectivity and try again. Some models are large.
- Assessment doesn’t stop:
  - Lower the confidence threshold slider (τ) in Advanced, or wait until the `MAX_TURNS` cap is reached.
## Safety
This demo does not provide therapy or emergency counseling. If a user expresses suicidal intent or risk is inferred, the app ends the conversation and advises contacting emergency services (e.g., 988 in the U.S.).
## Architecture

`RecordingAgent → ScoringAgent → ExplainabilityModule (light/full) → ReflectionModule → ReportGenerator`
- RecordingAgent: generates clinician follow‑ups; guided by light explainability when available.
- ScoringAgent: infers PHQ‑9 item scores and per‑item confidences from transcript (+prosody summary).
- Explainability (light): keyword‑based evidence strength per item; selects next focus area.
- Explainability (full): aggregates transcript quotes and averaged prosody features into per‑item objects.
- Reflection: heuristic pass reduces scores by 1 for items with confidence < τ and missing evidence (sketched after this list).
- ReportGenerator: patient and clinician summaries, confidence bars, highlights, and reflection notes.
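
The reflection rule above is concrete enough to sketch. Names here are illustrative, not the app's actual `ReflectionModule`; the evidence labels match the light‑explainability `evidence_strength` values shown below:

```python
# Reduce each item score by 1 when confidence is below tau and the light
# explainability pass found no evidence for that item.
def reflect(scores: dict[str, int], confidences: dict[str, float],
            evidence_strength: dict[str, str],
            tau: float = 0.8) -> dict[str, int]:
    corrected = dict(scores)
    for item, score in scores.items():
        if confidences.get(item, 0.0) < tau and evidence_strength.get(item) == "missing":
            corrected[item] = max(0, score - 1)  # PHQ-9 item scores stay in 0..3
    return corrected
```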
## Output objects

- Explainability (light):

  ```json
  {
    "evidence_strength": {"appetite": "missing", ...},
    "recommended_focus": "appetite",
    "quotes": {"appetite": ["..."], ...},
    "confidences": {"appetite": 0.34, ...}
  }
  ```

- Explainability (full):

  ```json
  {
    "items": [
      {
        "item": "appetite",
        "confidence": 0.42,
        "evidence": ["..."],
        "prosody": ["rms_mean=0.012", "zcr_mean=0.065", ...]
      }
    ],
    "notes": "Heuristic placeholder"
  }
  ```

- Reflection report:

  ```json
  {
    "corrected_scores": {"appetite": 1, ...},
    "final_total": 12,
    "severity_label": "Moderate Depression",
    "consistency_score": 0.89,
    "notes": "Model revised appetite score due to low confidence and missing evidence."
  }
  ```
## Development notes
- Framework: Gradio Blocks
- ASR: Transformers pipeline (Whisper)
- TTS: gTTS or Coqui TTS
- Prosody features: librosa proxies; replaceable by OpenSMILE (see the sketch below)
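
A sketch of the ASR and prosody steps named above. The model id matches the `ASR_MODEL_ID` default and the feature names match the prosody strings in the full‑explainability output; the actual code may differ:

```python
import librosa
from transformers import pipeline

# Whisper via the Transformers ASR pipeline (GPU is picked up automatically
# by the app when available).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")

def transcribe_and_summarize(wav_path: str) -> tuple[str, dict[str, float]]:
    text = asr(wav_path)["text"]
    y, sr = librosa.load(wav_path, sr=16000)
    prosody = {
        "rms_mean": float(librosa.feature.rms(y=y).mean()),               # loudness proxy
        "zcr_mean": float(librosa.feature.zero_crossing_rate(y).mean()),  # voicing proxy
    }
    return text, prosody
```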
PRs and experiments are welcome. This is a research prototype and not a clinical tool.