---
title: Conversational Assessment for Responsive Engagement (CARE) Notes
emoji: 🐢
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: AI-driven conversational module for depression-triage
---
# PHQ-9 Clinician Agent (Voice-first)

A lightweight research demo that simulates a clinician conducting a brief conversational PHQ-9 screening. The app is voice-first: you tap a circular mic bubble to talk; the model replies and can speak back via TTS. A separate Advanced tab exposes scoring and configuration.
## What it does

- Conversational assessment to infer PHQ‑9 items from natural dialogue (no explicit questionnaire).
- Live inference of PHQ‑9 item scores, confidences, total score, and severity.
- Iterative light explainability after each turn to guide the next question (strong/weak/missing evidence by item).
- Final explainability at session end, aggregating linguistic quotes and acoustic prosody.
- Self‑reflection step that checks consistency and may adjust low‑confidence item scores.
- Automatic stop when the minimum confidence across items reaches a threshold or risk is detected.
- Optional TTS playback for clinician responses.
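The total score maps onto the standard PHQ-9 severity bands (0–4 minimal, 5–9 mild, 10–14 moderate, 15–19 moderately severe, 20–27 severe), and the stop rule compares the least-confident item against the threshold τ. A minimal sketch of both, assuming the defaults listed under Configuration; function names and the exact label strings (beyond "Moderate Depression", which appears in the sample output below) are illustrative, not the app's actual API:

```python
def phq9_severity(total: int) -> str:
    """Map a PHQ-9 total (0-27) onto the standard severity bands."""
    if total <= 4:
        return "Minimal Depression"
    if total <= 9:
        return "Mild Depression"
    if total <= 14:
        return "Moderate Depression"
    if total <= 19:
        return "Moderately Severe Depression"
    return "Severe Depression"


def should_stop(confidences: dict[str, float], turn: int,
                tau: float = 0.8, max_turns: int = 12) -> bool:
    """Stop once every item is confidently scored or the turn cap is reached."""
    if turn >= max_turns:
        return True
    return bool(confidences) and min(confidences.values()) >= tau
```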
## UI overview

- Main tab: large circular mic “Record” bubble
  - Tap to start, tap again to stop (processing runs on stop)
  - While speaking back (TTS), the bubble shows a speaking state
- Chat tab: plain chat transcript (for reviewing turns)
- Advanced tab:
  - PHQ‑9 Assessment JSON (live)
  - Severity label
  - Confidence threshold slider (τ)
  - Toggle: Speak clinician responses (TTS)
  - Model ID textbox and “Apply model” button
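A rough sketch of how this tab layout could be expressed with Gradio Blocks. Component choices and labels below are illustrative assumptions, not the actual `app.py` code, and the event handlers that wire these components together are omitted:

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Main"):
        # Circular mic bubble: tap to start recording, tap again to stop
        mic = gr.Audio(sources=["microphone"], type="filepath", label="Record")
        clinician_audio = gr.Audio(label="Clinician reply (TTS)", autoplay=True)
    with gr.Tab("Chat"):
        transcript = gr.Chatbot(label="Transcript")
    with gr.Tab("Advanced"):
        assessment = gr.JSON(label="PHQ-9 Assessment")
        severity = gr.Label(label="Severity")
        tau = gr.Slider(0.0, 1.0, value=0.8, step=0.05, label="Confidence threshold (τ)")
        speak = gr.Checkbox(value=True, label="Speak clinician responses (TTS)")
        model_id = gr.Textbox(value="google/gemma-2-2b-it", label="Model ID")
        apply_model = gr.Button("Apply model")

demo.launch()  # handlers connecting mic/apply_model to the agents are not shown here
```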
## Quick start (local)

1. Python 3.10+ recommended.
2. Install deps:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the app:
   ```bash
   python app.py
   ```
4. Open the URL shown in the console (the server listens on port `7860` by default, so typically `http://localhost:7860`). Allow microphone access in your browser.
## Configuration

Environment variables (all optional):

- `LLM_MODEL_ID` (default `google/gemma-2-2b-it`): chat model id
- `ASR_MODEL_ID` (default `openai/whisper-tiny.en`): speech-to-text model id
- `CONFIDENCE_THRESHOLD` (default `0.8`): stop when min item confidence ≥ τ
- `MAX_TURNS` (default `12`): hard stop cap
- `USE_TTS` (default `true`): enable TTS playback
- `MODEL_CONFIG_PATH` (default `model_config.json`): persisted model id
- `PORT` (default `7860`): server port

Notes:

- If a GPU is available, the app will use it automatically for Transformers pipelines.
- Changing the model in Advanced will reload the text-generation pipeline on the next turn.
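A minimal sketch of how these variables might be read at startup; the defaults match the list above, but the actual parsing in `app.py` may differ:

```python
import os

LLM_MODEL_ID = os.getenv("LLM_MODEL_ID", "google/gemma-2-2b-it")
ASR_MODEL_ID = os.getenv("ASR_MODEL_ID", "openai/whisper-tiny.en")
CONFIDENCE_THRESHOLD = float(os.getenv("CONFIDENCE_THRESHOLD", "0.8"))
MAX_TURNS = int(os.getenv("MAX_TURNS", "12"))
USE_TTS = os.getenv("USE_TTS", "true").lower() == "true"
MODEL_CONFIG_PATH = os.getenv("MODEL_CONFIG_PATH", "model_config.json")
PORT = int(os.getenv("PORT", "7860"))
```

To override values at launch, for example: `CONFIDENCE_THRESHOLD=0.9 USE_TTS=false python app.py`.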
## How to use

1. Go to Main and tap the mic bubble. Speak naturally.
2. Tap again to finish your turn. The model replies; if TTS is enabled, you’ll hear it.
3. The Advanced tab updates live with PHQ‑9 scores and severity. Adjust the confidence threshold if you want the assessment to stop earlier (lower τ) or run longer (higher τ).
## Troubleshooting

- No mic input detected:
  - Ensure the site has microphone permission in your browser settings.
  - Try refreshing the page after granting permission.
- Can’t hear TTS:
  - Enable the “Speak clinician responses (TTS)” toggle in Advanced.
  - Ensure your system audio output is correct. Some browsers block auto‑play without user interaction; use the mic once, then it should work.
- Model download slow or fails:
  - Check internet connectivity and try again. Some models are large.
- Assessment doesn’t stop:
  - Lower the confidence threshold slider (τ) in Advanced so the stop condition is reached sooner, or wait until the turn cap (`MAX_TURNS`) is hit.
## Safety

This demo does not provide therapy or emergency counseling. If a user expresses suicidal intent or risk is inferred, the app ends the conversation and advises contacting emergency services (e.g., 988 in the U.S.).
## Architecture

RecordingAgent → ScoringAgent → ExplainabilityModule (light/full) → ReflectionModule → ReportGenerator

- RecordingAgent: generates clinician follow‑ups; guided by light explainability when available.
- ScoringAgent: infers PHQ‑9 item scores and per‑item confidences from the transcript (+ prosody summary).
- Explainability (light): keyword‑based evidence strength per item; selects the next focus area.
- Explainability (full): aggregates transcript quotes and averaged prosody features into per‑item objects.
- Reflection: heuristic pass reduces scores by 1 for items with confidence < τ and missing evidence.
- ReportGenerator: patient and clinician summaries, confidence bars, highlights, and reflection notes.
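A compressed view of the per-turn control flow through these modules. Everything below is an illustrative assumption about interfaces (the modules are abstracted as plain callables); the real classes in the repo may look quite different:

```python
from typing import Callable

Turn = tuple[str, str]  # (speaker, text)

def run_turn(
    transcript: list[Turn],
    turn_index: int,
    tau: float,
    max_turns: int,
    score: Callable[[list[Turn]], dict],                # ScoringAgent: item scores + confidences
    light_explain: Callable[[list[Turn], dict], dict],  # light explainability: evidence + next focus
    ask_next: Callable[[list[Turn], dict], str],        # RecordingAgent: next clinician question
    finalize: Callable[[list[Turn], dict], dict],       # full explainability + reflection + report
):
    """Either continue the dialogue with a new question or produce the final report."""
    scores = score(transcript)
    confidences = scores.get("confidences", {})
    done = turn_index >= max_turns or (
        bool(confidences) and min(confidences.values()) >= tau
    )
    if done:
        return finalize(transcript, scores)
    light = light_explain(transcript, scores)
    return ask_next(transcript, light)
```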
### Output objects

- Explainability (light):

  ```json
  {
    "evidence_strength": {"appetite": "missing", ...},
    "recommended_focus": "appetite",
    "quotes": {"appetite": ["..."], ...},
    "confidences": {"appetite": 0.34, ...}
  }
  ```

- Explainability (full):

  ```json
  {
    "items": [
      {"item": "appetite", "confidence": 0.42, "evidence": ["..."], "prosody": ["rms_mean=0.012", "zcr_mean=0.065", ...]}
    ],
    "notes": "Heuristic placeholder"
  }
  ```

- Reflection report:

  ```json
  {
    "corrected_scores": {"appetite": 1, ...},
    "final_total": 12,
    "severity_label": "Moderate Depression",
    "consistency_score": 0.89,
    "notes": "Model revised appetite score due to low confidence and missing evidence."
  }
  ```
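The `corrected_scores` above come from the heuristic reflection rule described under Architecture: drop an item by one point when its confidence is below τ and its evidence is missing. A minimal sketch, reusing the light-explainability `evidence_strength` map; the function name is illustrative, not the module's actual API:

```python
def reflect(scores: dict[str, int],
            confidences: dict[str, float],
            evidence_strength: dict[str, str],
            tau: float = 0.8) -> dict[str, int]:
    """Lower low-confidence items with no supporting evidence by one point (floored at 0)."""
    corrected = dict(scores)
    for item, score in scores.items():
        if confidences.get(item, 0.0) < tau and evidence_strength.get(item) == "missing":
            corrected[item] = max(0, score - 1)
    return corrected
```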
## Development notes

- Framework: Gradio Blocks
- ASR: Transformers pipeline (Whisper)
- TTS: gTTS or Coqui TTS
- Prosody features: librosa proxies; replaceable by OpenSMILE
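A minimal sketch of the kind of librosa proxies behind values such as `rms_mean` and `zcr_mean` in the full-explainability output; the exact feature set and sample rate used by the app are assumptions here:

```python
import librosa
import numpy as np

def prosody_summary(wav_path: str) -> dict[str, float]:
    """Cheap prosody proxies averaged over one utterance."""
    y, sr = librosa.load(wav_path, sr=16000)
    return {
        "rms_mean": float(np.mean(librosa.feature.rms(y=y))),
        "zcr_mean": float(np.mean(librosa.feature.zero_crossing_rate(y))),
    }
```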
PRs and experiments are welcome. This is a research prototype and not a clinical tool.