---
title: Conversational Assessment for Responsive Engagement (CARE) Notes
emoji: 🐢
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: AI-driven conversational module for depression-triage
---
# PHQ-9 Clinician Agent (Voice-first)
A lightweight research demo that simulates a clinician conducting a brief conversational PHQ-9 screening. The app is voice-first: you tap a circular mic bubble to talk; the model replies and can speak back via TTS. A separate Advanced tab exposes scoring and configuration.
## What it does
- Conversational assessment to infer PHQ‑9 items from natural dialogue (no explicit questionnaire).
- Live inference of PHQ‑9 item scores, confidences, total score, and severity.
- Iterative light explainability after each turn to guide the next question (strong/weak/missing evidence by item).
- Final explainability at session end aggregating linguistic quotes and acoustic prosody.
- Self‑reflection step that checks consistency and may adjust low‑confidence item scores.
- Automatic stop when minimum confidence across items reaches a threshold or risk is detected (see the sketch after this list).
- Optional TTS playback for clinician responses.
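
The stop rule is simple enough to sketch. A minimal illustration with hypothetical names (the actual `app.py` logic may differ); the defaults mirror the documented `CONFIDENCE_THRESHOLD` and `MAX_TURNS` values:

```python
# Sketch of the stop rule: end on inferred risk, on the MAX_TURNS cap,
# or once every PHQ-9 item is scored with confidence >= tau.
# `should_stop` and its signature are illustrative, not the app's real API.
def should_stop(confidences: dict[str, float], turn: int,
                risk_detected: bool, tau: float = 0.8,
                max_turns: int = 12) -> bool:
    if risk_detected:          # safety takes priority over everything else
        return True
    if turn >= max_turns:      # hard cap (MAX_TURNS)
        return True
    return min(confidences.values()) >= tau  # min item confidence ≥ τ
```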
## UI overview

- Main tab: large circular mic “Record” bubble
  - Tap to start, tap again to stop (processing runs on stop)
  - While speaking back (TTS), the bubble shows a speaking state
- Chat tab: plain chat transcript (for reviewing turns)
- Advanced tab:
  - PHQ‑9 Assessment JSON (live)
  - Severity label
  - Confidence threshold slider (τ)
  - Toggle: Speak clinician responses (TTS)
  - Model ID textbox and “Apply model” button
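
For orientation, a rough Gradio Blocks skeleton of this layout (a sketch only; event wiring is omitted and component names are illustrative, not the actual `app.py`):

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Main"):
        mic = gr.Audio(sources=["microphone"], type="filepath", label="Record")
    with gr.Tab("Chat"):
        transcript = gr.Chatbot(label="Transcript")
    with gr.Tab("Advanced"):
        assessment = gr.JSON(label="PHQ-9 Assessment")
        severity = gr.Label(label="Severity")
        tau = gr.Slider(0.0, 1.0, value=0.8, label="Confidence threshold (τ)")
        speak = gr.Checkbox(value=True, label="Speak clinician responses (TTS)")
        model_id = gr.Textbox(label="Model ID")
        apply_model = gr.Button("Apply model")
    # handlers (transcribe → score → respond → optional TTS) would be wired here

demo.launch(server_name="0.0.0.0", server_port=7860)
```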
## Quick start (local)

- Python 3.10+ recommended.
- Install deps: `pip install -r requirements.txt`
- Run the app: `python app.py`
- Open the URL shown in the console (defaults to `http://0.0.0.0:7860`). Allow microphone access in your browser.
## Configuration

Environment variables (all optional):

- `LLM_MODEL_ID` (default `google/gemma-2-2b-it`): chat model id
- `ASR_MODEL_ID` (default `openai/whisper-tiny.en`): speech‑to‑text model id
- `CONFIDENCE_THRESHOLD` (default `0.8`): stop when min item confidence ≥ τ
- `MAX_TURNS` (default `12`): hard stop cap
- `USE_TTS` (default `true`): enable TTS playback
- `MODEL_CONFIG_PATH` (default `model_config.json`): persisted model id
- `PORT` (default `7860`): server port
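
These might be read at startup roughly like this (a sketch; the actual `app.py` may parse them differently):

```python
import os

# Defaults match the table above; all variables are optional.
LLM_MODEL_ID = os.environ.get("LLM_MODEL_ID", "google/gemma-2-2b-it")
ASR_MODEL_ID = os.environ.get("ASR_MODEL_ID", "openai/whisper-tiny.en")
CONFIDENCE_THRESHOLD = float(os.environ.get("CONFIDENCE_THRESHOLD", "0.8"))
MAX_TURNS = int(os.environ.get("MAX_TURNS", "12"))
USE_TTS = os.environ.get("USE_TTS", "true").lower() == "true"
MODEL_CONFIG_PATH = os.environ.get("MODEL_CONFIG_PATH", "model_config.json")
PORT = int(os.environ.get("PORT", "7860"))
```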
Notes:
- If a GPU is available, the app will use it automatically for Transformers pipelines.
- Changing the model in Advanced will reload the text-generation pipeline on the next turn.
## How to use
- Go to Main and tap the mic bubble. Speak naturally.
- Tap again to finish your turn. The model replies; if TTS is enabled, you’ll hear it.
- The Advanced tab updates live with PHQ‑9 scores and severity. Adjust the confidence threshold if you want the assessment to stop earlier/later.
## Troubleshooting

- No mic input detected:
  - Ensure the site has microphone permission in your browser settings.
  - Try refreshing the page after granting permission.
- Can’t hear TTS:
  - Enable the “Speak clinician responses (TTS)” toggle in Advanced.
  - Ensure your system audio output is correct. Some browsers block auto‑play without prior interaction; use the mic once, then it should work.
- Model download slow or fails:
  - Check internet connectivity and try again. Some models are large.
- Assessment doesn’t stop:
  - Lower the confidence threshold slider (τ) in Advanced, or wait until the `MAX_TURNS` cap is reached.
## Safety
This demo does not provide therapy or emergency counseling. If a user expresses suicidal intent or risk is inferred, the app ends the conversation and advises contacting emergency services (e.g., 988 in the U.S.).
## Architecture

`RecordingAgent → ScoringAgent → ExplainabilityModule (light/full) → ReflectionModule → ReportGenerator`
- RecordingAgent: generates clinician follow‑ups; guided by light explainability when available.
- ScoringAgent: infers PHQ‑9 item scores and per‑item confidences from transcript (+prosody summary).
- Explainability (light): keyword‑based evidence strength per item; selects next focus area.
- Explainability (full): aggregates transcript quotes and averaged prosody features into per‑item objects.
- Reflection: heuristic pass reduces scores by 1 for items with confidence < τ and missing evidence (sketched after this list).
- ReportGenerator: patient and clinician summaries, confidence bars, highlights, and reflection notes.
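
The reflection rule above is concrete enough to sketch. Names here are illustrative, not the app's actual `ReflectionModule`; the evidence labels match the light‑explainability `evidence_strength` values shown below:

```python
# Reduce each item score by 1 when confidence is below tau and the light
# explainability pass found no evidence for that item.
def reflect(scores: dict[str, int], confidences: dict[str, float],
            evidence_strength: dict[str, str],
            tau: float = 0.8) -> dict[str, int]:
    corrected = dict(scores)
    for item, score in scores.items():
        if confidences.get(item, 0.0) < tau and evidence_strength.get(item) == "missing":
            corrected[item] = max(0, score - 1)  # PHQ-9 item scores stay in 0..3
    return corrected
```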
## Output objects

- Explainability (light):

  ```json
  {
    "evidence_strength": {"appetite": "missing", ...},
    "recommended_focus": "appetite",
    "quotes": {"appetite": ["..."], ...},
    "confidences": {"appetite": 0.34, ...}
  }
  ```

- Explainability (full):

  ```json
  {
    "items": [
      {
        "item": "appetite",
        "confidence": 0.42,
        "evidence": ["..."],
        "prosody": ["rms_mean=0.012", "zcr_mean=0.065", ...]
      }
    ],
    "notes": "Heuristic placeholder"
  }
  ```

- Reflection report:

  ```json
  {
    "corrected_scores": {"appetite": 1, ...},
    "final_total": 12,
    "severity_label": "Moderate Depression",
    "consistency_score": 0.89,
    "notes": "Model revised appetite score due to low confidence and missing evidence."
  }
  ```
## Development notes
- Framework: Gradio Blocks
- ASR: Transformers pipeline (Whisper)
- TTS: gTTS or Coqui TTS
- Prosody features: librosa proxies; replaceable by OpenSMILE (see the sketch below)
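
A sketch of the ASR and prosody steps named above. The model id matches the `ASR_MODEL_ID` default and the feature names match the prosody strings in the full‑explainability output; the actual code may differ:

```python
import librosa
from transformers import pipeline

# Whisper via the Transformers ASR pipeline (GPU is picked up automatically
# by the app when available).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")

def transcribe_and_summarize(wav_path: str) -> tuple[str, dict[str, float]]:
    text = asr(wav_path)["text"]
    y, sr = librosa.load(wav_path, sr=16000)
    prosody = {
        "rms_mean": float(librosa.feature.rms(y=y).mean()),               # loudness proxy
        "zcr_mean": float(librosa.feature.zero_crossing_rate(y).mean()),  # voicing proxy
    }
    return text, prosody
```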
PRs and experiments are welcome. This is a research prototype and not a clinical tool.