---
title: Conversational Assessment for Responsive Engagement (CARE) Notes
emoji: 🐢
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: AI-driven conversational module for depression-triage
---

# PHQ-9 Clinician Agent (Voice-first)

A lightweight research demo that simulates a clinician conducting a brief conversational PHQ-9 screening. The app is voice-first: you tap a circular mic bubble to talk; the model replies and can speak back via TTS. A separate Advanced tab exposes scoring and configuration.

## What it does
- Conversational assessment to infer PHQ‑9 items from natural dialogue (no explicit questionnaire).
- Live inference of PHQ‑9 item scores, confidences, total score, and severity.
- Iterative light explainability after each turn to guide the next question (strong/weak/missing evidence by item).
- Final explainability at session end aggregating linguistic quotes and acoustic prosody.
- Self‑reflection step that checks consistency and may adjust low‑confidence item scores.
- Automatic stop when the minimum confidence across items reaches a threshold (τ), when risk is detected, or when the turn cap (`MAX_TURNS`) is hit.
- Optional TTS playback for clinician responses.
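
The stop condition described above can be sketched as a small predicate. This is an illustrative sketch, not the app's actual code: the function name `should_stop` and its parameters are hypothetical, and the defaults simply mirror the documented `CONFIDENCE_THRESHOLD` and `MAX_TURNS` values. Treating an empty confidence map as "keep going" is an assumption.

```python
from typing import Dict


def should_stop(
    confidences: Dict[str, float],
    tau: float = 0.8,
    turn: int = 0,
    max_turns: int = 12,
    risk_detected: bool = False,
) -> bool:
    """Return True when the conversation should end (hypothetical helper)."""
    # Risk always ends the session, regardless of confidence.
    if risk_detected:
        return True
    # Hard cap on the number of turns.
    if turn >= max_turns:
        return True
    # Assumption: with no scored items yet, keep the conversation going.
    if not confidences:
        return False
    # Stop once even the least certain item has confidence >= tau.
    return min(confidences.values()) >= tau
```

With this shape, a lower τ ends the session sooner because the weakest item needs less confidence to clear the bar.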

## UI overview
- Main tab: Large circular mic “Record” bubble
  - Tap to start, tap again to stop (processing runs on stop)
  - While speaking back (TTS), the bubble shows a speaking state
- Chat tab: Plain chat transcript (for reviewing turns)
- Advanced tab:
  - PHQ‑9 Assessment JSON (live)
  - Severity label
  - Confidence threshold slider (τ)
  - Toggle: Speak clinician responses (TTS)
  - Model ID textbox and “Apply model” button

## Quick start (local)
1. Python 3.10+ recommended.
2. Install deps:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the app:
   ```bash
   python app.py
   ```
4. Open the URL shown in the console. The server binds to `0.0.0.0:7860` by default, so in the browser use `http://localhost:7860`. Allow microphone access when prompted.

## Configuration
Environment variables (all optional):
- `LLM_MODEL_ID` (default `google/gemma-2-2b-it`): chat model id
- `ASR_MODEL_ID` (default `openai/whisper-tiny.en`): speech-to-text model id
- `CONFIDENCE_THRESHOLD` (default `0.8`): stop when min item confidence ≥ τ
- `MAX_TURNS` (default `12`): hard stop cap
- `USE_TTS` (default `true`): enable TTS playback
- `MODEL_CONFIG_PATH` (default `model_config.json`): persisted model id
- `PORT` (default `7860`): server port

Notes:
- If a GPU is available, the app will use it automatically for Transformers pipelines.
- Changing the model in Advanced will reload the text-generation pipeline on the next turn.
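
For reference, the environment variables above could be read along these lines. This is a minimal sketch under stated assumptions: the `Settings` dataclass and `load_settings` helper are hypothetical names, and the set of strings accepted as "true" for `USE_TTS` is a guess rather than the app's documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which strings count as "true" is not documented in this README.
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    llm_model_id: str
    asr_model_id: str
    confidence_threshold: float
    max_turns: int
    use_tts: bool
    model_config_path: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        llm_model_id=env.get("LLM_MODEL_ID", "google/gemma-2-2b-it"),
        asr_model_id=env.get("ASR_MODEL_ID", "openai/whisper-tiny.en"),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.8")),
        max_turns=int(env.get("MAX_TURNS", "12")),
        use_tts=env.get("USE_TTS", "true").strip().lower() in TRUTHY,
        model_config_path=env.get("MODEL_CONFIG_PATH", "model_config.json"),
        port=int(env.get("PORT", "7860")),
    )
```

For example, launching with `CONFIDENCE_THRESHOLD=0.7 MAX_TURNS=8 python app.py` would override two of the defaults and leave the rest untouched.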

## How to use
1. Go to Main and tap the mic bubble. Speak naturally.
2. Tap again to finish your turn. The model replies; if TTS is enabled, you’ll hear it.
3. The Advanced tab updates live with PHQ‑9 scores and severity. Lower the confidence threshold (τ) to stop the assessment earlier, or raise it to keep the conversation going longer.

## Troubleshooting
- No mic input detected:
  - Ensure the site has microphone permission in your browser settings.
  - Try refreshing the page after granting permission.
- Can’t hear TTS:
  - Enable the “Speak clinician responses (TTS)” toggle in Advanced.
  - Ensure your system audio output is correct. Some browsers block autoplay until the page has received a user interaction; record one turn with the mic, and playback should then work.
- Model download slow or fails:
  - Check internet connectivity and try again. Some models are large.
- Assessment doesn’t stop:
  - Lower the confidence threshold slider (τ) in Advanced so the stop condition (minimum item confidence ≥ τ) is easier to meet, or wait until the turn cap (`MAX_TURNS`) is reached.

## Safety
This demo does not provide therapy or emergency counseling. If a user expresses suicidal intent or risk is inferred, the app ends the conversation and advises contacting emergency services (e.g., 988 in the U.S.).

## Architecture
RecordingAgent → ScoringAgent → ExplainabilityModule(light/full) → ReflectionModule → ReportGenerator

- RecordingAgent: generates clinician follow‑ups; guided by light explainability when available.
- ScoringAgent: infers PHQ‑9 item scores and per‑item confidences from transcript (+prosody summary).
- Explainability (light): keyword‑based evidence strength per item; selects next focus area.
- Explainability (full): aggregates transcript quotes and averaged prosody features into per‑item objects.
- Reflection: heuristic pass reduces scores by 1 for items with confidence < τ and missing evidence.
- ReportGenerator: patient and clinician summaries, confidence bars, highlights, and reflection notes.
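
To make the light explainability and reflection steps concrete, here is a rough sketch of how they might fit together. Everything named here is hypothetical: the keyword lists, the hit-count cutoffs for "weak" versus "strong", the focus-selection rule, and the `light_explain` and `reflect` helpers are illustrative assumptions, not the app's implementation. The clamp at 0 is also an assumption, based on PHQ‑9 items being scored 0 to 3.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword lists; the app's real vocabulary is not documented here.
KEYWORDS: Dict[str, List[str]] = {
    "appetite": ["appetite", "eating", "hungry"],
    "sleep": ["sleep", "insomnia", "tired"],
}

RANK = {"missing": 0, "weak": 1, "strong": 2}


def light_explain(transcript: str, keywords: Dict[str, List[str]] = KEYWORDS) -> dict:
    """Keyword-based evidence strength per item, plus the next focus area."""
    text = transcript.lower()
    strength: Dict[str, str] = {}
    for item, words in keywords.items():
        # Naive substring counting; a real system would tokenize first.
        hits = sum(text.count(w) for w in words)
        # Assumed cutoffs: 0 hits = missing, 1 = weak, 2 or more = strong.
        strength[item] = "missing" if hits == 0 else ("weak" if hits == 1 else "strong")
    # Assumed rule: focus on the first item with the weakest evidence.
    focus = min(strength, key=lambda k: RANK[strength[k]])
    return {"evidence_strength": strength, "recommended_focus": focus}


def reflect(
    scores: Dict[str, int],
    confidences: Dict[str, float],
    evidence_strength: Dict[str, str],
    tau: float = 0.8,
) -> Tuple[Dict[str, int], List[str]]:
    """Reduce a score by 1 when confidence < tau and evidence is missing."""
    corrected = dict(scores)
    revised: List[str] = []
    for item, score in scores.items():
        low_conf = confidences.get(item, 0.0) < tau
        missing = evidence_strength.get(item, "missing") == "missing"
        # Assumption: never go below 0, the lowest PHQ-9 item score.
        if low_conf and missing and score > 0:
            corrected[item] = score - 1
            revised.append(item)
    return corrected, revised
```

The `recommended_focus` value is what would steer the RecordingAgent's next follow-up, and the list of revised items is the kind of information the reflection notes could report.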

### Output objects
- Explainability (light):
  ```json
  {
    "evidence_strength": {"appetite": "missing", ...},
    "recommended_focus": "appetite",
    "quotes": {"appetite": ["..."], ...},
    "confidences": {"appetite": 0.34, ...}
  }
  ```
- Explainability (full):
  ```json
  {
    "items": [
      {"item":"appetite","confidence":0.42,"evidence":["..."],"prosody":["rms_mean=0.012", "zcr_mean=0.065", ...]}
    ],
    "notes": "Heuristic placeholder"
  }
  ```
- Reflection report:
  ```json
  {
    "corrected_scores": {"appetite": 1, ...},
    "final_total": 12,
    "severity_label": "Moderate Depression",
    "consistency_score": 0.89,
    "notes": "Model revised appetite score due to low confidence and missing evidence."
  }
  ```
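
The `final_total` and `severity_label` fields in the reflection report can be related through the standard PHQ‑9 severity bands (0-4, 5-9, 10-14, 15-19, 20-27). The sketch below assumes those bands; the helper name `severity_label` is hypothetical, and only the "Moderate Depression" label string is confirmed by the example above. The other label strings are guesses that follow the same pattern.

```python
def severity_label(total: int) -> str:
    """Map a PHQ-9 total (0-27) to a severity label using the standard bands."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    if total <= 4:
        return "Minimal Depression"  # label wording is an assumption
    if total <= 9:
        return "Mild Depression"  # label wording is an assumption
    if total <= 14:
        return "Moderate Depression"  # matches the example report above
    if total <= 19:
        return "Moderately Severe Depression"  # label wording is an assumption
    return "Severe Depression"  # label wording is an assumption
```

For the example report, `severity_label(12)` yields `"Moderate Depression"`, consistent with `final_total: 12`.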

## Development notes
- Framework: Gradio Blocks
- ASR: Transformers pipeline (Whisper)
- TTS: gTTS or Coqui TTS
- Prosody features: librosa proxies; replaceable by OpenSMILE
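
As an illustration of what the `rms_mean` and `zcr_mean` prosody proxies measure, here is a dependency-free sketch. It is a stand-in written in plain Python, not the app's librosa code; the helper names and the frame and hop sizes are assumptions, and librosa's own framing (which pads and centers frames by default) would give slightly different numbers.

```python
import math
from typing import Dict, List, Sequence


def frame_signal(
    samples: Sequence[float], frame_length: int = 2048, hop_length: int = 512
) -> List[Sequence[float]]:
    """Split samples into overlapping frames (no padding, unlike librosa's default)."""
    if not samples:
        return []
    if len(samples) < frame_length:
        return [samples]
    last_start = len(samples) - frame_length
    return [samples[i:i + frame_length] for i in range(0, last_start + 1, hop_length)]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame, a rough loudness proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zcr(frame: Sequence[float]) -> float:
    """Fraction of samples in the frame where the signal changes sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / len(frame)


def prosody_summary(
    samples: Sequence[float], frame_length: int = 2048, hop_length: int = 512
) -> Dict[str, float]:
    """Average the per-frame features into the rms_mean / zcr_mean summary."""
    frames = frame_signal(samples, frame_length, hop_length)
    if not frames:
        return {"rms_mean": 0.0, "zcr_mean": 0.0}
    return {
        "rms_mean": sum(rms(f) for f in frames) / len(frames),
        "zcr_mean": sum(zcr(f) for f in frames) / len(frames),
    }
```

A summary like this can be rendered as strings such as `rms_mean=0.012` with `[f"{k}={v:.3f}" for k, v in summary.items()]`, matching the shape of the `prosody` list in the full explainability object.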

PRs and experiments are welcome. This is a research prototype and not a clinical tool.