---
title: Conversational Assessment for Responsive Engagement (CARE) Notes
emoji: 🐢
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: AI-driven conversational module for depression triage
---
# PHQ-9 Clinician Agent (Voice-first)
A lightweight research demo that simulates a clinician conducting a brief conversational PHQ-9 screening. The app is voice-first: you tap a circular mic bubble to talk; the model replies and can speak back via TTS. A separate Advanced tab exposes scoring and configuration.
## What it does
- Conversational assessment to infer PHQ‑9 items from natural dialogue (no explicit questionnaire).
- Live inference of PHQ‑9 item scores, confidences, total score, and severity.
- Iterative light explainability after each turn to guide the next question (strong/weak/missing evidence by item).
- Final explainability at session end aggregating linguistic quotes and acoustic prosody.
- Self‑reflection step that checks consistency and may adjust low‑confidence item scores.
- Automatic stop when the minimum confidence across items reaches a threshold (τ) or risk is detected (stopping rule sketched after this list).
- Optional TTS playback for clinician responses.
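A minimal sketch of the stopping rule described above, assuming per-item confidences in a dict and a risk flag (function and variable names are illustrative, not the app's actual API):
```python
# Illustrative stop condition (names are hypothetical).
def should_stop(confidences: dict[str, float], turn: int,
                risk_detected: bool, tau: float = 0.8, max_turns: int = 12) -> bool:
    """Stop when risk is detected, the turn cap is hit,
    or every PHQ-9 item is scored with confidence >= tau."""
    if risk_detected or turn >= max_turns:
        return True
    return bool(confidences) and min(confidences.values()) >= tau
```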
## UI overview
- Main tab: Large circular mic “Record” bubble
  - Tap to start, tap again to stop (processing runs on stop)
  - While speaking back (TTS), the bubble shows a speaking state
- Chat tab: Plain chat transcript (for reviewing turns)
- Advanced tab:
  - PHQ‑9 Assessment JSON (live)
  - Severity label
  - Confidence threshold slider (τ)
  - Toggle: Speak clinician responses (TTS)
  - Model ID textbox and “Apply model” button
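For orientation, a minimal Gradio Blocks sketch of this tab layout, assuming standard components (the real `app.py` adds the circular bubble styling and wires the ASR/LLM/TTS callbacks):
```python
import gradio as gr

# Illustrative layout only; event callbacks are omitted.
with gr.Blocks() as demo:
    with gr.Tab("Main"):
        mic = gr.Audio(sources=["microphone"], type="filepath", label="Record")
        reply = gr.Audio(label="Clinician reply (TTS)", autoplay=True)
    with gr.Tab("Chat"):
        transcript = gr.Chatbot(label="Transcript")
    with gr.Tab("Advanced"):
        assessment = gr.JSON(label="PHQ-9 Assessment")
        severity = gr.Label(label="Severity")
        tau = gr.Slider(0.0, 1.0, value=0.8, label="Confidence threshold (τ)")
        use_tts = gr.Checkbox(value=True, label="Speak clinician responses (TTS)")
        model_id = gr.Textbox(label="Model ID")
        apply_model = gr.Button("Apply model")

demo.launch()
```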
## Quick start (local)
1. Python 3.10+ recommended.
2. Install deps:
```bash
pip install -r requirements.txt
```
3. Run the app:
```bash
python app.py
```
4. Open the URL shown in the console (defaults to `http://0.0.0.0:7860`). Allow microphone access in your browser.
## Configuration
Environment variables (all optional):
- `LLM_MODEL_ID` (default `google/gemma-2-2b-it`): chat model id
- `ASR_MODEL_ID` (default `openai/whisper-tiny.en`): speech-to-text model id
- `CONFIDENCE_THRESHOLD` (default `0.8`): stop when min item confidence ≥ τ
- `MAX_TURNS` (default `12`): hard stop cap
- `USE_TTS` (default `true`): enable TTS playback
- `MODEL_CONFIG_PATH` (default `model_config.json`): persisted model id
- `PORT` (default `7860`): server port
Notes:
- If a GPU is available, the app will use it automatically for Transformers pipelines.
- Changing the model in Advanced will reload the text-generation pipeline on the next turn.
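A minimal sketch of how these defaults might be read at startup (values mirror the list above; the exact code in `app.py` may differ):
```python
import os

# Defaults mirror the environment variables documented above.
LLM_MODEL_ID = os.environ.get("LLM_MODEL_ID", "google/gemma-2-2b-it")
ASR_MODEL_ID = os.environ.get("ASR_MODEL_ID", "openai/whisper-tiny.en")
CONFIDENCE_THRESHOLD = float(os.environ.get("CONFIDENCE_THRESHOLD", "0.8"))
MAX_TURNS = int(os.environ.get("MAX_TURNS", "12"))
USE_TTS = os.environ.get("USE_TTS", "true").lower() == "true"
MODEL_CONFIG_PATH = os.environ.get("MODEL_CONFIG_PATH", "model_config.json")
PORT = int(os.environ.get("PORT", "7860"))
```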
## How to use
1. Go to Main and tap the mic bubble. Speak naturally.
2. Tap again to finish your turn. The model replies; if TTS is enabled, you’ll hear it.
3. The Advanced tab updates live with PHQ‑9 scores and severity. Adjust the confidence threshold if you want the assessment to stop earlier/later.
## Troubleshooting
- No mic input detected:
- Ensure the site has microphone permission in your browser settings.
- Try refreshing the page after granting permission.
- Can’t hear TTS:
- Enable the “Speak clinician responses (TTS)” toggle in Advanced.
- Ensure your system audio output is correct. Some browsers block auto‑play without interaction—use the mic once, then it should work.
- Model download slow or fails:
- Check internet connectivity and try again. Some models are large.
- Assessment doesn’t stop:
- Increase the confidence threshold slider (τ) in Advanced, or wait until the cap (`MAX_TURNS`).
## Safety
This demo does not provide therapy or emergency counseling. If a user expresses suicidal intent or risk is inferred, the app ends the conversation and advises contacting emergency services (e.g., 988 in the U.S.).
## Architecture
RecordingAgent → ScoringAgent → ExplainabilityModule(light/full) → ReflectionModule → ReportGenerator
- RecordingAgent: generates clinician follow‑ups; guided by light explainability when available.
- ScoringAgent: infers PHQ‑9 item scores and per‑item confidences from transcript (+prosody summary).
- Explainability (light): keyword‑based evidence strength per item; selects next focus area.
- Explainability (full): aggregates transcript quotes and averaged prosody features into per‑item objects.
- Reflection: heuristic pass that reduces scores by 1 for items with confidence < τ and missing evidence (sketched just after this list).
- ReportGenerator: patient and clinician summaries, confidence bars, highlights, and reflection notes.
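A minimal sketch of the reflection heuristic, assuming the light-explainability `evidence_strength` map shown below (names are illustrative):
```python
# Illustrative reflection pass: lower low-confidence, evidence-free item scores by 1.
def reflect(scores: dict[str, int], confidences: dict[str, float],
            evidence_strength: dict[str, str], tau: float = 0.8) -> dict[str, int]:
    corrected = dict(scores)
    for item, score in scores.items():
        if confidences.get(item, 0.0) < tau and evidence_strength.get(item) == "missing":
            corrected[item] = max(0, score - 1)  # PHQ-9 item scores stay within 0-3
    return corrected
```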
### Output objects
- Explainability (light):
```json
{
  "evidence_strength": {"appetite": "missing", ...},
  "recommended_focus": "appetite",
  "quotes": {"appetite": ["..."], ...},
  "confidences": {"appetite": 0.34, ...}
}
```
- Explainability (full):
```json
{
  "items": [
    {"item": "appetite", "confidence": 0.42, "evidence": ["..."], "prosody": ["rms_mean=0.012", "zcr_mean=0.065", ...]}
  ],
  "notes": "Heuristic placeholder"
}
```
- Reflection report:
```json
{
  "corrected_scores": {"appetite": 1, ...},
  "final_total": 12,
  "severity_label": "Moderate Depression",
  "consistency_score": 0.89,
  "notes": "Model revised appetite score due to low confidence and missing evidence."
}
```
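For reference, `final_total` maps to `severity_label` via the standard PHQ-9 severity bands; the exact label strings below are assumed to match the app's:
```python
# Standard PHQ-9 bands over the 0-27 total score (label strings assumed).
def severity_label(total: int) -> str:
    if total <= 4:
        return "Minimal Depression"
    if total <= 9:
        return "Mild Depression"
    if total <= 14:
        return "Moderate Depression"  # e.g., final_total = 12 in the report above
    if total <= 19:
        return "Moderately Severe Depression"
    return "Severe Depression"
```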
## Development notes
- Framework: Gradio Blocks
- ASR: Transformers pipeline (Whisper)
- TTS: gTTS or Coqui TTS
- Prosody features: librosa proxies; replaceable by OpenSMILE
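A minimal sketch of the librosa-based prosody proxies, matching the `rms_mean` / `zcr_mean` strings in the full explainability output (the helper name is illustrative):
```python
import librosa
import numpy as np

# Illustrative prosody proxies; OpenSMILE features could be swapped in here.
def prosody_summary(wav_path: str) -> dict[str, float]:
    y, sr = librosa.load(wav_path, sr=None)
    return {
        "rms_mean": float(np.mean(librosa.feature.rms(y=y))),               # loudness proxy
        "zcr_mean": float(np.mean(librosa.feature.zero_crossing_rate(y))),  # voicing/noisiness proxy
    }
```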
PRs and experiments are welcome. This is a research prototype and not a clinical tool.