|
--- |
|
datasets: |
|
- nbertagnolli/counsel-chat |
|
language: |
|
- en |
|
base_model: |
|
- google/gemma-2-2b-it |
|
pipeline_tag: text-generation |
|
tags: |
|
- therapy |
|
- chat-bot |
|
library_name: transformers |
|
--- |
|
# TADBot |
|
|
|
## Problem Statement |
|
|
|
- Mental health issues are a significant problem that affects millions of people worldwide, yet many people struggle to seek help due to stigma, lack of resources, or other barriers. |
|
- Existing mental health resources often lack personalization and empathy, making it difficult for individuals to connect with them and receive the help they need. |
|
- There is a need for a more accessible, affordable, and personalized mental health resource that can provide support and advice to individuals who may be struggling with mental health issues. |
|
|
|
## Overview |
|
|
|
TADBot is a small language model fine-tuned on the nbertagnolli/counsel-chat dataset to help people deal with mental health problems and offer advice based on the context of the conversation. It is designed to be a supportive and empathetic resource, providing personalized support and advice to those who may be struggling with mental health issues. TADBot is still in development and is not yet available for public use, but it has the potential to significantly improve access to mental health resources and support. |
|
|
|
## Technology used |
|
|
|
- Gemma 2 2B: A small language model with 2 billion parameters that TADBot is fine-tuned on. |
|
- nbertagnolli/counsel-chat: The dataset used to train TADBot on mental health and advice-giving tasks. |
|
- Hugging Face Transformers: A library used to fine-tune the Gemma 2 2B model on the nbertagnolli/counsel-chat dataset. |
|
- PyTorch: A library used for training and fine-tuning the language model. |
|
- Flask: A library used to create a server for TADBot. |
|
- Raspberry Pi: A small, low-cost computer used to host the Text to Speech and Speech to Text models and the TADBot server. |
|
- FER: A deep learning model used to detect emotions from faces in real-time using a webcam. |
|
- S2T and T2S: Speech to Text and Text to Speech models used to convert between spoken audio and text. |
|
|
|
# Features |
|
## FER Model: |
|
- TADBot uses a deep learning model to detect emotions from faces in real-time using a webcam. This allows TADBot to better understand the emotional context of a conversation and provide more appropriate and empathetic responses. |
|
- The data from the FER model is sent to the TADBot server, where it is used to identify the emotion in the image sent by the client. This information is then used to generate a more appropriate response from the model. |
|
- The data is also logged separately in a text file, which the client can access to track how emotions change over the course of the conversation. This can be used to provide insights into the conversation. |
|
- The data is not retained; it is erased after every conversation, adhering to doctor-client confidentiality. |
|
> HLD for the FER model |
|
|
|
```mermaid |
|
flowchart TD |
|
%% User Interface Layer |
|
A[Raspberry PI] -->|Sends Image| B[detecfaces.py] |
|
B --->|Returns processed data| A |
|
|
|
%%Server |
|
subgraph Server |
|
%% Processing Layer |
|
B --> |Captured Image| T1[prediction.py] |
|
M1[RAFDB trained model] --> |epoch with best acc 92%|B |
|
T1-->|Top 3 emotions predicted| B |
|
|
|
%%Model Layer |
|
M1 |
|
|
|
%% Processing |
|
subgraph Processing |
|
T1 --> |Send Image|T2[detec_faces] |
|
T2 --> |Returns a 224x224 face|T1 |
|
end |
|
end |
|
``` |
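The prediction step shown above can be sketched in Python. The `top_emotions` helper is pure logic; the `detect_emotion` wrapper assumes the third-party `fer` package and an OpenCV-style image array, and its name and wiring are illustrative, not the project's actual `prediction.py` API.

```python
def top_emotions(scores, k=3):
    """Return the k highest-scoring emotion labels from a {label: probability} dict."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def detect_emotion(frame):
    """Illustrative wrapper around the `fer` package (assumed dependency).

    Returns the top-3 emotions for the first face found in the frame,
    or an empty list if no face is detected.
    """
    from fer import FER  # third-party; pip install fer

    detector = FER(mtcnn=True)
    faces = detector.detect_emotions(frame)
    if not faces:
        return []
    return top_emotions(faces[0]["emotions"])
```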
|
## S2T Model and T2S Model: |
|
- The Text to Speech (T2S) and Speech to Text (S2T) features are facilitated by the pyttx library, which captures audio from the system microphone and then uses the Google backend to transcribe the audio into text. |

- This text is then passed to the language model, which generates a response based on it. |

- The response generated by the language model is then converted back to speech and output via a speaker. |
|
|
|
> LLD for the S2T and T2S model |
|
|
|
```mermaid |
|
sequenceDiagram |
|
|
|
%% User level |
|
User->>+Raspberry PI: Speak into microphone |
|
Raspberry PI->>+main.py: Capture audio |
|
main.py->>+pyttx: Convert audio to text |
|
pyttx->>+Google Backend: Send audio for transcription |
|
Google Backend-->>-pyttx: Transcribed text |
|
pyttx-->>-main.py: Transcribed text |
|
main.py->>+LLM: Send text for response generation |
|
LLM-->>-main.py: Generated response text |
|
main.py->>+pyttx: Convert response text to audio |
|
pyttx->>+Google Backend: Send text for synthesis |
|
Google Backend-->>-pyttx: Synthesized audio |
|
pyttx-->>-main.py: Synthesized audio |
|
main.py->>+Raspberry PI: Output audio through speaker |
|
Raspberry PI-->>-User: Speaker output |
|
``` |
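The loop above can be sketched with the `SpeechRecognition` and `pyttsx3` packages (an assumption here; the project wraps its audio handling in what it calls pyttx). The `generate_reply` callable stands in for the language-model call, and `clean_transcript` is a small pure helper.

```python
def clean_transcript(text):
    """Normalize whitespace in a raw transcript before sending it to the model."""
    return " ".join(text.split()).strip()


def listen_and_reply(generate_reply):
    """One listen -> transcribe -> respond -> speak cycle (illustrative sketch).

    Assumes the SpeechRecognition and pyttsx3 packages; `generate_reply`
    stands in for the call to the fine-tuned language model.
    """
    import speech_recognition as sr  # third-party
    import pyttsx3                   # third-party

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    # Transcribe via the Google backend, as described above.
    text = clean_transcript(recognizer.recognize_google(audio))

    reply = generate_reply(text)

    # Speak the model's response through the system speaker.
    engine = pyttsx3.init()
    engine.say(reply)
    engine.runAndWait()
```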
|
|
|
# How It Works |
|
|
|
## Model |
|
TADBot uses a fine-tuned version of the Gemma 2 2B language model to generate responses. The model is trained on the nbertagnolli/counsel-chat dataset from Hugging Face, which contains conversations between mental health professionals and clients. The model is fine-tuned using the Hugging Face Transformers library and PyTorch. |
|
### Dataset |
|
The raw version of the dataset consists of 2,275 conversations taken from an online mental health platform. |
|
- The data includes the following features: 'questionID', 'questionTitle', 'questionText', 'questionLink', 'topic', 'therapistInfo', 'therapistURL', 'answerText', 'upvotes', and 'views'. |
|
- The dataset is cleaned and preprocessed to remove irrelevant or sensitive information, retaining only the fields needed for training. |
|
- The data is then mapped to a custom prompt that provides system-level role instructions to the model, allowing for much better responses. |
|
- The data is then split into training and validation sets. |
|
- The model is trained on the training set and evaluated on the validation set. |
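The mapping step above can be sketched as follows. The Gemma turn markers are the model family's documented chat format, but the system instruction text and the `build_prompt` helper are illustrative assumptions, not the project's actual prompt.

```python
def build_prompt(example, system_prompt):
    """Map one counsel-chat row to a single training string.

    Uses Gemma's <start_of_turn>/<end_of_turn> markers; since Gemma has no
    separate system role, the instruction is prepended to the user turn.
    The exact template here is an illustrative assumption.
    """
    return (
        f"<start_of_turn>user\n{system_prompt}\n\n"
        f"[{example['topic']}] {example['questionText']}<end_of_turn>\n"
        f"<start_of_turn>model\n{example['answerText']}<end_of_turn>"
    )


if __name__ == "__main__":
    # Load and map the dataset (requires the `datasets` package).
    from datasets import load_dataset

    system_prompt = "You are a supportive, empathetic mental health assistant."
    dataset = load_dataset("nbertagnolli/counsel-chat", split="train")
    dataset = dataset.map(lambda ex: {"text": build_prompt(ex, system_prompt)})
    split = dataset.train_test_split(test_size=0.1)  # training/validation split
```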
|
|
|
### Training |
|
- All the training configuration is stored in a YAML file for easy access and modification. The YAML file contains the following information: |
|
- model name |
|
- new model name |
|
- lora configs |
|
- peft configs |
|
- training arguments |
|
- sft arguments |
|
- The model is trained with the following parameters: |

  - learning rate: 2e-4 |

  - batch size: 2 |

  - gradient accumulation steps: 2 |

  - num train epochs: 1 |

  - weight decay: 0.01 |

  - optimizer: paged_adamw_32bit |
|
- The fine-tuned LoRA adapters are then merged with the base model to produce the final model. |
|
- The final model is then saved and pushed to the Hugging Face model hub for easy access and deployment. |
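A sketch of what such a YAML file might contain, using the hyperparameters listed above; the key names and the LoRA values are illustrative assumptions, not the project's actual schema.

```yaml
model_name: google/gemma-2-2b-it
new_model_name: TADBot-gemma-2-2b

lora:                 # LoRA adapter configuration (values illustrative)
  r: 16
  lora_alpha: 32
  lora_dropout: 0.05

training_arguments:   # the parameters listed above
  learning_rate: 2.0e-4
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 2
  num_train_epochs: 1
  weight_decay: 0.01
  optim: paged_adamw_32bit
```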
|
|
|
```mermaid |
|
sequenceDiagram |
|
participant hyperparameter |
|
participant trainer |
|
participant model |
|
participant tokenizer |
|
participant dataset |
|
participant custom prompt |
|
|
|
hyperparameter->>trainer: Initialize PEFT config |
|
hyperparameter->>model: Initialize bitsandbytes config |
|
hyperparameter->>trainer: Initialize training arguments |
|
hyperparameter->>trainer: Initialize SFT arguments |
|
dataset->>custom prompt: Custom prompt |
|
custom prompt-->>dataset: Mapped dataset |
|
dataset-->>trainer: Dataset for training |
|
tokenizer->>trainer: Tokenize input text |
|
model-->>trainer: Model for training |
|
trainer->>model: Train model |
|
model-->>trainer: Trained model |
|
trainer->>model: Save model |
|
trainer->>tokenizer: Save tokenizer |
|
model-->>Hugging Face: Push model to Hub |
|
tokenizer-->>Hugging Face: Push tokenizer to Hub |
|
|
|
``` |
|
### Inference |
|
- Since the model is generally quite large and difficult to run on commercial hardware, it is quantized using llama.cpp to reduce its size from ~5 GB to under 2 GB. This allows the model to be run on a Raspberry Pi 4 with 4 GB of RAM. |
|
- The model is then deployed on a Raspberry Pi 4 and served for inference through the Ollama REST API. |
|
- The conversations are also stored as vector embeddings to further improve the responses generated by the model. |
|
- At the end of the conversation, the model creates a log file that stores the conversation between the user and the model, which can be useful for diagnosis by a therapist. |
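As a sketch of the inference step, a request to a locally running Ollama server can be built and sent as below. The `/api/generate` endpoint is Ollama's standard REST route; the model name and the emotion-prefix convention are illustrative assumptions.

```python
import json
import urllib.request


def build_request(prompt, model="tadbot", emotion=None):
    """Build the JSON payload for Ollama's /api/generate endpoint.

    Optionally prepends the emotion detected by the FER model so the
    response can take the user's emotional state into account
    (the prefix format is an illustrative assumption).
    """
    if emotion:
        prompt = f"[user appears {emotion}] {prompt}"
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, emotion=None, host="http://localhost:11434"):
    """Send the prompt to a locally running Ollama server and return its reply."""
    payload = json.dumps(build_request(prompt, emotion=emotion)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```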
|
|
|
# Deployment Instructions |
|
|
|
To deploy TADBot locally, you will need to follow these steps: |
|
|
|
- Create a virtual environment (preferably Python 3.12) with pip-tools or uv installed, and install the required dependencies: |
|
|
|
```shell |

pip-sync requirements.txt    # if you are using pip-tools |

pip install -r requirements.txt    # if you are using pip |

uv sync    # if you are using uv |

``` |