kianpaya committed on
Commit
046e707
1 Parent(s): a230944

Upload 8 files

Files changed (8)
  1. LICENSE +21 -0
  2. README.md +52 -0
  3. analysis.ipynb +0 -0
  4. app.py +4 -0
  5. egpt.py +51 -0
  6. elit.py +35 -0
  7. etal.py +190 -0
  8. requirements.txt +12 -0
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 Kian Paya
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,52 @@
+ # MentalHealthGPT
+ ## Overview
+
+ **MentalHealthGPT** is an AI-powered application developed to assist mental health counselors by analyzing the tone of client conversations and generating responses that are sensitive to the emotional context. Combining advanced NLP models, including BERT for tone classification and GPT for response refinement, MentalHealthGPT aims to support mental health professionals in fostering empathetic and productive interactions with clients.
+
+ <p align="center">
+ <img src="Images/Legacy_App.jpg" width="600" alt="MentalHealthGPT Interface">
+ </p>
+
+ <p align="center"><i>MentalHealthGPT app interface: recognizes conversation tone and provides counseling guidance.</i></p>
+
+ ## Key Features
+
+ - **Tone Classification**: The app uses a BERT-based model to assess the emotional tone of user input, categorizing it as empathy, frustration, supportiveness, or another emotional state. This gives counselors insight into the client's emotional state so they can tailor their responses accordingly.
+
+ - **GPT-Based Response Generation**: Leveraging OpenAI's API, the app refines GPT responses based on the tone identified by BERT. This two-stage process helps keep responses contextually appropriate, supportive, and reflective of the client's needs, enhancing the counselor-client interaction; a short usage sketch follows this list.
+
+ - **User-Friendly Interface**: MentalHealthGPT is built with Streamlit, offering a straightforward and interactive interface. Counselors can input text, analyze tone, and view GPT-generated responses all within the same platform, making it accessible even for non-technical users.
+
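+ As a rough sketch of this two-stage flow, the snippet below chains the repository's `etal` classifier and `egpt` responder directly from Python. It assumes a trained `etal` checkpoint already sits in a local `models/` folder and that you supply your own OpenAI API key; the sample text and the predicted label are only illustrative.
+
+ ```python
+ # Minimal sketch of the classify-then-respond flow (not part of the app itself).
+ from etal import etal
+ from egpt import egpt
+
+ classifier = etal()
+ classifier.load('models', best=True)            # assumes trained checkpoints in ./models
+
+ context = "I feel exhausted and can't focus on anything lately."
+ tone = classifier.predict(context)              # e.g. 'Mental Fatigue'
+
+ responder = egpt(apiKey='YOUR_OPENAI_API_KEY')  # placeholder key, supply your own
+ reply = responder.respond(f'Detected tone: {tone}. {context}')
+ print(tone)
+ print(reply)
+ ```
+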
+ ## Hosting
+
+ The application is hosted on **Hugging Face Spaces**, which provides a scalable, secure, and user-friendly environment for real-time interactions. Hosting on Hugging Face Spaces makes the tool accessible from any browser without requiring a local installation, providing flexibility and ease of use for mental health professionals.
+
+ ---
+
+ ## Purpose and Impact
+
+ The purpose of MentalHealthGPT is to support mental health counselors by:
+ - **Improving Emotional Awareness**: Helping counselors identify and understand the client's emotional tone more accurately.
+ - **Enhancing Communication**: Offering emotionally aligned responses that build rapport and foster understanding.
+ - **Saving Time and Effort**: Providing an efficient real-time tool so counselors can focus more on interaction quality.
+
+ ## Future Considerations
+
+ - **Broader Emotion Spectrum**: Expanding the emotion classification model to recognize a wider range of emotions, such as optimism, anxiety, or neutrality, would increase the app's relevance across diverse counseling sessions.
+ - **Privacy and Security Enhancements**: Because MentalHealthGPT processes sensitive data, stricter privacy controls and secure data-handling practices would enhance trustworthiness.
+ - **Multilingual Support**: Introducing multilingual models would accommodate clients who do not speak English, making the application useful for a wider global audience.
+ - **Fine-Tuning with Mental Health-Specific Data**: Leveraging datasets specific to mental health interactions could improve response quality and relevance.
+
+ ## Challenges
+
+ - **Accuracy in Tone Detection**: Tone detection is complex, and accurate classification, especially of nuanced or ambiguous text, remains a challenge. Misclassification can lead to inappropriate or ineffective responses.
+ - **Dependency on External APIs**: Reliance on OpenAI's API for GPT-based response generation can introduce latency and may be cost-prohibitive at large scale.
+ - **Ethical Considerations**: As an AI-driven tool in mental health, there are ethical concerns around transparency, bias in model responses, and the potential impact of machine-generated responses on clients' mental health.
+
+ ---
+
+ ## Conclusion
+
+ **MentalHealthGPT** combines AI-driven tone analysis with responsive, context-aware text generation to give counselors better communication tools. By pairing tone classification with GPT-based response refinement, it supports mental health professionals in creating empathetic and effective interactions, making it a valuable tool for counseling environments. Hosted on Hugging Face Spaces, it provides an accessible platform for professionals seeking AI-enhanced support in their daily interactions.
+
+ ---
analysis.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
app.py ADDED
@@ -0,0 +1,4 @@
+ # Entry point for the Streamlit app (launch with: streamlit run app.py).
+ from elit import elit
+
+ if __name__ == '__main__':
+     elit()
egpt.py ADDED
@@ -0,0 +1,51 @@
+ import warnings
+ warnings.filterwarnings("ignore")
+ import torchvision
+ torchvision.disable_beta_transforms_warning()
+
+
+ import openai
+ import pandas as pd
+ from transformers import BertTokenizer
+ from sklearn.metrics.pairwise import cosine_similarity
+ from sentence_transformers import SentenceTransformer
+
+
+ class egpt:
+     """Retrieval-augmented responder: finds the most similar counseling context
+     in the dataset and asks GPT to adapt its associated response to the user."""
+
+     def __init__(self, apiKey, modelName='gpt-4-turbo', embeddingModel='all-MiniLM-L6-v2', datasetPath='hf://datasets/Amod/mental_health_counseling_conversations/combined_dataset.json'):
+         openai.api_key = apiKey
+         self.modelName = modelName
+         self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
+         self.embeddingModel = SentenceTransformer(embeddingModel)
+         self.dataset = self.loadDataset(datasetPath)
+         self.knowledgeBase = self.createKnowledgeBase()
+
+     def loadDataset(self, path):
+         # The dataset is JSON lines with 'Context' and 'Response' fields.
+         dataset = pd.read_json(path, lines=True)
+         return dataset[['Context', 'Response']].values.tolist()
+
+     def createKnowledgeBase(self):
+         # Embed every context once so retrieval is a single cosine-similarity pass.
+         knowledgeBase = []
+         for context, response in self.dataset:
+             embedding = self.embeddingModel.encode(context)
+             knowledgeBase.append((embedding, response))
+         return knowledgeBase
+
+     def getSimilarResponse(self, userContext):
+         userEmbedding = self.embeddingModel.encode(userContext)
+         similarities = [cosine_similarity([userEmbedding], [kbEmbedding])[0][0] for kbEmbedding, _ in self.knowledgeBase]
+         bestMatchIdx = similarities.index(max(similarities))
+         _, bestResponse = self.knowledgeBase[bestMatchIdx]
+         return bestResponse
+
+     def queryGpt(self, context):
+         # openai==0.28 interface (see requirements.txt).
+         response = openai.ChatCompletion.create(
+             model=self.modelName,
+             messages=[{'role': 'user', 'content': context}]
+         )
+         return response.choices[0].message['content']
+
+     def respond(self, userContext):
+         similarResponse = self.getSimilarResponse(userContext)
+         prompt = f'Given the following context and a similar response, please respond appropriately:\n\nContext: {userContext}\n\nSimilar Response: {similarResponse}'
+         return self.queryGpt(prompt)
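For orientation on the class above: a standalone sketch of the cosine-similarity lookup that `getSimilarResponse` performs, run against a tiny made-up in-memory knowledge base so it needs neither an OpenAI key nor the full counseling dataset. The example sentences and the expected match are illustrative, not from the repository.

```python
# Toy illustration of the knowledge-base lookup used by egpt.getSimilarResponse.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')

# Hypothetical (context, response) pairs standing in for the counseling dataset.
pairs = [
    ("I can't sleep and I worry about everything.", "It sounds like anxiety is weighing on you."),
    ("I lost someone close to me last month.", "Grief takes time; be gentle with yourself."),
]
knowledgeBase = [(model.encode(ctx), resp) for ctx, resp in pairs]

query = model.encode("Lately my mind races at night and I can't rest.")
scores = [cosine_similarity([query], [emb])[0][0] for emb, _ in knowledgeBase]
print(pairs[scores.index(max(scores))][1])  # most likely the anxiety-related response
```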
elit.py ADDED
@@ -0,0 +1,35 @@
+ import streamlit as st
+ from etal import *
+ from egpt import *
+
+
+ class elit:
+     """Streamlit front end: lets the user pick the etal classifier or the egpt responder."""
+
+     def __init__(self):
+         st.set_page_config(page_title='Legacy - Mental Health', layout='centered')
+         self.displayHeader()
+         self.modelChoice = st.radio('Choose a Model', ['etal', 'egpt'])
+         if self.modelChoice == 'etal':
+             self.displayEtalPanel()
+         elif self.modelChoice == 'egpt':
+             self.displayEgptPanel()
+
+     def displayHeader(self):
+         st.title('Legacy - Mental Health')
+         st.markdown('[Open Google Colab Notebook for Analysis](https://colab.research.google.com/drive/1UVrgohHSifjsw2OVP8j8EfDs_qeTOkCn?usp=sharing)')
+
+     def displayEtalPanel(self):
+         st.subheader('etal Model - Usage & Response')
+         inputText = st.text_area('Enter Context for etal', placeholder='Type the context here...')
+         if st.button('Get Response from etal'):
+             model = etal()
+             response = model.predict(inputText)
+             st.write('Response:', response)
+
+     def displayEgptPanel(self):
+         st.subheader('egpt Model - Usage & Response')
+         inputText = st.text_area('Enter Context for egpt', placeholder='Type the context here...')
+         if st.button('Get Response from egpt'):
+             apiKey = st.secrets['openai_api_key']  # set in the Space's secrets
+             model = egpt(apiKey)
+             response = model.respond(inputText)
+             st.write('Response:', response)
etal.py ADDED
@@ -0,0 +1,190 @@
+ import warnings
+ warnings.filterwarnings('ignore')
+ warnings.filterwarnings("ignore", category=UserWarning)
+ import torchvision
+ torchvision.disable_beta_transforms_warning()
+
+
+ import os
+ import re
+ from transformers import BertTokenizer, BertForSequenceClassification
+ from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
+ from sklearn.model_selection import train_test_split
+ from sklearn.metrics import classification_report
+ import torch
+ import torch.nn as nn
+ import numpy as np
+ from alive_progress import alive_bar
+
+
+ class Preprocessor:
+     """Tokenizes text with BERT and assigns weak labels via keyword matching."""
+
+     def __init__(self, modelName='bert-base-uncased'):
+         self.tokenizer = BertTokenizer.from_pretrained(modelName)
+         self.labelMap = {
+             0: 'Anxiety',
+             1: 'Depression',
+             2: 'Stress',
+             3: 'Happiness',
+             4: 'Relationship Issues',
+             5: 'Self-Harm',
+             6: 'Substance Abuse',
+             7: 'Trauma',
+             8: 'Obsessive Compulsive Disorder',
+             9: 'Eating Disorders',
+             10: 'Grief',
+             11: 'Phobias',
+             12: 'Bipolar Disorder',
+             13: 'Post-Traumatic Stress Disorder',
+             14: 'Mental Fatigue',
+             15: 'Mood Swings',
+             16: 'Anger Management',
+             17: 'Social Isolation',
+             18: 'Perfectionism',
+             19: 'Low Self-Esteem',
+             20: 'Family Issues'
+         }
+
+         # Keyword-to-label heuristic used to weakly label raw contexts.
+         self.keywords = {
+             'anxiety': 0,
+             'depressed': 1,
+             'sad': 1,
+             'stress': 2,
+             'happy': 3,
+             'relationship': 4,
+             'self-harm': 5,
+             'substance': 6,
+             'trauma': 7,
+             'ocd': 8,
+             'eating': 9,
+             'grief': 10,
+             'phobia': 11,
+             'bipolar': 12,
+             'ptsd': 13,
+             'fatigue': 14,
+             'mood': 15,
+             'anger': 16,
+             'isolated': 17,
+             'perfectionism': 18,
+             'self-esteem': 19,
+             'family': 20
+         }
+
+     def tokenizeText(self, text, maxLength=128):
+         return self.tokenizer(
+             text,
+             padding='max_length',
+             truncation=True,
+             max_length=maxLength,
+             return_tensors='pt'
+         )
+
+     def preprocessDataset(self, texts):
+         inputIds, attentionMasks = [], []
+         for text in texts:
+             encodedDict = self.tokenizeText(text)
+             inputIds.append(encodedDict['input_ids'])
+             attentionMasks.append(encodedDict['attention_mask'])
+         return torch.cat(inputIds, dim=0), torch.cat(attentionMasks, dim=0)
+
+     def labelContext(self, context):
+         # Returns the label of the first matching keyword, or None if nothing matches.
+         context = context.lower()
+         pattern = r'\b(?:' + '|'.join(re.escape(keyword) for keyword in self.keywords.keys()) + r')\b'
+         match = re.search(pattern, context)
+         return self.keywords[match.group(0)] if match else None
+
+
+ class etal(Preprocessor):
+     """BERT sequence classifier over the 21 mental-health categories defined above."""
+
+     def __init__(self, modelName='bert-base-uncased', numLabels=21):
+         super().__init__(modelName)
+         self.model = BertForSequenceClassification.from_pretrained(modelName, num_labels=numLabels)
+         self.criterion = nn.CrossEntropyLoss()
+
+     def train(self, texts, labels, epochs=3, batchSize=8, learningRate=2e-5):
+         inputIds, attentionMasks = self.preprocessDataset(texts)
+         labels = torch.tensor(labels, dtype=torch.long)
+
+         trainIdx, valIdx = train_test_split(np.arange(len(labels)), test_size=0.2, random_state=42)
+         trainIds, valIds = inputIds[trainIdx], inputIds[valIdx]
+         trainMasks, valMasks = attentionMasks[trainIdx], attentionMasks[valIdx]
+         trainLabels, valLabels = labels[trainIdx], labels[valIdx]
+
+         trainData = torch.utils.data.TensorDataset(trainIds, trainMasks, trainLabels)
+         valData = torch.utils.data.TensorDataset(valIds, valMasks, valLabels)
+         trainLoader = torch.utils.data.DataLoader(trainData, batch_size=batchSize, shuffle=True)
+         valLoader = torch.utils.data.DataLoader(valData, batch_size=batchSize)
+
+         optimizer = torch.optim.AdamW(self.model.parameters(), lr=learningRate)
+         bestValLoss = float('inf')
+
+         with alive_bar(epochs, title='Training Progress') as bar:
+             for epoch in range(epochs):
+                 totalLoss = 0
+                 self.model.train()
+                 for i, batch in enumerate(trainLoader):
+                     batchIds, batchMasks, batchLabels = batch
+                     self.model.zero_grad()
+
+                     outputs = self.model(input_ids=batchIds, attention_mask=batchMasks, labels=batchLabels)
+                     loss = outputs.loss
+                     totalLoss += loss.item()
+                     loss.backward()
+                     optimizer.step()
+
+                     print(f"Epoch {epoch + 1}/{epochs}, Batch {i + 1}/{len(trainLoader)}, Loss: {loss.item()}")
+
+                 avgTrainLoss = totalLoss / len(trainLoader)
+                 valLoss = self.evaluate(valLoader)
+                 # Checkpoint only when validation loss improves.
+                 if valLoss < bestValLoss:
+                     bestValLoss = valLoss
+                     self.save('models', f'e{epoch}l{valLoss}.pt')
+                     print(f"Model State Dict Saved at: {os.path.join(os.getcwd(), 'models', f'e{epoch}l{valLoss}.pt')}")
+                 print(f'Epoch {epoch + 1}, Train Loss: {avgTrainLoss}, Validation Loss: {valLoss}')
+                 bar()
+
+     def evaluate(self, dataLoader):
+         self.model.eval()
+         predictions, trueLabels = [], []
+         totalLoss = 0
+         with torch.no_grad():
+             for batch in dataLoader:
+                 batchIds, batchMasks, batchLabels = batch
+                 outputs = self.model(input_ids=batchIds, attention_mask=batchMasks, labels=batchLabels)
+                 logits = outputs.logits
+                 loss = outputs.loss
+                 totalLoss += loss.item()
+                 predictions.extend(torch.argmax(logits, axis=1).cpu().numpy())
+                 trueLabels.extend(batchLabels.cpu().numpy())
+         print(classification_report(trueLabels, predictions))
+         return totalLoss / len(dataLoader)
+
+     def predict(self, text):
+         self.model.eval()
+         tokens = self.tokenizeText(text)
+         with torch.no_grad():
+             outputs = self.model(input_ids=tokens['input_ids'], attention_mask=tokens['attention_mask'])
+         prediction = torch.argmax(outputs.logits, axis=1).item()
+         return self.labelMap.get(prediction)
+
+     def save(self, folder, filename):
+         if not os.path.exists(folder):
+             os.makedirs(folder)
+         filepath = os.path.join(folder, filename)
+         torch.save(self.model.state_dict(), filepath)
+
+     def load(self, filePath, best=True):
+         if best:
+             # Checkpoints are written only when validation loss improves, so the
+             # highest-epoch file in the folder is also the one with the lowest loss.
+             modelFiles = [f for f in os.listdir(filePath) if f.endswith('.pt')]
+             if not modelFiles:
+                 print('No model files found in the specified folder.')
+                 return
+
+             modelFiles.sort(key=lambda x: (int(x.split('e')[1].split('l')[0]), float(x.split('l')[1].split('.')[0])))
+
+             bestModelFile = modelFiles[-1]
+             modelPath = os.path.join(filePath, bestModelFile)
+             self.model.load_state_dict(torch.load(modelPath))
+         else:
+             self.model.load_state_dict(torch.load(filePath))
+
+         print('Loaded model state dict')
+         self.model.eval()
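A rough sketch of how `Preprocessor.labelContext`, `etal.train`, and `etal.predict` might be wired together on the counseling dataset that `egpt` loads. The dataset path is the default from `egpt.py`; the epoch count, sample sentence, and expected label are illustrative assumptions, not fixed by this repository.

```python
# Sketch: weakly label contexts with the keyword map, train the classifier, then predict.
import pandas as pd
from etal import etal

clf = etal()

df = pd.read_json('hf://datasets/Amod/mental_health_counseling_conversations/combined_dataset.json', lines=True)
labeled = [(ctx, clf.labelContext(ctx)) for ctx in df['Context']]
labeled = [(ctx, lab) for ctx, lab in labeled if lab is not None]  # drop contexts with no keyword match

texts = [ctx for ctx, _ in labeled]
labels = [lab for _, lab in labeled]

clf.train(texts, labels, epochs=1, batchSize=8)  # checkpoints land in ./models on improvement
print(clf.predict("I keep having panic attacks before work."))  # e.g. 'Anxiety'
```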
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ # pip install -r requirements.txt
+
+ torch
+ torchvision
+ transformers
+ scikit-learn
+ numpy
+ alive-progress
+ openai==0.28
+ pandas
+ sentence-transformers
+ streamlit