Model Card for Model ID
Model Details
Model Description
A multimodal model created by concatenating HuBERT and BERT embeddings and fine-tuning the two encoders jointly. The HuBERT model was fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner. BERT was trained on text transcription embeddings.
The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with ~80% accuracy.
- Developed by: https://www.linkedin.com/in/sharmavaruncs/
- Model type: Multimodal (text and audio)
- Language(s) (NLP): NLP, speech processing
- Finetuned from model: https://huggingface.co/docs/transformers/model_doc/hubert
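The fusion step described above (mean-pooled HuBERT and BERT hidden states, concatenated and passed to a linear classifier) can be sketched on dummy tensors without downloading any checkpoint. This is a minimal illustration, not the released model: the hidden sizes of 768, the batch and sequence lengths, and the helper name `fuse_and_classify` are assumptions chosen for the example; the real sizes come from each encoder's config.

```python
import torch
import torch.nn as nn

# ASSUMPTION: illustrative sizes only; the real values are read from the
# HuBERT and BERT configs (hidden_size) and from the number of emotion classes.
HUBERT_HIDDEN = 768
BERT_HIDDEN = 768
NUM_LABELS = 7


def fuse_and_classify(hubert_states, bert_states, classifier):
    """Mean-pool each modality over its sequence axis, concatenate, classify.

    hubert_states: (batch, audio_frames, HUBERT_HIDDEN)
    bert_states:   (batch, text_tokens, BERT_HIDDEN)
    """
    audio_vec = hubert_states.mean(dim=1)             # (batch, HUBERT_HIDDEN)
    text_vec = bert_states.mean(dim=1)                # (batch, BERT_HIDDEN)
    fused = torch.cat((audio_vec, text_vec), dim=-1)  # (batch, sum of both)
    return classifier(fused)                          # (batch, NUM_LABELS)


torch.manual_seed(0)
# Random stand-ins for the encoders' last hidden states.
hubert_states = torch.randn(2, 49, HUBERT_HIDDEN)
bert_states = torch.randn(2, 12, BERT_HIDDEN)
classifier = nn.Linear(HUBERT_HIDDEN + BERT_HIDDEN, NUM_LABELS)

logits = fuse_and_classify(hubert_states, bert_states, classifier)
print(logits.shape)  # torch.Size([2, 7])
```

Because each modality is averaged over its own sequence axis before concatenation, the audio and text sequences do not need to have the same length.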
Model Sources
- Repository: https://github.com/netgvarun2012/VirtualTherapist/
- Paper: https://github.com/netgvarun2012/VirtualTherapist/blob/main/documentation/Speech_and_Text_based_MultiModal_Emotion_Recognizer.pdf
- Demo: https://huggingface.co/spaces/netgvarun2005/VirtualTherapist
Uses
'Virtual Therapist' app: an intelligent assistant that takes speech and text input, deciphers the user's emotions, and generates therapeutic messages based on the user's emotional state.
Emotions recognized: Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm, with ~80% accuracy.
Use the code below to get started with the model:
import torch
import torch.nn as nn
from transformers import (
    AutoModel,
    AutoModelForCausalLM,
    AutoTokenizer,
    HubertForSequenceClassification,
)


class MultimodalModel(nn.Module):
    '''
    Custom PyTorch model that takes as input both the audio features and the text embeddings,
    and concatenates the last hidden states from the Hubert and BERT models.
    '''

    def __init__(self, bert_model_name, num_labels):
        super().__init__()
        # Keep only the HuBERT encoder from the fine-tuned standalone classifier.
        self.hubert = HubertForSequenceClassification.from_pretrained(
            "netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels
        ).hubert
        self.bert = AutoModel.from_pretrained(bert_model_name)
        # Linear head over the concatenated audio and text representations.
        self.classifier = nn.Linear(
            self.hubert.config.hidden_size + self.bert.config.hidden_size, num_labels
        )

    def forward(self, input_values, text):
        hubert_output = self.hubert(input_values).last_hidden_state
        bert_output = self.bert(text).last_hidden_state
        # Apply mean pooling along the sequence dimension
        hubert_output = hubert_output.mean(dim=1)
        bert_output = bert_output.mean(dim=1)
        concat_output = torch.cat((hubert_output, bert_output), dim=-1)
        logits = self.classifier(concat_output)
        return logits
def load_model():
    """
    Load and configure various models and tokenizers for a multi-modal application.

    This function loads a multi-modal model and its weights from a specified source,
    initializes tokenizers for the model and an additional language model, and returns
    these components for use in a multi-modal application.

    Returns:
        tuple: A tuple containing the following components:
            - multiModel (MultimodalModel): The multi-modal model.
            - tokenizer (AutoTokenizer): Tokenizer for the multi-modal model.
            - model_gpt (AutoModelForCausalLM): Language model for text generation.
            - tokenizer_gpt (AutoTokenizer): Tokenizer for the language model.
    """
    # bert_model_name, num_labels, model_weights_path and device are not defined
    # in this snippet; they must be set by the caller before load_model() runs.

    # Load the model
    multiModel = MultimodalModel(bert_model_name, num_labels)

    # Load the model weights and tokenizer directly from Hugging Face Spaces
    multiModel.load_state_dict(
        torch.hub.load_state_dict_from_url(model_weights_path, map_location=device),
        strict=False,
    )
    tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")

    # GenAI
    tokenizer_gpt = AutoTokenizer.from_pretrained(
        "netgvarun2005/GPTTherapistDeepSpeedTokenizer",
        pad_token='<|pad|>',
        bos_token='<|startoftext|>',
        eos_token='<|endoftext|>',
    )
    model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")

    return multiModel, tokenizer, model_gpt, tokenizer_gpt
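
Once the model returns logits, they still have to be mapped to one of the seven emotions. The sketch below shows one way to do that in plain Python. It rests on an assumption: the index-to-label order in `EMOTIONS` simply follows the order in which this card lists the classes and may not match the order used in training, so check it against the training code before relying on it. The helper names `softmax` and `decode_emotion` are hypothetical and not part of the repository.

```python
import math

# ASSUMPTION: label order copied from the order the emotions are listed in
# this card; the order used during training is not documented here.
EMOTIONS = ["Angry", "Sad", "Fearful", "Happy", "Disgusted", "Surprised", "Calm"]


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_emotion(logits, labels=EMOTIONS):
    """Return the most likely label and its probability for one example."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per emotion label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for a single example; with the real model they would
# come from something like multiModel(input_values, text)[0].tolist().
label, prob = decode_emotion([0.2, -1.0, 0.1, 3.0, 0.0, -0.5, 1.2])
print(label)  # Happy
```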