Model Card for Model ID

Model Details

Model Description

A multimodal model created by concatenating HuBERT and BERT embeddings and fine-tuning the two encoders jointly. The HuBERT model was fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner. BERT was trained on text transcription embeddings.

The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with ~80% accuracy.
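Turning the classifier's raw logits into one of the seven emotion labels usually takes a softmax followed by an argmax. The sketch below is pure Python and rests on one loud assumption: the index order of `EMOTION_LABELS` simply follows the order in which the classes are listed above. The actual id-to-label mapping used in training is not stated in this card and should be checked before relying on it. The helper names and the example logits are hypothetical.

```python
import math

# ASSUMPTION: index order mirrors the order the classes are listed in this card.
# The real id-to-label mapping of the trained model may differ.
EMOTION_LABELS = ["Angry", "Sad", "Fearful", "Happy", "Disgusted", "Surprised", "Calm"]


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits):
    """Return the most likely label and its probability for one example."""
    if len(logits) != len(EMOTION_LABELS):
        raise ValueError("expected one logit per emotion class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTION_LABELS[best], probs[best]


# Hypothetical logits for a single example, e.g. one row of the model output.
label, confidence = predict_emotion([0.2, -1.0, 0.1, 2.5, -0.3, 0.0, 0.4])
print(label, round(confidence, 3))
```

With a real model output, one row of the logits tensor can be converted with `.tolist()` before being passed to `predict_emotion`.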

This model is part of the 'Virtual Therapist' app, an intelligent assistant that takes speech and text input, deciphers the user's emotions, and generates therapeutic messages based on the user's emotional state.

Use the code below to get started with the model:

import torch
import torch.nn as nn
from transformers import AutoModel, HubertForSequenceClassification


class MultimodalModel(nn.Module):
    '''
    Custom PyTorch model that takes as input both the audio features and the text
    embeddings, and concatenates the last hidden states from the Hubert and BERT models.
    '''

    def __init__(self, bert_model_name, num_labels):
        super().__init__()
        # Reuse only the Hubert encoder from the fine-tuned standalone classifier
        self.hubert = HubertForSequenceClassification.from_pretrained(
            "netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels
        ).hubert
        self.bert = AutoModel.from_pretrained(bert_model_name)
        # Linear head over the concatenated pooled Hubert and BERT features
        self.classifier = nn.Linear(
            self.hubert.config.hidden_size + self.bert.config.hidden_size, num_labels
        )

    def forward(self, input_values, text):
        # input_values: preprocessed audio for Hubert; text: token IDs passed to BERT
        hubert_output = self.hubert(input_values).last_hidden_state
        bert_output = self.bert(text).last_hidden_state

        # Apply mean pooling along the sequence dimension
        hubert_output = hubert_output.mean(dim=1)
        bert_output = bert_output.mean(dim=1)

        # Fuse both modalities along the feature dimension and classify
        concat_output = torch.cat((hubert_output, bert_output), dim=-1)
        logits = self.classifier(concat_output)
        return logits
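The pooling and fusion steps in `forward` can be traced by hand on tiny inputs. The sketch below uses plain Python lists instead of tensors, so the sequence lengths, hidden sizes, and values are made up purely for illustration. It mirrors `mean(dim=1)` and `torch.cat(..., dim=-1)` for a single example and is not the model's actual code path; `mean_pool` and `fuse` are hypothetical helper names.

```python
def mean_pool(hidden_states):
    """Average a [seq_len][hidden_size] nested list over the sequence dimension."""
    seq_len = len(hidden_states)
    hidden_size = len(hidden_states[0])
    return [
        sum(step[i] for step in hidden_states) / seq_len
        for i in range(hidden_size)
    ]


def fuse(audio_states, text_states):
    """Mean-pool each modality, then concatenate along the feature dimension."""
    return mean_pool(audio_states) + mean_pool(text_states)


# Toy stand-ins for the encoder outputs of ONE example (values are made up):
# 3 audio frames with hidden size 2, and 2 text tokens with hidden size 3.
audio_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
text_states = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]

fused = fuse(audio_states, text_states)
print(fused)       # [3.0, 4.0, 1.0, 2.0, 3.0]
print(len(fused))  # 5 = 2 (audio hidden size) + 3 (text hidden size)
```

The length of the fused vector is the sum of the two hidden sizes, which is why the classifier's input dimension is `self.hubert.config.hidden_size + self.bert.config.hidden_size`.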

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model():
    """
    Load and configure various models and tokenizers for a multi-modal application.

    This function loads a multi-modal model and its weights from a specified source,
    initializes tokenizers for the model and an additional language model, and returns
    these components for use in a multi-modal application.

    Returns:
        tuple: A tuple containing the following components:
            - multiModel (MultimodalModel): The multi-modal model.
            - tokenizer (AutoTokenizer): Tokenizer for the multi-modal model.
            - model_gpt (AutoModelForCausalLM): Language model for text generation.
            - tokenizer_gpt (AutoTokenizer): Tokenizer for the language model.
    """
    # bert_model_name, num_labels, model_weights_path and device are expected to be
    # defined by the surrounding application; their values are not given in this card.

    # Load the model
    multiModel = MultimodalModel(bert_model_name, num_labels)

    # Load the model weights and tokenizer directly from Hugging Face Spaces
    multiModel.load_state_dict(
        torch.hub.load_state_dict_from_url(model_weights_path, map_location=device),
        strict=False,
    )
    tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")

    # GenAI
    tokenizer_gpt = AutoTokenizer.from_pretrained(
        "netgvarun2005/GPTTherapistDeepSpeedTokenizer",
        pad_token='<|pad|>',
        bos_token='<|startoftext|>',
        eos_token='<|endoftext|>',
    )
    model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")

    return multiModel, tokenizer, model_gpt, tokenizer_gpt

Model Card Authors

Varun Sharma

