Model Card for netgvarun2005/MultiModalBertHubert

Model Details

Model Description

A multimodal model created and fine-tuned jointly by concatenating HuBERT and BERT embeddings. The HuBERT model was fine-tuned with a classification head on preprocessed audio and emotion labels in a supervised manner. BERT was trained on text transcription embeddings.

The model recognizes seven emotion classes (Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm) with ~80% accuracy.
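
As a minimal sketch of how a classifier's raw output scores could be turned into one of the emotion classes listed above: the index-to-label order in `EMOTIONS` is an assumption (the card does not state which logit index corresponds to which emotion), and `softmax` and `top_emotion` are hypothetical helper names, not part of the released code.

```python
import math

# ASSUMPTION: this index-to-label order is illustrative only; the card lists
# the classes but does not say which logit index maps to which emotion.
EMOTIONS = ["Angry", "Sad", "Fearful", "Happy", "Disgusted", "Surprised", "Calm"]


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_emotion(logits, labels=EMOTIONS):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Made-up logits for a single example, purely for illustration.
    print(top_emotion([0.1, 0.2, 0.0, 2.5, 0.3, 0.1, 0.4]))
```

With PyTorch tensors the same step would normally be `torch.softmax(logits, dim=-1)` followed by `argmax`; the pure-Python version is used here only to keep the sketch dependency-free.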

Uses

'Virtual Therapist' app: an intelligent assistant that takes speech and text input, deciphers the user's emotions, and generates therapeutic messages based on the user's emotional state.

Emotions recognized: Angry, Sad, Fearful, Happy, Disgusted, Surprised, Calm, with ~80% accuracy.

Use the code below to get started with the model:

import torch
import torch.nn as nn
from transformers import (
    AutoModel,
    AutoModelForCausalLM,
    AutoTokenizer,
    HubertForSequenceClassification,
)

# NOTE: bert_model_name, num_labels, model_weights_path and device are not
# defined in this snippet; set them at module level before calling load_model().


class MultimodalModel(nn.Module):
    '''
    Custom PyTorch model that takes as input both the audio features and the
    text embeddings, and concatenates the last hidden states from the Hubert
    and BERT models.
    '''

    def __init__(self, bert_model_name, num_labels):
        super().__init__()
        # Keep only the Hubert encoder from the fine-tuned standalone classifier
        self.hubert = HubertForSequenceClassification.from_pretrained(
            "netgvarun2005/HubertStandaloneEmoDetector", num_labels=num_labels
        ).hubert
        self.bert = AutoModel.from_pretrained(bert_model_name)
        # The classifier sees the concatenated audio and text representations
        self.classifier = nn.Linear(
            self.hubert.config.hidden_size + self.bert.config.hidden_size,
            num_labels,
        )

    def forward(self, input_values, text):
        # input_values: audio input for Hubert; text: token ids for BERT
        hubert_output = self.hubert(input_values).last_hidden_state
        bert_output = self.bert(text).last_hidden_state

        # Apply mean pooling along the sequence dimension
        hubert_output = hubert_output.mean(dim=1)
        bert_output = bert_output.mean(dim=1)

        concat_output = torch.cat((hubert_output, bert_output), dim=-1)
        logits = self.classifier(concat_output)
        return logits


def load_model():
    """
    Load and configure various models and tokenizers for a multi-modal application.

    This function loads a multi-modal model and its weights from a specified source,
    initializes tokenizers for the model and an additional language model, and returns
    these components for use in a multi-modal application.

    Returns:
        tuple: A tuple containing the following components:
            - multiModel (MultimodalModel): The multi-modal model.
            - tokenizer (AutoTokenizer): Tokenizer for the multi-modal model.
            - model_gpt (AutoModelForCausalLM): Language model for text generation.
            - tokenizer_gpt (AutoTokenizer): Tokenizer for the language model.
    """
    # Load the model
    multiModel = MultimodalModel(bert_model_name, num_labels)

    # Load the model weights and tokenizer directly from Hugging Face Spaces
    multiModel.load_state_dict(
        torch.hub.load_state_dict_from_url(model_weights_path, map_location=device),
        strict=False,
    )
    tokenizer = AutoTokenizer.from_pretrained("netgvarun2005/MultiModalBertHubertTokenizer")

    # GenAI
    tokenizer_gpt = AutoTokenizer.from_pretrained(
        "netgvarun2005/GPTTherapistDeepSpeedTokenizer",
        pad_token='<|pad|>',
        bos_token='<|startoftext|>',
        eos_token='<|endoftext|>',
    )
    model_gpt = AutoModelForCausalLM.from_pretrained("netgvarun2005/GPTTherapistDeepSpeedModel")

    return multiModel, tokenizer, model_gpt, tokenizer_gpt

Model Card Authors

Varun Sharma
