Contextual Emotion Recognition with LLaVA

This repository contains the checkpoints of fine-tuned LLaVA for the contextual emotion recognition task described in our paper. For more information about LLaVA, visit the LLaVA official website.

Fine-tuning

We fine-tuned LLaVA on the contextual emotion recognition task using the EMOTIC dataset:

  1. EMOTIC Train Set:

    • Precision: 54.27
    • F1 Score: 22.73
  2. EMOTIC Validation Set with Augmentation:

    • Precision: 38.71
    • F1 Score: 36.83

We provide the LoRA weights for our fine-tuned models. The base model used for fine-tuning is llava-v1.5-13b, which is available on the Hugging Face Model Hub.

Usage

To perform contextual emotion recognition using our fine-tuned model, follow these steps:

  1. Prepare your input:

    • An image with a bounding box of the target individual.
    • Text prompt: From suffering, pain, aversion, disapproval, anger, fear, annoyance, fatigue, disquietment, doubt/confusion, embarrassment, disconnection, affection, confidence, engagement, happiness, peace, pleasure, esteem, excitement, anticipation, yearning, sensitivity, surprise, sadness, and sympathy, pick the top labels that the person in the red bounding box is feeling at the same time.
  2. Run LLaVA using our provided LoRA weights and the base model.

  3. Receive the output, which includes the emotion labels that the target individual is feeling.
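The input-preparation and output-parsing steps above can be sketched as follows. This is a minimal illustration, not our inference pipeline: the box coordinates are placeholders, and the helper functions (`draw_target_box`, `build_prompt`, `parse_labels`) are hypothetical names introduced here. Running LLaVA itself with the LoRA weights follows the standard LLaVA instructions and is omitted.

```python
# Sketch: prepare the image + prompt inputs and filter the model's reply
# down to valid EMOTIC labels. Assumes Pillow is installed.
from PIL import Image, ImageDraw

# The 26 EMOTIC emotion labels used in the text prompt.
EMOTIC_LABELS = [
    "suffering", "pain", "aversion", "disapproval", "anger", "fear",
    "annoyance", "fatigue", "disquietment", "doubt/confusion",
    "embarrassment", "disconnection", "affection", "confidence",
    "engagement", "happiness", "peace", "pleasure", "esteem",
    "excitement", "anticipation", "yearning", "sensitivity",
    "surprise", "sadness", "sympathy",
]

def draw_target_box(image, box, width=4):
    """Draw a red bounding box around the target individual (step 1)."""
    draw = ImageDraw.Draw(image)
    draw.rectangle(box, outline="red", width=width)
    return image

def build_prompt(labels=EMOTIC_LABELS):
    """Build the text prompt described in step 1 from the label list."""
    return (
        "From " + ", ".join(labels[:-1]) + ", and " + labels[-1]
        + ", pick the top labels that the person in the red bounding box "
        "is feeling at the same time."
    )

def parse_labels(reply, vocabulary=EMOTIC_LABELS):
    """Keep only the tokens in the model's reply that are EMOTIC labels (step 3)."""
    tokens = [t.strip().lower() for t in reply.replace(" and ", ",").split(",")]
    return [t for t in tokens if t in vocabulary]

if __name__ == "__main__":
    image = Image.new("RGB", (640, 480), "white")   # stand-in for a real photo
    draw_target_box(image, (100, 100, 300, 400))    # placeholder coordinates
    prompt = build_prompt()
    print(parse_labels("happiness, engagement, and excitement"))
```

The prompt string produced by `build_prompt` matches the one given in step 1; `parse_labels` is a simple comma-split that discards anything outside the 26-label vocabulary.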
