Update README.md
README.md
CHANGED
@@ -8,17 +8,174 @@ tags:

---
This is a demo for Music2Emo: Towards Unified Music Emotion Recognition across Dimensional and Categorical Models.

## Description

Upload an audio file to analyze its emotional characteristics. The model predicts:

- Mood tags
- Valence score (1-9 scale)
- Arousal score (1-9 scale)

## Usage

1. Upload an audio file (MP3 or WAV)
2. Adjust the mood detection threshold if needed
3. Click "Analyze Emotions" to get results

---
<div align="center">

# Music2Emo: Towards Unified Music Emotion Recognition across Dimensional and Categorical Models

[Demo](https://huggingface.co/spaces/amaai-lab/music2emo) [arXiv](https://arxiv.org/abs/2502.03979)

</div>

This repository contains the code accompanying the paper "Towards Unified Music Emotion Recognition across Dimensional and Categorical Models" by Dr. Jaeyong Kang and Prof. Dorien Herremans.

🔥 Live demo available on [HuggingFace](https://huggingface.co/spaces/amaai-lab/music2emo).

<div align="center">
  <img src="m2e.png" width="300"/>
</div>

## Introduction

We present a unified multitask learning framework for Music Emotion Recognition (MER) that integrates categorical and dimensional emotion labels, enabling training across multiple datasets. Our approach combines musical features (key and chords) with MERT embeddings and employs knowledge distillation to enhance generalization. Evaluated on MTG-Jamendo, DEAM, PMEmo, and EmoMusic, our model outperforms state-of-the-art methods, including the best-performing model from the MediaEval 2021 competition.
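The repository's actual model definitions live under `model/` (see the Directory Structure section below). Purely as an illustration of the multitask idea described above, a minimal PyTorch sketch of a shared song-level embedding feeding a categorical mood-tag head and a dimensional valence/arousal head could look like this; the 1024-d embedding size and the 56 mood tags are assumptions for the example, not values taken from this repo:

```python
import torch
import torch.nn as nn

class MultitaskEmotionHead(nn.Module):
    """Illustrative only: shared projection over a song-level embedding with two task heads."""
    def __init__(self, embed_dim=1024, hidden_dim=256, n_mood_tags=56):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        self.mood_head = nn.Linear(hidden_dim, n_mood_tags)  # categorical (multi-label mood tags)
        self.va_head = nn.Linear(hidden_dim, 2)               # dimensional (valence, arousal)

    def forward(self, embedding):
        h = self.shared(embedding)
        return torch.sigmoid(self.mood_head(h)), self.va_head(h)

# Toy joint loss over both label types (illustrative weighting only).
model = MultitaskEmotionHead()
emb = torch.randn(8, 1024)                        # batch of song-level embeddings
mood_prob, va_pred = model(emb)
mood_true = torch.randint(0, 2, (8, 56)).float()  # multi-label mood targets
va_true = torch.rand(8, 2) * 8 + 1                # valence/arousal on the 1-9 scale
loss = nn.functional.binary_cross_entropy(mood_prob, mood_true) \
       + nn.functional.mse_loss(va_pred, va_true)
```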
## Change Log

- 2025-02-10: Released Music2Emo v1.0, featuring both categorical and VA emotion prediction from music.

## Quickstart Guide

Predict emotion from audio:

```python
from music2emo import Music2emo

input_audio = "inference/input/test.mp3"

music2emo = Music2emo()
output_dic = music2emo.predict(input_audio)

valence = output_dic["valence"]
arousal = output_dic["arousal"]
predicted_moods = output_dic["predicted_moods"]

print("\n🎵 **Music Emotion Recognition Results** 🎵")
print("-" * 50)
print(f"🎭 **Predicted Mood Tags:** {', '.join(predicted_moods) if predicted_moods else 'None'}")
print(f"💖 **Valence:** {valence:.2f} (Scale: 1-9)")
print(f"⚡ **Arousal:** {arousal:.2f} (Scale: 1-9)")
print("-" * 50)
```
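Because `Music2emo.predict` takes a file path and returns a plain dictionary, it is easy to run it over a whole folder. A small sketch, assuming a folder of MP3/WAV files at `inference/input/` (the path is only an example):

```python
from pathlib import Path

from music2emo import Music2emo

music2emo = Music2emo()

audio_dir = Path("inference/input")  # example folder; point this anywhere
audio_files = sorted(p for p in audio_dir.iterdir() if p.suffix.lower() in {".mp3", ".wav"})

for path in audio_files:
    out = music2emo.predict(str(path))
    moods = ", ".join(out["predicted_moods"]) if out["predicted_moods"] else "None"
    print(f"{path.name}: valence={out['valence']:.2f}, arousal={out['arousal']:.2f}, moods={moods}")
```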
## Installation

This repo is developed with Python 3.10.

```bash
git clone https://github.com/AMAAI-Lab/Music2Emotion
cd Music2Emotion
pip install -r requirements.txt
```

* Our code is built on PyTorch 2.3.1 (`torch==2.3.1` in requirements.txt), but you may need to choose the `torch` build that matches your CUDA version; a quick check is shown below.
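For example, from Python you can confirm which build is installed and whether it sees your GPU (the printed values are whatever your environment has; requirements.txt pins 2.3.1):

```python
import torch

print(torch.__version__)          # e.g. 2.3.1 if installed from requirements.txt
print(torch.version.cuda)         # CUDA version the wheel was built against (None for CPU-only builds)
print(torch.cuda.is_available())  # True only if the build matches an installed GPU driver
```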
## Dataset

Download the following datasets:

- MTG-Jamendo [(Link)](https://github.com/MTG/mtg-jamendo-dataset)
- PMEmo [(Link)](https://drive.google.com/drive/folders/1qDk6hZDGVlVXgckjLq9LvXLZ9EgK9gw0)
- DEAM [(Link)](https://cvml.unige.ch/databases/DEAM/)
- EmoMusic [(Link)](https://cvml.unige.ch/databases/emoMusic/)

After downloading, place all .mp3 files into the following directory structure:

```
dataset/
├── jamendo/
│   └── mp3/**/*.mp3       # MTG-Jamendo audio files (nested structure)
├── pmemo/
│   └── mp3/*.mp3          # PMEmo audio files
├── deam/
│   └── mp3/*.mp3          # DEAM audio files
└── emomusic/
    └── mp3/*.mp3          # EmoMusic audio files
```
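As a sanity check that the audio landed in the expected places, you can count files per dataset with the same glob patterns as above (this only checks the audio layout, not any annotation files):

```python
from pathlib import Path

# Glob patterns mirror the directory structure above.
patterns = {
    "jamendo": "jamendo/mp3/**/*.mp3",   # nested folder structure
    "pmemo": "pmemo/mp3/*.mp3",
    "deam": "deam/mp3/*.mp3",
    "emomusic": "emomusic/mp3/*.mp3",
}

root = Path("dataset")
for name, pattern in patterns.items():
    count = len(list(root.glob(pattern)))
    print(f"{name:9s}: {count} mp3 files")
```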
## Directory Structure

* `config/`: Configuration files
* `dataset/`: Dataset directories
* `dataset_loader/`: Dataset loading utilities
* `utils/`: Other utilities
* `model/`
  * `linear.py`: Fully connected (FC) layer with MERT features
  * `linear_attn_ck.py`: FC layer with MERT and musical features (chord/key)
  * `linear_mt_attn_ck.py`: Multitask FC layer with MERT and musical features (chord/key)
* `preprocess/`
  * `feature_extractor.py`: MERT feature extraction
* `saved_models/`: Saved model weight files
* `data_loader.py`: Data loading script
* `train.py`: Training script
* `test.py`: Testing script
* `trainer.py`: Training pipeline script
* `inference.py`: Inference script
* `music2emo.py`: Music2Emo module that outputs emotion from input audio
* `demo.ipynb`: Jupyter notebook for the Quickstart Guide
## Training

```shell
python train.py
```

## Test

```shell
python test.py
```
## Evaluation

### Comparison of performance metrics when training on multiple datasets

| **Training datasets**  | **MTG-Jamendo (J.)** | **DEAM (D.)**       | **EmoMusic (E.)**   | **PMEmo (P.)**      |
|------------------------|:--------------------:|:-------------------:|:-------------------:|:-------------------:|
|                        | PR-AUC / ROC-AUC     | R² V / R² A         | R² V / R² A         | R² V / R² A         |
| **Single dataset (X)** | 0.1521 / 0.7806      | 0.5131 / 0.6025     | 0.5957 / 0.7489     | 0.5360 / 0.7772     |
| **J + D**              | 0.1526 / 0.7806      | 0.5144 / 0.6046     | -                   | -                   |
| **J + E**              | 0.1540 / 0.7809      | -                   | 0.6091 / 0.7525     | -                   |
| **J + P**              | 0.1522 / 0.7806      | -                   | -                   | 0.5401 / 0.7780     |
| **J + D + E + P**      | **0.1543 / 0.7810**  | **0.5184 / 0.6228** | **0.6512 / 0.7616** | **0.5473 / 0.7940** |

### Comparison of our proposed model with existing models on the MTG-Jamendo dataset

| **Model**           | **PR-AUC** ↑ | **ROC-AUC** ↑ |
|---------------------|:------------:|:-------------:|
| lileonardo          | 0.1508       | 0.7747        |
| SELAB-HCMUS         | 0.1435       | 0.7599        |
| Mirable             | 0.1356       | 0.7687        |
| UIBK-DBIS           | 0.1087       | 0.7046        |
| Hasumi et al.       | 0.0730       | 0.7750        |
| Greer et al.        | 0.1082       | 0.7354        |
| MERT-95M            | 0.1340       | 0.7640        |
| MERT-330M           | 0.1400       | 0.7650        |
| **Proposed (Ours)** | **0.1543**   | **0.7810**    |
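The metrics in these tables are standard ones. Assuming per-track predictions and ground truth as NumPy arrays, they can be computed with scikit-learn as sketched below (scikit-learn and the dummy array shapes are illustrative assumptions, not dependencies or values taken from this repo):

```python
import numpy as np
from sklearn.metrics import average_precision_score, r2_score, roc_auc_score

rng = np.random.default_rng(0)

# Multi-label mood tags (MTG-Jamendo style): binary targets vs. predicted probabilities.
tag_true = rng.integers(0, 2, size=(100, 56))
tag_prob = rng.random((100, 56))
pr_auc = average_precision_score(tag_true, tag_prob, average="macro")
roc_auc = roc_auc_score(tag_true, tag_prob, average="macro")

# Dimensional valence/arousal (DEAM / EmoMusic / PMEmo style): R² per dimension.
va_true = rng.random((100, 2)) * 8 + 1   # 1-9 scale
va_pred = rng.random((100, 2)) * 8 + 1
r2_valence = r2_score(va_true[:, 0], va_pred[:, 0])
r2_arousal = r2_score(va_true[:, 1], va_pred[:, 1])

print(f"PR-AUC={pr_auc:.4f}  ROC-AUC={roc_auc:.4f}  R² V={r2_valence:.4f}  R² A={r2_arousal:.4f}")
```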
## TODO

- [ ] Incorporate additional features, such as lyrics.
## Citation

If you find this resource useful, [please cite the original work](https://doi.org/10.48550/arXiv.2502.03979):

```bibtex
@misc{kang2025unifiedmusicemotionrecognition,
  title={Towards Unified Music Emotion Recognition across Dimensional and Categorical Models},
  author={Jaeyong Kang and Dorien Herremans},
  year={2025},
  eprint={2502.03979},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2502.03979},
}
```

Kang, J. & Herremans, D. (2025). Towards Unified Music Emotion Recognition across Dimensional and Categorical Models. arXiv preprint arXiv:2502.03979.