kjysmu committed · verified
Commit 2c07aac · 1 Parent(s): e51d8d7

Update README.md

Files changed (1): README.md +168 -11

README.md CHANGED
@@ -8,17 +8,174 @@ tags:
  ---

- # Music2Emo Demo

- This is a demo for Music2Emo: Towards Unified Music Emotion Recognition across Dimensional and Categorical Models.

- ## Description
- Upload an audio file to analyze its emotional characteristics. The model predicts:
- - Mood tags
- - Valence score (1-9 scale)
- - Arousal score (1-9 scale)

- ## Usage
- 1. Upload an audio file (MP3 or WAV)
- 2. Adjust the mood detection threshold if needed
- 3. Click "Analyze Emotions" to get results
+ <div align="center">
+
+ # Music2Emo: Towards Unified Music Emotion Recognition across Dimensional and Categorical Models
+
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/amaai-lab/music2emo) [![arXiv](https://img.shields.io/badge/arXiv-2502.03979-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2502.03979)
+
+ </div>
+
+ This repository contains the code accompanying the paper "Towards Unified Music Emotion Recognition across Dimensional and Categorical Models" by Dr. Jaeyong Kang and Prof. Dorien Herremans.
+
+ 🔥 Live demo available on [Hugging Face](https://huggingface.co/spaces/amaai-lab/music2emo)
+
+ <div align="center">
+ <img src="m2e.png" width="300"/>
+ </div>
+
+ ## Introduction
+
+ We present a unified multitask learning framework for Music Emotion Recognition (MER) that integrates categorical and dimensional emotion labels, enabling training across multiple datasets. Our approach combines musical features (key and chords) with MERT embeddings and employs knowledge distillation to enhance generalization. Evaluated on MTG-Jamendo, DEAM, PMEmo, and EmoMusic, our model outperforms state-of-the-art methods, including the best-performing model from the MediaEval 2021 competition.
+
+ ![](framework.png)
+
+ ## Change Log
+
+ - 2025-02-10: Released Music2Emo v1.0, featuring both categorical and valence/arousal (VA) emotion prediction from music.
+
+ ## Quickstart Guide
+
+ Predict emotion from audio:
+
+ ```python
+ from music2emo import Music2emo
+
+ input_audio = "inference/input/test.mp3"
+
+ music2emo = Music2emo()
+ output_dic = music2emo.predict(input_audio)
+
+ valence = output_dic["valence"]
+ arousal = output_dic["arousal"]
+ predicted_moods = output_dic["predicted_moods"]
+
+ print("\n🎵 **Music Emotion Recognition Results** 🎵")
+ print("-" * 50)
+ print(f"🎭 **Predicted Mood Tags:** {', '.join(predicted_moods) if predicted_moods else 'None'}")
+ print(f"💖 **Valence:** {valence:.2f} (Scale: 1-9)")
+ print(f"⚡ **Arousal:** {arousal:.2f} (Scale: 1-9)")
+ print("-" * 50)
+ ```
+
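+ Since `predict` takes a plain file path, batch processing is just a loop. Below is a minimal sketch (the input directory is only an example path), reusing one `Music2emo` instance so the model weights are loaded once:
+
+ ```python
+ from pathlib import Path
+
+ from music2emo import Music2emo
+
+ music2emo = Music2emo()  # load the model once, reuse for every file
+
+ # Example folder of audio files; any directory of .mp3 files works
+ for audio_path in sorted(Path("inference/input").glob("*.mp3")):
+     out = music2emo.predict(str(audio_path))
+     print(f"{audio_path.name}: valence={out['valence']:.2f}, "
+           f"arousal={out['arousal']:.2f}, moods={out['predicted_moods']}")
+ ```
+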
+ ## Installation
+
+ This repo was developed with Python 3.10.
+
+ ```bash
+ git clone https://github.com/AMAAI-Lab/Music2Emotion
+ cd Music2Emotion
+ pip install -r requirements.txt
+ ```
+
+ * Our code is built on PyTorch 2.3.1 (`torch==2.3.1` in `requirements.txt`), but you may need a different `torch` build to match your CUDA version.
+
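+ For example, on a machine with CUDA 12.1 you could install the matching wheel explicitly from PyTorch's official wheel index (a suggested command, not part of `requirements.txt`; adjust `cu121` to your CUDA version):
+
+ ```bash
+ # Install the CUDA 12.1 build of torch 2.3.1 (swap cu121 for cu118, etc.)
+ pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121
+ ```
+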
+ ## Dataset
+
+ Download the following datasets:
+ - MTG-Jamendo [(Link)](https://github.com/MTG/mtg-jamendo-dataset)
+ - PMEmo [(Link)](https://drive.google.com/drive/folders/1qDk6hZDGVlVXgckjLq9LvXLZ9EgK9gw0)
+ - DEAM [(Link)](https://cvml.unige.ch/databases/DEAM/)
+ - EmoMusic [(Link)](https://cvml.unige.ch/databases/emoMusic/)
+
+ After downloading, place all .mp3 files into the following directory structure:
+
+ ```
+ dataset/
+ ├── jamendo/
+ │   └── mp3/**/*.mp3  # MTG-Jamendo audio files (nested structure)
+ ├── pmemo/
+ │   └── mp3/*.mp3     # PMEmo audio files
+ ├── deam/
+ │   └── mp3/*.mp3     # DEAM audio files
+ └── emomusic/
+     └── mp3/*.mp3     # EmoMusic audio files
+ ```
+
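+ Once the files are in place, a quick count can confirm the layout (a shell-level sanity check, not part of the training pipeline):
+
+ ```bash
+ # Each dataset directory should report a non-zero .mp3 count
+ for d in jamendo pmemo deam emomusic; do
+   echo -n "$d: "; find "dataset/$d/mp3" -name '*.mp3' | wc -l
+ done
+ ```
+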
+ ## Directory Structure
+
+ * `config/`: Configuration files
+ * `dataset/`: Dataset directories
+ * `dataset_loader/`: Dataset loading utilities
+ * `utils/`: Other utilities
+ * `model/`
+   * `linear.py`: Fully connected (FC) layer with MERT features
+   * `linear_attn_ck.py`: FC layer with MERT and musical features (chord/key)
+   * `linear_mt_attn_ck.py`: Multitask FC layer with MERT and musical features (chord/key)
+ * `preprocess/`
+   * `feature_extractor.py`: MERT feature extraction
+ * `saved_models/`: Saved model weight files
+ * `data_loader.py`: Data loading script
+ * `train.py`: Training script
+ * `test.py`: Testing script
+ * `trainer.py`: Training pipeline script
+ * `inference.py`: Inference script
+ * `music2emo.py`: Music2Emo module that outputs emotion from input audio
+ * `demo.ipynb`: Jupyter notebook for the Quickstart Guide
+
+ ## Training
+
+ ```shell
+ python train.py
+ ```
+
+ ## Test
+
+ ```shell
+ python test.py
+ ```
+
+ ## Evaluation
+
+ ### Comparison of performance metrics when training on multiple datasets
+
+ | **Training datasets** | **MTG-Jamendo (J.)** | **DEAM (D.)** | **EmoMusic (E.)** | **PMEmo (P.)** |
+ |---------------------------|:-------------------:|:--------------:|:-----------------:|:---------------:|
+ | | PR-AUC / ROC-AUC | R² V / R² A | R² V / R² A | R² V / R² A |
+ | **Single dataset (X)** | 0.1521 / 0.7806 | 0.5131 / 0.6025 | 0.5957 / 0.7489 | 0.5360 / 0.7772 |
+ | **J + D** | 0.1526 / 0.7806 | 0.5144 / 0.6046 | - | - |
+ | **J + E** | 0.1540 / 0.7809 | - | 0.6091 / 0.7525 | - |
+ | **J + P** | 0.1522 / 0.7806 | - | - | 0.5401 / 0.7780 |
+ | **J + D + E + P** | **0.1543 / 0.7810** | **0.5184 / 0.6228** | **0.6512 / 0.7616** | **0.5473 / 0.7940** |
+
+ ### Comparison of our proposed model with existing models on the MTG-Jamendo dataset
+
+ | **Model** | **PR-AUC** ↑ | **ROC-AUC** ↑ |
+ |--------------------|:-----------:|:----------:|
+ | lileonardo | 0.1508 | 0.7747 |
+ | SELAB-HCMUS | 0.1435 | 0.7599 |
+ | Mirable | 0.1356 | 0.7687 |
+ | UIBK-DBIS | 0.1087 | 0.7046 |
+ | Hasumi et al. | 0.0730 | 0.7750 |
+ | Greer et al. | 0.1082 | 0.7354 |
+ | MERT-95M | 0.1340 | 0.7640 |
+ | MERT-330M | 0.1400 | 0.7650 |
+ | **Proposed (Ours)** | **0.1543** | **0.7810** |
+
+ ## TODO
+
+ - [ ] Incorporate additional features, such as lyrics.
+
+ ## Citation
+
+ If you find this resource useful, [please cite the original work](https://doi.org/10.48550/arXiv.2502.03979):
+
+ ```bibtex
+ @misc{kang2025unifiedmusicemotionrecognition,
+   title={Towards Unified Music Emotion Recognition across Dimensional and Categorical Models},
+   author={Jaeyong Kang and Dorien Herremans},
+   year={2025},
+   eprint={2502.03979},
+   archivePrefix={arXiv},
+   primaryClass={cs.SD},
+   url={https://arxiv.org/abs/2502.03979},
+ }
+ ```
+
+ Kang, J. & Herremans, D. (2025). Towards Unified Music Emotion Recognition across Dimensional and Categorical Models, arXiv.