---
license: cc-by-nc-sa-4.0
language:
- en
- de
- zh
- fr
- nl
- el
- it
- es
- my
- he
- sv
- fa
- tr
- ur
library_name: transformers
pipeline_tag: audio-classification
tags:
- Speech Emotion Recognition
- SER
- Transformer
- HuBERT
- PyTorch
---
# **ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets**
Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller
Fine-tuned [**HuBERT Large**](https://huggingface.co/facebook/hubert-large-ls960-ft) on EmoSet++, comprising 37 datasets, totaling 150,907 samples and spanning a cumulative duration of 119.5 hours.
The model expects a 3-second raw waveform resampled to 16 kHz. The six output classes are combinations of low/high arousal and negative/neutral/positive valence.
Further details are available in the corresponding [**paper**](https://arxiv.org/).
**Note**: This model is for research purposes only.
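Since the six classes are the cross product of two arousal levels and three valence levels, the label space can be enumerated directly. A minimal sketch — the index order shown here is an assumption for illustration, not stated on this card:

```python
from itertools import product

# Illustrative only: the card does not specify which index maps to which
# combination. The six classes are all (arousal, valence) pairs.
AROUSAL = ("low", "high")
VALENCE = ("negative", "neutral", "positive")
CLASS_COMBOS = list(product(AROUSAL, VALENCE))  # 6 pairs

def describe(class_index: int) -> str:
    """Turn a class index into a human-readable label (assumed ordering)."""
    arousal, valence = CLASS_COMBOS[class_index]
    return f"{arousal} arousal / {valence} valence"
```

With this assumed ordering, `describe(0)` yields `"low arousal / negative valence"`; consult the paper for the mapping actually used in training.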
### EmoSet++ subsets used for fine-tuning the model:
| | | | | |
| :---: | :---: | :---: | :---: | :---: |
| ABC | AD | BES | CASIA | CVE |
| Crema-D | DES | DEMoS | EA-ACT | EA-BMW |
| EA-WSJ | EMO-DB | EmoFilm | EmotiW-2014 | EMOVO |
| eNTERFACE | ESD | EU-EmoSS | EU-EV | FAU Aibo |
| GEMEP | GVESS | IEMOCAP | MES | MESD |
| MELD | PPMMK | RAVDESS | SAVEE | ShEMO |
| SmartKom | SIMIS | SUSAS | SUBSECO | TESS |
| TurkishEmo | Urdu | | | |
### Usage
```python
import torch
import torch.nn as nn
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor
# CONFIG and MODEL SETUP
model_name = '.../HuBERT-EmoSet++'  # path to the fine-tuned checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertForSequenceClassification.from_pretrained(model_name)
# Replace the classification head: 6 classes
# (low/high arousal x negative/neutral/positive valence)
model.classifier = nn.Linear(in_features=256, out_features=6)

sampling_rate = 16000  # the model expects 16 kHz input
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()  # inference mode
```
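Because the model expects exactly 3 seconds of 16 kHz audio, clips of other lengths typically need to be padded or trimmed before being passed to the feature extractor. A minimal sketch — the helper name and the end-zero-padding strategy are assumptions, not part of this card:

```python
import torch
import torch.nn.functional as F

def prepare_waveform(waveform: torch.Tensor,
                     sampling_rate: int = 16000,
                     target_seconds: float = 3.0) -> torch.Tensor:
    """Zero-pad or trim a mono waveform to exactly 3 s at 16 kHz
    (48,000 samples). Illustrative preprocessing, not the authors' pipeline."""
    target_len = int(sampling_rate * target_seconds)
    if waveform.shape[-1] < target_len:
        # Pad shorter clips with zeros at the end
        waveform = F.pad(waveform, (0, target_len - waveform.shape[-1]))
    else:
        # Trim longer clips to the first 3 seconds
        waveform = waveform[..., :target_len]
    return waveform
```

The resulting tensor can then be run through `feature_extractor` and the model as set up above.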
### Citation Info
```
@inproceedings{Amiriparian24-EEH,
  author    = {Shahin Amiriparian and Filip Packan and Maurice Gerczuk and Bj\"orn W.\ Schuller},
  title     = {{ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets}},
  booktitle = {{Proc. INTERSPEECH}},
  year      = {2024},
  address   = {Kos Island, Greece},
  month     = {September},
  publisher = {ISCA},
}
```