---
license: other
license_name: model-license
license_link: https://github.com/alibaba-damo-academy/FunASR
frameworks:
  - Pytorch
tasks:
  - emotion-recognition
---

# emotion2vec

**emotion2vec: universal speech emotion representation model**
**emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation**

## Guides

emotion2vec is the first universal speech emotion representation model. Through self-supervised pre-training, emotion2vec can extract emotion representations across different tasks, languages, and scenarios.

This version is a pre-trained representation model without fine-tuning, and can be used for feature extraction.

## Model Card

GitHub Repo: emotion2vec

| Model | ⭐ ModelScope | 🤗 Hugging Face | Fine-tuning Data (Hours) |
|-------|--------------|-----------------|--------------------------|
| emotion2vec | Link | Link | / |
| emotion2vec+ seed | Link | Link | 201 |
| emotion2vec+ base | Link | Link | 4788 |
| emotion2vec+ large | Link | Link | 42526 |

## Installation

```shell
pip install -U funasr modelscope
```

## Usage

- `input`: 16 kHz speech recording
- `granularity`:
  - `"utterance"`: extract features from the entire utterance
  - `"frame"`: extract frame-level features (50 Hz)
- `extract_embedding`: whether to extract the features

### Inference based on ModelScope

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.emotion_recognition,
    model="iic/emotion2vec_base")

rec_result = inference_pipeline(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav',
    output_dir="./outputs",
    granularity="utterance",
    extract_embedding=True)
print(rec_result)
```
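To get frame-level features instead of a single utterance-level vector, the same pipeline can be called with `granularity="frame"`. A minimal sketch, reusing `inference_pipeline` from above (the comments describe assumptions, not documented behavior):

```python
# Sketch: frame-level extraction (50 Hz) with the same ModelScope pipeline.
# Assumption: the result and any saved .npy files then hold one vector per frame.
frame_result = inference_pipeline(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav',
    output_dir="./outputs",
    granularity="frame",
    extract_embedding=True)
print(frame_result)
```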

### Inference based on FunASR

```python
from funasr import AutoModel

model = AutoModel(model="iic/emotion2vec_base")

res = model.generate(
    input='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav',
    output_dir="./outputs",
    granularity="utterance",
    extract_embedding=True)
print(res)
```

Note: the model is downloaded automatically.

The input also supports a Kaldi-style file list (`wav.scp`):

```
wav_name1 wav_path1.wav
wav_name2 wav_path2.wav
...
```
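A minimal sketch of batch extraction from such a list, assuming FunASR's `AutoModel` accepts the `.scp` path directly as `input` (the file name `my_wavs.scp` is a placeholder):

```python
# Sketch: batch feature extraction from a Kaldi-style wav.scp list.
# Assumption: the .scp path can be passed as input; "my_wavs.scp" is a placeholder.
res = model.generate(
    input="my_wavs.scp",
    output_dir="./outputs",
    granularity="utterance",
    extract_embedding=True)
```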

Outputs are emotion representations, saved to `output_dir` in NumPy format (they can be loaded with `np.load()`).
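A saved embedding can then be read back, for example (a sketch; the exact file name depends on the input utterance and is an assumption here):

```python
import numpy as np

# Assumption: embeddings are written to output_dir as <wav_name>.npy;
# replace the path with a file actually produced by the run above.
emb = np.load("./outputs/asr_example_zh.npy")
print(emb.shape)
```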

## Note

This repository is the Hugging Face version of emotion2vec, with model parameters identical to the original model and the ModelScope version.

Original repository: https://github.com/ddlBoJack/emotion2vec

ModelScope repository: https://github.com/alibaba-damo-academy/FunASR

Hugging Face repository: https://huggingface.co/emotion2vec

## Citation

```bibtex
@article{ma2023emotion2vec,
  title={emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation},
  author={Ma, Ziyang and Zheng, Zhisheng and Ye, Jiaxin and Li, Jinchao and Gao, Zhifu and Zhang, Shiliang and Chen, Xie},
  journal={arXiv preprint arXiv:2312.15185},
  year={2023}
}
```