Amphion Evaluation Recipe
Supported Evaluation Metrics
Amphion Evaluation currently supports the following objective metrics:
- F0 Modeling:
  - F0 Pearson Coefficients (FPC)
  - F0 Periodicity Root Mean Square Error (PeriodicityRMSE)
  - F0 Root Mean Square Error (F0RMSE)
  - Voiced/Unvoiced F1 Score (V/UV F1)
- Energy Modeling:
  - Energy Root Mean Square Error (EnergyRMSE)
  - Energy Pearson Coefficients (EnergyPC)
- Intelligibility:
  - Character Error Rate (CER)
  - Word Error Rate (WER)
- Spectrogram Distortion:
  - Frechet Audio Distance (FAD)
  - Mel Cepstral Distortion (MCD)
  - Multi-Resolution STFT Distance (MSTFT)
  - Perceptual Evaluation of Speech Quality (PESQ)
  - Short Time Objective Intelligibility (STOI)
  - Scale Invariant Signal to Distortion Ratio (SISDR)
  - Scale Invariant Signal to Noise Ratio (SISNR)
- Speaker Similarity:
  - Cosine Similarity based on RawNet3
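To make the F0 and energy metrics above more concrete, here is a minimal sketch of how a root mean square error and a Pearson coefficient can be computed on two time-aligned contours. It is an illustration only, not Amphion's implementation: the helper names `rmse` and `pearson` and the sample F0 values are hypothetical, and real evaluation also involves F0 extraction and alignment, which are not shown.

```python
import math


def rmse(ref, gen):
    """Root mean square error between two equal-length contours."""
    if len(ref) != len(gen):
        raise ValueError("contours must have the same number of frames")
    return math.sqrt(sum((r - g) ** 2 for r, g in zip(ref, gen)) / len(ref))


def pearson(ref, gen):
    """Pearson correlation coefficient between two equal-length contours."""
    if len(ref) != len(gen):
        raise ValueError("contours must have the same number of frames")
    n = len(ref)
    mean_r = sum(ref) / n
    mean_g = sum(gen) / n
    cov = sum((r - mean_r) * (g - mean_g) for r, g in zip(ref, gen))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_g = sum((g - mean_g) ** 2 for g in gen)
    return cov / math.sqrt(var_r * var_g)


# Hypothetical frame-level F0 values in Hz, assumed to be already aligned.
ref_f0 = [100.0, 110.0, 120.0, 130.0]
gen_f0 = [102.0, 112.0, 122.0, 132.0]  # same shape, shifted up by 2 Hz

print("RMSE:", rmse(ref_f0, gen_f0))
print("Pearson:", pearson(ref_f0, gen_f0))
```

In this toy case the generated contour follows the reference exactly but sits 2 Hz higher, so the correlation is perfect while the RMSE reports the offset. The two kinds of metric capture different aspects of the same contour.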
We provide a recipe that demonstrates how to objectively evaluate your generated audio. It has three steps:
- Pretrained Models Preparation
- Audio Data Preparation
- Evaluation
1. Pretrained Models Preparation
If you want to calculate RawNet3-based speaker similarity, you need to download the pretrained model first, as illustrated here.
2. Audio Data Preparation
Prepare the reference audio and the generated audio in two folders: ref_dir contains the reference audio and gen_dir contains the generated audio. Here is an example.
┣ {ref_dir}
┃ ┣ sample1.wav
┃ ┣ sample2.wav
┣ {gen_dir}
┃ ┣ sample1.wav
┃ ┣ sample2.wav
Make sure each reference file and its generated counterpart have the same filename, as illustrated above (sample1 to sample1, sample2 to sample2).
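Before running the evaluation, it can help to confirm that the two folders really are paired by filename. The following is a minimal sketch of such a check. The helper `find_unpaired` is hypothetical (it is not part of Amphion), and it assumes the audio files are `.wav` files placed directly inside each folder.

```python
import tempfile
from pathlib import Path


def find_unpaired(ref_dir, gen_dir, suffix=".wav"):
    """Return (missing_in_gen, missing_in_ref) as sorted lists of filenames."""
    ref_names = {p.name for p in Path(ref_dir).glob(f"*{suffix}")}
    gen_names = {p.name for p in Path(gen_dir).glob(f"*{suffix}")}
    return sorted(ref_names - gen_names), sorted(gen_names - ref_names)


# Demo on a throwaway folder layout with one generated file missing.
with tempfile.TemporaryDirectory() as root:
    ref_dir = Path(root) / "ref_dir"
    gen_dir = Path(root) / "gen_dir"
    ref_dir.mkdir()
    gen_dir.mkdir()
    for name in ("sample1.wav", "sample2.wav"):
        (ref_dir / name).touch()
    (gen_dir / "sample1.wav").touch()

    missing_in_gen, missing_in_ref = find_unpaired(ref_dir, gen_dir)

print("Missing in gen_dir:", missing_in_gen)
print("Missing in ref_dir:", missing_in_ref)
```

Both lists being empty means every reference file has a generated file with the same name, and vice versa.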
3. Evaluation
Run run.sh with the reference folder, generated folder, dump folder, and metrics specified.
cd Amphion
sh egs/metrics/run.sh \
--reference_folder [Your path to the reference audios] \
--generated_folder [Your path to the generated audios] \
--dump_folder [Your path to dump the objective results] \
--metrics [The metrics you need]
For the metrics argument, pass the keys as a single quoted, space-separated string, for example:
--metrics "mcd pesq fad"
All currently available metrics keywords are listed below:
Keys | Description |
---|---|
fpc | F0 Pearson Coefficients |
f0_periodicity_rmse | F0 Periodicity Root Mean Square Error |
f0rmse | F0 Root Mean Square Error |
v_uv_f1 | Voiced/Unvoiced F1 Score |
energy_rmse | Energy Root Mean Square Error |
energy_pc | Energy Pearson Coefficients |
cer | Character Error Rate |
wer | Word Error Rate |
speaker_similarity | Cos Similarity based on RawNet3 |
fad | Frechet Audio Distance |
mcd | Mel Cepstral Distortion |
mstft | Multi-Resolution STFT Distance |
pesq | Perceptual Evaluation of Speech Quality |
si_sdr | Scale Invariant Signal to Distortion Ratio |
si_snr | Scale Invariant Signal to Noise Ratio |
stoi | Short Time Objective Intelligibility |
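If you launch the evaluation from a Python script, the sketch below shows one way to check the requested keys against the table above and assemble the run.sh command. The helper `build_eval_command` and the folder names `ref_dir`, `gen_dir`, and `results` are hypothetical placeholders. Only the script path and the four flags come from this recipe.

```python
import shlex

# Metric keys copied from the table above.
AVAILABLE_METRICS = {
    "fpc", "f0_periodicity_rmse", "f0rmse", "v_uv_f1",
    "energy_rmse", "energy_pc", "cer", "wer", "speaker_similarity",
    "fad", "mcd", "mstft", "pesq", "si_sdr", "si_snr", "stoi",
}


def build_eval_command(reference_folder, generated_folder, dump_folder, metrics):
    """Build the argument list for egs/metrics/run.sh after validating the keys."""
    unknown = [m for m in metrics if m not in AVAILABLE_METRICS]
    if unknown:
        raise ValueError(f"Unknown metric keys: {unknown}")
    return [
        "sh", "egs/metrics/run.sh",
        "--reference_folder", str(reference_folder),
        "--generated_folder", str(generated_folder),
        "--dump_folder", str(dump_folder),
        # The metric keys are passed as one space-separated argument.
        "--metrics", " ".join(metrics),
    ]


# Placeholder folder names; replace them with your own paths.
cmd = build_eval_command("ref_dir", "gen_dir", "results", ["mcd", "pesq", "fad"])
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` with the working directory set to the Amphion repository root, matching the `cd Amphion` step in the recipe.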