UTDUSS vocoder model

This repo provides the model weights of the Descript Audio Codec (DAC) models used for the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.

Prerequisites

The official dac library is required. It can be installed with the following command:

pip install descript-audio-codec

Provided weights

Vocoder task

model name on paper	model name on this repo
😀	expresso_16k_2code.pth
😀 w/o hyper-parameter tuning	expresso_16k_2code_official.pth
😀 w/o data exclusion	expresso_16k_2code_wo_data.pth
😀 w/o matching sampling rate	expresso_24k_2code_ab.pth

Acoustic + Vocoder (TTS) task

Please note that the weights for the acoustic model are not provided.

Full training set

model name on paper	model name on this repo
Discrete-TTS v1, v1.1	lj_16k_1code.pth
Discrete-TTS v2, v2.2	lj_16k_1code_512.pth
Discrete-TTS v3	lj_16k_1code_256.pth

1h training set

model name on paper	model name on this repo
Discrete-TTS v1, v1.1	lj_1h_16k_1code.pth
Discrete-TTS v2, v2.2	lj_1h_16k_1code_512.pth
Discrete-TTS v3	lj_1h_16k_1code_256.pth

Sample code

import dac
import torch
from pathlib import Path

# Download the vocoder weights to a local cache directory.
model_url = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main/expresso_16k_2code.pth"
model_path = Path(f"/tmp/utduss/{model_url.split('/')[-1]}")
model_path.parent.mkdir(parents=True, exist_ok=True)
torch.hub.download_url_to_file(model_url, str(model_path))

# Load the weights into a DAC model.
model = dac.DAC.load(model_path)
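
The same steps can be wrapped in a small helper so that any weight file listed in the tables above can be fetched by name. This is a minimal sketch, not part of the repo: the function names `weight_url`, `local_weight_path`, and `fetch_weight` are hypothetical, and it assumes every file is served under the same `resolve/main` URL prefix as the one used in the sample code.

```python
from pathlib import Path

# URL prefix taken from the sample code; assumed to hold for every listed file.
BASE_URL = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main"


def weight_url(filename: str) -> str:
    """Build the download URL for a weight file such as 'lj_16k_1code.pth'."""
    return f"{BASE_URL}/{filename}"


def local_weight_path(filename: str, cache_dir: str = "/tmp/utduss") -> Path:
    """Return the local path where the weight file is cached."""
    return Path(cache_dir) / filename


def fetch_weight(filename: str, cache_dir: str = "/tmp/utduss") -> Path:
    """Download the weight file unless it is already cached; return its path."""
    import torch  # imported lazily so the two helpers above work without torch

    path = local_weight_path(filename, cache_dir)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        torch.hub.download_url_to_file(weight_url(filename), str(path))
    return path
```

With this helper, loading another model reduces to `dac.DAC.load(fetch_weight("lj_16k_1code.pth"))`, and repeated runs reuse the cached file instead of downloading it again.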

Contributors

Wataru Nakata
Kazuki Yamauchi
Dong Yang
Hiroaki Hyodo
Yuki Saito