Torchaudio_Tacotron2_kss

A torchaudio Tacotron2 model trained on the KSS dataset.

License

  • code: MIT License
  • pytorch_model.bin weights: CC BY-NC-SA 4.0 (license of the kss dataset)

Requirements

pip install torch torchaudio transformers phonemizer

You also need to install espeak-ng.

If you are using Windows, you need to set additional environment variables. See: https://github.com/bootphon/phonemizer/issues/44
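
Before running the usage code, it can help to confirm that espeak-ng is visible from Python. The snippet below is only a rough sketch: `find_espeak` is a hypothetical helper, and it checks for the executable on PATH, which may not match how phonemizer locates the library on your system. The `PHONEMIZER_ESPEAK_LIBRARY` variable name is an assumption based on phonemizer's documentation; check the linked issue for the variables that apply to your version.

```python
import os
import shutil


def find_espeak():
    """Return the path of an espeak-ng (or espeak) executable on PATH, or None."""
    for name in ("espeak-ng", "espeak"):
        path = shutil.which(name)
        if path:
            return path
    return None


espeak_path = find_espeak()
if espeak_path is None:
    print("espeak-ng was not found on PATH; install it before running the usage code.")
else:
    print(f"Found espeak at: {espeak_path}")

# Assumption: phonemizer may read this variable to locate the espeak-ng library
# (mainly relevant on Windows). It prints None when the variable is unset.
print("PHONEMIZER_ESPEAK_LIBRARY =", os.environ.get("PHONEMIZER_ESPEAK_LIBRARY"))
```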

Usage

import torch
from transformers import AutoModel, AutoTokenizer

repo = "Bingsu/torchaudio_tacotron2_kss"
model = AutoModel.from_pretrained(
    repo,
    trust_remote_code=True,
    revision="589d6557e8b4bb347f49de74270541063ba9c2bc"
    )
tokenizer = AutoTokenizer.from_pretrained(repo)
model.eval()
vocoder = torch.hub.load("seungwonpark/melgan:aca59909f6dd028ec808f987b154535a7ca3400c", "melgan", trust_repo=True, pretrained=False)
url = "https://huggingface.co/Bingsu/torchaudio_tacotron2_kss/resolve/main/melgan.pt"
state_dict = torch.hub.load_state_dict_from_url(url)
vocoder.load_state_dict(state_dict)

The vocoder is the same as the original seungwonpark/melgan, but the original weights are stored on CUDA, so they are loaded separately here.

text = "반갑습니다 타코트론2입니다."  # "Nice to meet you. This is Tacotron2."
inp = tokenizer(text, return_tensors="pt", return_length=True, return_attention_mask=False)
with torch.inference_mode():
    out = model(**inp)
    audio = vocoder(out[0])
import IPython.display as ipd

ipd.Audio(audio[0].numpy(), rate=22050)
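
Outside a notebook, you may want to write the result to a file instead of playing it. The following is a minimal sketch using only the standard library. `save_wav` is a hypothetical helper, not part of this repository, and it assumes the samples are mono floats in the range [-1.0, 1.0] (values outside that range are clipped). The default rate of 22050 matches the rate used above.

```python
import array
import sys
import wave


def save_wav(path, samples, rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of frames written.
    """
    pcm = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        pcm.append(int(s * 32767))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(rate)
        f.writeframes(pcm.tobytes())
    return len(pcm)
```

With the variables from the usage code, a call such as `save_wav("out.wav", audio[0].flatten().tolist())` should work, assuming the vocoder output is a float waveform in that range.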