---
library_name: transformers
tags: []
---

# Malay Parler TTS Mini V1

Fine-tuned https://huggingface.co/parler-tts/parler-tts-mini-v1 on the Malay TTS dataset https://huggingface.co/datasets/mesolitica/tts-combine-annotated.

Source code is available at https://github.com/mesolitica/malaya-speech/tree/master/session/parler-tts.

Wandb logs are at https://wandb.ai/huseinzol05/parler-speech?nw=nwuserhuseinzol05.

## how-to

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("mesolitica/malay-parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("mesolitica/malay-parler-tts-mini-v1")

# Speaker names available in this fine-tune
speakers = [
    'Yasmin',
    'Osman',
    'Bunga',
    'Ariff',
    'Ayu',
    'Kamarul',
    'Danial',
    'Elina',
]

# Malay text to speak ("Husein Zolkepli is very cute and handsome, likes to eat cendol")
prompt = 'Husein zolkepli sangat comel dan kacak suka makan cendol'

for s in speakers:
    # The description selects the voice and speaking style; the prompt is the text that gets spoken
    description = f"{s}'s voice, delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."

    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()

    # Write a WAV file at the model's own sampling rate (WAV avoids needing MP3 support in libsndfile)
    sf.write(f'{s}.wav', audio_arr, model.config.sampling_rate)
```
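In Parler-TTS the description chooses the voice and speaking style, while the prompt carries the words to speak. The sketch below factors the description from the example into a small helper so the speaker name can be checked and the style phrases changed in one place. `build_description`, `SPEAKERS`, and the `style` / `speed_and_pitch` parameters are hypothetical names introduced here, not part of `parler_tts`. With default arguments the helper reproduces the example's description exactly. Whether this fine-tune responds to other phrasings is an assumption worth checking by ear.

```python
# Hypothetical helper (not part of parler_tts): builds the same kind of
# description string used in the how-to example, with the style phrases
# exposed as parameters.
SPEAKERS = ("Yasmin", "Osman", "Bunga", "Ariff", "Ayu", "Kamarul", "Danial", "Elina")


def build_description(
    speaker: str,
    style: str = "slightly expressive and animated",
    speed_and_pitch: str = "moderate",
) -> str:
    """Return a description for one of the speakers listed in this model card."""
    if speaker not in SPEAKERS:
        # Reject names that do not appear in the model card's speaker list
        raise ValueError(f"unknown speaker {speaker!r}; expected one of {SPEAKERS}")
    return (
        f"{speaker}'s voice, delivers a {style} speech with a {speed_and_pitch} speed and pitch. "
        "The recording is of very high quality, with the speaker's voice "
        "sounding clear and very close up."
    )


if __name__ == "__main__":
    # Default arguments give the description used in the how-to example
    print(build_description("Yasmin"))
    # Assumed variation: the model may or may not follow non-default phrasings
    print(build_description("Osman", style="calm and monotone", speed_and_pitch="slow"))
```

Inside the example's loop, the description line could then be replaced with `description = build_description(s)`.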