---
library_name: transformers
tags: []
---
# Malay Parler TTS Mini V1
Finetuned from [parler-tts/parler-tts-mini-v1](https://huggingface.co/parler-tts/parler-tts-mini-v1) on the Malay TTS dataset [mesolitica/tts-combine-annotated](https://huggingface.co/datasets/mesolitica/tts-combine-annotated).

Source code: https://github.com/mesolitica/malaya-speech/tree/master/session/parler-tts

Weights & Biases project: https://wandb.ai/huseinzol05/parler-speech?nw=nwuserhuseinzol05
## how-to
```python
import torch
import soundfile as sf
from transformers import AutoTokenizer
from parler_tts import ParlerTTSForConditionalGeneration

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("mesolitica/malay-parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("mesolitica/malay-parler-tts-mini-v1")

# Speaker names; each one is inserted into the voice description below.
speakers = [
    'Yasmin',
    'Osman',
    'Bunga',
    'Ariff',
    'Ayu',
    'Kamarul',
    'Danial',
    'Elina',
]

# Malay text to synthesize ("Husein Zolkepli is very cute and handsome and likes to eat cendol").
prompt = 'Husein zolkepli sangat comel dan kacak suka makan cendol'
# The text is the same for every speaker, so tokenize it once outside the loop.
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

for s in speakers:
    # The description conditions the voice and delivery style.
    description = f"{s}'s voice, delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()
    # Write one file per speaker at the model's output sampling rate.
    # MP3 output needs libsndfile >= 1.1.0; use '.wav' if your soundfile build lacks MP3 support.
    sf.write(f'{s}.mp3', audio_arr, model.config.sampling_rate)
```
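The description string in the loop follows one fixed template in which only the speaker name changes. As a minimal sketch, assuming you want to reuse that template elsewhere, it can be factored into a small helper. `build_description`, `SPEAKERS` and `DESCRIPTION_TEMPLATE` are hypothetical names introduced here, not part of `parler_tts` or this repository, and rejecting unlisted names is a conservative assumption: this card does not say whether other names map to a learned voice.

```python
# Hypothetical helper: builds the voice description used in the how-to.
# Only the speaker name varies; the rest of the template is copied verbatim.
SPEAKERS = ["Yasmin", "Osman", "Bunga", "Ariff", "Ayu", "Kamarul", "Danial", "Elina"]

DESCRIPTION_TEMPLATE = (
    "{speaker}'s voice, delivers a slightly expressive and animated speech "
    "with a moderate speed and pitch. The recording is of very high quality, "
    "with the speaker's voice sounding clear and very close up."
)


def build_description(speaker: str) -> str:
    """Return the description prompt for one of the speakers listed in this card."""
    if speaker not in SPEAKERS:
        # Assumption: names outside this list may not correspond to a learned voice.
        raise ValueError(f"unknown speaker {speaker!r}; expected one of {SPEAKERS}")
    return DESCRIPTION_TEMPLATE.format(speaker=speaker)


if __name__ == "__main__":
    for s in SPEAKERS:
        print(build_description(s))
```

Inside the loop, `description = build_description(s)` would then replace the inline f-string.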