Text-to-Speech · Transformers · PyTorch · Yoruba · speecht5 · text-to-audio · Inference Endpoints
```python
# Load the processor, model, and vocoder directly
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from huggingface_hub import hf_hub_download
import torch

processor = SpeechT5Processor.from_pretrained("imhotepai/yoruba-tts")
model = SpeechT5ForTextToSpeech.from_pretrained("imhotepai/yoruba-tts")

# Download the speaker embedding (x-vector) shipped with the model repo
embeddings_path = hf_hub_download(repo_id="imhotepai/yoruba-tts", filename="speaker_embeddings.pt")
speaker_embeddings = torch.load(embeddings_path)

# Lowercase the input to match the model's training preprocessing
text = "Báwó ni".lower()
inputs = processor(text=text, return_tensors="pt")

# Generate the waveform with the HiFi-GAN vocoder
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

# Play the audio in a notebook (SpeechT5 outputs 16 kHz audio)
from IPython.display import Audio

Audio(speech.numpy(), rate=16000)
```
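
Outside a notebook, the generated waveform can be written to disk instead. A minimal sketch, assuming the soundfile package is installed; the output filename is illustrative:

```python
import soundfile as sf

# The vocoder output is 16 kHz mono audio; save it as a WAV file
sf.write("speech.wav", speech.numpy(), samplerate=16000)
```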
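The same steps can also be wrapped in the high-level text-to-speech pipeline. A sketch, assuming a transformers release that includes the text-to-audio pipeline and that it resolves the default SpeechT5 HiFi-GAN vocoder automatically:

```python
from transformers import pipeline
from huggingface_hub import hf_hub_download
import torch

synthesiser = pipeline("text-to-speech", model="imhotepai/yoruba-tts")

embeddings_path = hf_hub_download(repo_id="imhotepai/yoruba-tts", filename="speaker_embeddings.pt")
speaker_embeddings = torch.load(embeddings_path)

# The pipeline returns a dict with "audio" (a NumPy array) and "sampling_rate"
out = synthesiser("báwó ni", forward_params={"speaker_embeddings": speaker_embeddings})
print(out["sampling_rate"], out["audio"].shape)
```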
