Swahili female voice text-to-speech model

This is an ongoing effort to develop a text-to-speech model with a female voice for the Swahili language.

Please give it a try.

For inference, try the following:

# Import the required libraries
from transformers import VitsModel, AutoTokenizer
import torch
import scipy.io.wavfile

# Load model and tokenizer
model = VitsModel.from_pretrained("mussacharles60/swahili-tts-female-voice")
tokenizer = AutoTokenizer.from_pretrained("mussacharles60/swahili-tts-female-voice")

# Running the TTS
text = "Mambo vipi ?, Hii ni Myssa Tech sauti ya A.I, kujaribishwa na Mussa Charles"
inputs = tokenizer(text, return_tensors="pt")

# Generate waveform
with torch.no_grad():
    output = model(**inputs).waveform

# Convert PyTorch tensor to NumPy array
output_np = output.squeeze().cpu().numpy()

# Write to WAV file
scipy.io.wavfile.write("female_voice_test.wav", rate=model.config.sampling_rate, data=output_np)
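The snippet above passes a float array to `scipy.io.wavfile.write`, so the file is stored with 32-bit float samples. Some audio players and tools expect 16-bit PCM instead. Below is a minimal sketch of one way to convert, assuming the waveform values lie roughly in [-1, 1]. The helper name `to_int16_pcm` is hypothetical and not part of this model or of transformers, and the 16 kHz tone is only a stand-in for `output_np`.

```python
import numpy as np


def to_int16_pcm(waveform: np.ndarray) -> np.ndarray:
    """Convert a float waveform in [-1, 1] to 16-bit PCM samples."""
    # Clip first so out-of-range samples saturate instead of wrapping around.
    clipped = np.clip(waveform, -1.0, 1.0)
    # Scale to the int16 range and round to the nearest integer value.
    return np.round(clipped * 32767.0).astype(np.int16)


# Stand-in for `output_np`: one second of a 440 Hz tone at an assumed 16 kHz rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
demo = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

pcm = to_int16_pcm(demo)
print(pcm.dtype, pcm.shape)
```

With the real model output, passing `data=to_int16_pcm(output_np)` to `scipy.io.wavfile.write` in place of `data=output_np` should produce a 16-bit file at the same sampling rate.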

You're all welcome to contribute.

Thanks 🤗

Model size: 36.3M params (Safetensors, tensor type F32)
Base model: facebook/mms-tts (this model is a fine-tuned version)