Shona Text-to-Speech

This repository contains the Shona (sna) language text-to-speech (TTS) model checkpoint.

Model Details

Model Description

Developed by: Fastino Mateteva
Model type: Text-to-speech
Language(s) (NLP): Shona
Finetuned from model: SpeechT5

Usage

First, install the required libraries (the snippets below also use torch and scipy):

pip install --upgrade transformers accelerate torch scipy

Then, run inference with the following code snippet:


# Load the tokenizer and model from the Hugging Face Hub
import torch
from transformers import AutoTokenizer, AutoModelForTextToWaveform

tokenizer = AutoTokenizer.from_pretrained("Fastino06/ff")
model = AutoModelForTextToWaveform.from_pretrained("Fastino06/ff")

# Tokenize the input text and return PyTorch tensors
text = "some example text in the Shona language"
inputs = tokenizer(text, return_tensors="pt")

# Generate the waveform without tracking gradients
with torch.no_grad():
    output = model(**inputs).waveform

The resulting waveform can be saved as a .wav file:

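import scipy.io.wavfile

# The waveform tensor has a leading batch dimension; drop it and convert to NumPy before writing
scipy.io.wavfile.write("fassy.wav", rate=model.config.sampling_rate, data=output.squeeze().cpu().numpy())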
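If SciPy is not available, the same file can be written with Python's standard library. The following is a minimal sketch, not part of the original model card: write_pcm16_wav is a hypothetical helper name, and it assumes the waveform has already been flattened to a plain sequence of floats in the range [-1.0, 1.0] (for example via output.squeeze().tolist()).

```python
import struct
import wave


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; assumes `samples` is a flat
    iterable of floats and `sample_rate` is an integer in Hz.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

With the variables from the inference snippet, a call might look like write_pcm16_wav("fassy.wav", output.squeeze().tolist(), model.config.sampling_rate).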

Or displayed in a Jupyter Notebook / Google Colab:

from IPython.display import Audio

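# Convert the tensor to a NumPy array before handing it to the audio widget
Audio(output.squeeze().cpu().numpy(), rate=model.config.sampling_rate)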

BibTeX citation

This model was developed by Fastino Mateteva.