Shona MMS TTS Fine-Tune
This is a Shona fine-tune of Meta’s MMS TTS model, based on the original facebook/mms-tts-sna checkpoint.
The model is publicly available here:
https://huggingface.co/manassehzw/sna-tts-v1
It is lightweight and can run locally on most machines, including CPU-only setups.
Requirements
You’ll need:
- Python 3.10+
- torch
- transformers
- soundfile
- fastapi
- uvicorn
Install dependencies:
pip install torch transformers soundfile fastapi uvicorn
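If you want to confirm the install worked before loading the model, a small check like the one below can help. It is only a sketch: the helper name missing_packages is made up for this example, and it only tests whether each package can be found, not whether the installed versions are compatible.

```python
import importlib.util

# Packages listed in the Requirements section above.
REQUIRED = ["torch", "transformers", "soundfile", "fastapi", "uvicorn"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec returns None for a top-level package that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_packages() returns an empty list when everything is installed, or the names that still need a pip install.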
Quick Start: Local Inference
Create a Python script called run_tts.py:
import torch
import soundfile as sf
from transformers import AutoTokenizer, VitsModel
MODEL_ID = "manassehzw/sna-tts-v1"
text = "Mangwanani. Ndamuka zvakanaka nhasi."
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = VitsModel.from_pretrained(MODEL_ID)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()
inputs = tokenizer(text, return_tensors="pt").to(device)
with torch.no_grad():
    output = model(**inputs).waveform
waveform = output.squeeze().cpu().numpy()
sf.write("shona_tts.wav", waveform, model.config.sampling_rate)
print("Saved audio to shona_tts.wav")
Run it:
python run_tts.py
This will generate:
shona_tts.wav
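To sanity-check the generated file without opening an audio player, you can read its header with Python's standard wave module. This is a minimal sketch: wav_info is a hypothetical helper name, and it assumes the file is plain PCM WAV, which should be the case for the sf.write call above with its default settings.

```python
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        # Duration is the number of audio frames divided by frames per second.
        duration = wf.getnframes() / rate
    return rate, channels, duration
```

After running run_tts.py, wav_info("shona_tts.wav") should report a single channel and a sample rate equal to model.config.sampling_rate. A duration of zero would suggest the model produced no audio for the input text.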
Using the Model in a FastAPI Endpoint
You can also wrap the model in a small FastAPI service and expose it as an HTTP API.
Create a file called api.py:
import io
import torch
import soundfile as sf
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from transformers import AutoTokenizer, VitsModel
MODEL_ID = "manassehzw/sna-tts-v1"
app = FastAPI(title="Shona MMS TTS API")
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = VitsModel.from_pretrained(MODEL_ID)
model = model.to(device)
model.eval()
class TTSRequest(BaseModel):
    text: str
@app.post("/tts")
def generate_speech(request: TTSRequest):
    inputs = tokenizer(request.text, return_tensors="pt").to(device)
    with torch.no_grad():
        waveform = model(**inputs).waveform
    audio = waveform.squeeze().cpu().numpy()
    buffer = io.BytesIO()
    sf.write(buffer, audio, model.config.sampling_rate, format="WAV")
    buffer.seek(0)
    return StreamingResponse(
        buffer,
        media_type="audio/wav",
        headers={
            "Content-Disposition": "attachment; filename=shona_tts.wav"
        },
    )
Run the API locally:
uvicorn api:app --host 0.0.0.0 --port 8000
Then send text to the API:
curl -X POST "http://localhost:8000/tts" \
-H "Content-Type: application/json" \
-d '{"text": "Mangwanani. Ndamuka zvakanaka nhasi."}' \
--output shona_tts.wav
This will save the generated speech as:
shona_tts.wav
You can also open the interactive FastAPI docs in your browser:
http://localhost:8000/docs
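If you would rather call the API from Python than from curl, the same request can be sketched with the standard library alone. The function names build_tts_request and synthesize_to_file are made up for this example, and the default base URL and the 60-second timeout are assumptions that match the local uvicorn command above, not settings the API requires.

```python
import json
import urllib.request


def build_tts_request(text, base_url="http://localhost:8000"):
    """Build the POST request for the /tts endpoint defined in api.py."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, base_url="http://localhost:8000", timeout=60):
    """Send text to the running API and write the returned WAV bytes to out_path."""
    request = build_tts_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    # Return the byte count so callers can spot an empty response.
    return len(audio_bytes)
```

With the server running, synthesize_to_file("Mangwanani. Ndamuka zvakanaka nhasi.", "shona_tts.wav") does the same job as the curl command.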
Notes
- Works best with full Shona sentences rather than short fragments.
- CPU inference is supported, but GPU inference will be faster.
- The model is lightweight, so short sentence generation should be quick locally.
- Load the model once when your API starts, not inside every request.
- For production use, put the API behind your normal authentication, rate limiting, and monitoring setup.
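Since the model works best with full sentences, one option for longer inputs is to split the text into sentences first and fold very short fragments into a neighbouring sentence before synthesis. The sketch below is one hypothetical way to do that. The split_sentences helper and the min_words threshold of 3 are assumptions chosen for illustration, not something the model requires.

```python
import re


def split_sentences(text, min_words=3):
    """Split text after ., ! or ? and merge short fragments into a neighbour."""
    parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p.strip()]
    merged = []
    for part in parts:
        # Merge when either this part or the previous chunk is too short
        # to stand on its own as a sentence.
        too_short = len(part.split()) < min_words
        prev_short = bool(merged) and len(merged[-1].split()) < min_words
        if merged and (too_short or prev_short):
            merged[-1] = f"{merged[-1]} {part}"
        else:
            merged.append(part)
    return merged
```

Each returned chunk can then be passed to the tokenizer and model in turn, and the resulting waveforms joined into one clip.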