VITS Open Bible — Ewe

A multispeaker text-to-speech model for Ewe, trained from scratch on the Open Bible corpus using the VITS architecture (end-to-end TTS with adversarial learning, 22,050 Hz output) via the Coqui TTS framework.

Unlike zero-shot TTS models, VITS is conditioned on speaker embeddings learned during training. A speaker name from the training set must be supplied at inference time.
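
Because the speaker must be one of the training-set voices, it can help to check a requested name before calling the model. The following is a minimal sketch in plain Python; the helper `resolve_speaker` and the example names are hypothetical and not part of Coqui TTS, and the real names are assumed to come from the model's speaker manager.

```python
import difflib


def resolve_speaker(requested, available):
    """Return `requested` if it is a known speaker name, else raise with suggestions."""
    names = sorted(available)
    if requested in names:
        return requested
    # Suggest up to three similarly spelled names to make typos easy to spot.
    close = difflib.get_close_matches(requested, names, n=3)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    raise ValueError(f"Unknown speaker {requested!r}.{hint}")


# Placeholder names for illustration only; the real list comes from the model.
example_names = ["speaker_a", "speaker_b"]
print(resolve_speaker("speaker_a", example_names))
```

Failing early with a readable message is usually easier to debug than an error raised deep inside the synthesis call.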

Files

File            Purpose
model_last.pth  Trained model weights.
config.json     Coqui TTS model configuration.
speakers.pth    Speaker ID → embedding mapping.

Intended use

  • Multispeaker TTS for Ewe using one of the training-set speaker voices.
  • Research on multilingual TTS, low-resource TTS evaluation, and listening studies on Open Bible–style read-speech.

How to use

Install Coqui TTS:

pip install TTS

Download the checkpoint and run inference:

import torch
from huggingface_hub import hf_hub_download
from TTS.tts.utils.speakers import SpeakerManager
from TTS.utils.synthesizer import Synthesizer

repo_id  = "multilingual-tts/VITS-OpenBible-Ewe"
ckpt     = hf_hub_download(repo_id, "model_last.pth")
config   = hf_hub_download(repo_id, "config.json")
speakers = hf_hub_download(repo_id, "speakers.pth")

use_cuda = torch.cuda.is_available()
synthesizer = Synthesizer(
    tts_checkpoint=ckpt,
    tts_config_path=config,
    tts_speakers_file=speakers,
    use_cuda=use_cuda,
)

# Coqui's Synthesizer may not inject the speakers file into the model config
# automatically — restore the SpeakerManager manually when needed.
if synthesizer.tts_model.speaker_manager is None:
    synthesizer.tts_model.speaker_manager = SpeakerManager(
        speaker_id_file_path=speakers
    )

# List available speaker names
print(sorted(synthesizer.tts_model.speaker_manager.speaker_names))

wav = synthesizer.tts(
    text="...",          # text to synthesise in Ewe
    speaker_name="...",  # one of the speaker names printed above
    split_sentences=True,
)
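
The snippet above stops once the waveform is in memory. As a minimal sketch of writing it to disk with only the Python standard library, the helper below stores samples as mono 16-bit PCM at the model's 22,050 Hz output rate. It assumes the samples are floats roughly in [-1.0, 1.0]; the name `write_wav` is hypothetical, and a synthetic tone stands in for the synthesizer output so the block runs on its own.

```python
import math
import struct
import wave


def write_wav(samples, path, sample_rate=22050):
    """Write float samples in [-1.0, 1.0] to a mono 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range before scaling to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


# Stand-in signal: one second of a 440 Hz tone instead of real model output.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
write_wav(tone, "example.wav")
```

With the real model, the `wav` value returned by `synthesizer.tts(...)` would be passed in place of `tone`, provided it matches the assumed value range.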

Training data

  • Source: davidguzmanr/open-bible-resources, config Ewe
  • Size: approximately 22,195 utterances
  • Speakers: multispeaker; speaker identity is fixed to one of the training-set voices and selected by name at inference time
  • Sample rate: 22,050 Hz

Training procedure

  • Architecture: VITS (Conditional Variational Autoencoder + adversarial training).
  • Tokenizer: grapheme-level, built from the training transcripts.
  • Optimizer: AdamW, learning rate 2e-4.
  • Training budget: 500,000 optimizer updates on 2 GPUs with mixed precision (bf16).

Audio preprocessing and training are reproducible via the upstream open-bible-models repo.
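
To illustrate what a grapheme-level tokenizer built from the training transcripts involves, the sketch below collects every distinct character into a vocabulary and maps text to integer IDs. It is not the Coqui TTS implementation: the function names, the special symbols, and the NFC normalization step are assumptions made for this example, and the two sample strings merely stand in for real transcripts.

```python
import unicodedata


def build_vocab(transcripts, specials=("<pad>", "<unk>")):
    """Map each special symbol and each distinct character to an integer ID."""
    chars = set()
    for text in transcripts:
        # Normalize so composed and decomposed forms of a letter share one ID.
        chars.update(unicodedata.normalize("NFC", text))
    symbols = list(specials) + sorted(chars)
    return {sym: idx for idx, sym in enumerate(symbols)}


def encode(text, vocab):
    """Convert text to a list of IDs, using <unk> for unseen characters."""
    unk = vocab["<unk>"]
    return [vocab.get(ch, unk) for ch in unicodedata.normalize("NFC", text)]


# Sample strings standing in for training transcripts.
transcripts = ["akpe", "ŋdi"]
vocab = build_vocab(transcripts)
print(encode("akpe", vocab))
```

Working at the grapheme level avoids the need for a phonemizer, which is often unavailable for low-resource languages.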

Evaluation

The model was evaluated alongside other Open-Bible TTS systems on two kinds of metrics: character and word error rate (via Meta's Omnilingual ASR) and UTMOSv2 naturalness scores. See the open-bible-models repository for the evaluation pipeline and the open-bible-surveys repository for the human-listening survey methodology.
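
For reference, character and word error rates are conventionally computed as the edit distance between a reference text and a transcript, divided by the reference length. The sketch below shows that standard definition in plain Python. It is not the evaluation pipeline from the open-bible-models repository, and it leaves out any text normalization that pipeline may apply.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


# Toy strings; in practice the hypothesis would be an ASR transcript of the audio.
print(cer("abc", "abd"), wer("a b c", "a x c"))
```

In a TTS evaluation the hypothesis is the ASR transcript of the synthesized audio, so lower values indicate more intelligible speech.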
