Nekodimos/CT_orotts

Nekodimos/CT_orotts is a fine-tuned version of the F5-TTS (Flow Matching with Diffusion Transformer) model, specifically optimized for Oromiffa (Afaan Oromo) speech synthesis.

By leveraging the non-autoregressive flow matching architecture of F5-TTS, this model aims to generate natural-sounding, realistic speech. This release incorporates a custom swapped tokenizer and has been trained on a curated local speech dataset to improve synthesis quality and pronunciation for the target language.

Key Features

Swapped Tokenizer: The default text tokenizer has been replaced with a custom tokenizer built for Oromiffa orthography (Qubee). This reduces token inflation and prevents the alignment errors common with generic multilingual tokenizers.
Targeted Training Data: Fine-tuned on approximately 80 hours of Oromiffa speech data to capture regional accents, native cadence, and accurate phonetic pronunciations.
F5-TTS Architecture: Utilizes a Diffusion Transformer (DiT) with flow matching, bypassing the need for complex duration models or phoneme alignment steps.
Zero-Shot Voice Cloning: Retains the core F5-TTS capability to adapt to reference voices using a short audio prompt.
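
The flow matching idea named under F5-TTS Architecture can be illustrated with a toy example. The sketch below is not the F5-TTS implementation. It integrates an ordinary differential equation from random noise toward a fixed target with Euler steps, and a hand-written velocity function stands in for the trained Diffusion Transformer. The names `euler_sample` and `toy_velocity`, the three-element `target`, and the step count are all assumptions made for illustration.

```python
import random


def euler_sample(x0, velocity, steps=32):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical target standing in for the features the model should produce.
target = [0.5, -1.0, 2.0]


def toy_velocity(x, t):
    """Exact velocity of the straight path x_t = (1 - t) * noise + t * target.

    A trained network would predict this quantity; here it is computed
    directly from the known target, which is only possible in a toy setting.
    """
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in target]
sample = euler_sample(noise, toy_velocity, steps=32)
print(sample)  # lands on the target up to floating-point rounding
```

In the real model the velocity is predicted by the network from the input text and the reference audio, and the integrated quantity is a sequence of acoustic features rather than three numbers. Because the whole sequence is refined in parallel, no separate duration model or phoneme aligner is needed.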

Input Text (Afaan Oromo)	Generated Speech
Sample 1: "Lafti ganna kana naamusaan, reejistariidhaan, tiraaktaraan akka kootamu, Hogganaan Biiroo Qonnaa Oromiyaa"

Technical Specifications

Base Model: F5-TTS (Flow-Matching-based)
Hardware Used: NVIDIA A100 GPU
Training Duration: ~15 Hours
Dataset Size: ~80 Hours of speech data
Repository: Nekodimos/CT_orotts

Setup & Inference

To run inference with this model, you will need to clone the official F5-TTS repository and load this specific checkpoint alongside its custom tokenizer.

1. Installation

Clone the repository and install the dependencies:

git clone ...
cd F5-TTS
pip install -e .

2. Basic Usage

When running inference, ensure you point the script to the custom tokenizer and model weights associated with the Nekodimos/CT_orotts repository.

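# Example initialization structure (adjust paths as necessary; both paths are placeholders)
from f5_tts.model import DiT
from f5_tts.infer.utils_infer import load_model

# Load the fine-tuned checkpoint together with the custom tokenizer's vocab file.
# load_model() builds the model from model_cfg and then loads the weights. model_cfg
# must match the architecture used for fine-tuning, which this card does not list.
# Verify the call against the load_model signature in your installed F5-TTS version.
# model = load_model(DiT, model_cfg, "path/to/CT_orotts_checkpoint.pt", vocab_file="path/to/vocab.txt")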
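
The checkpoint and tokenizer files have to be on disk before they can be loaded. The following is a minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed and that you have accepted any access conditions on the repository. The file names inside the repository are not listed in this card, so the hypothetical helper `locate_files` simply searches for files ending in `.pt` or `.safetensors` and for text files with "vocab" in their name. Both patterns are assumptions.

```python
from pathlib import Path

REPO_ID = "Nekodimos/CT_orotts"


def download_repo(token=None):
    """Download a snapshot of the model repository and return its local path.

    Requires `pip install huggingface_hub`. For a gated repository, pass an
    access token or log in beforehand.
    """
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, token=token))


def locate_files(local_dir):
    """Return (checkpoint candidates, vocab candidates) found under local_dir."""
    root = Path(local_dir)
    ckpts = sorted(p for p in root.rglob("*") if p.suffix in {".pt", ".safetensors"})
    vocabs = sorted(p for p in root.rglob("*.txt") if "vocab" in p.name.lower())
    return ckpts, vocabs


# Typical use (needs network access, so it is left commented out):
# local_dir = download_repo()
# ckpts, vocabs = locate_files(local_dir)
# print(ckpts, vocabs)
```

The paths returned by `locate_files` are the ones you would substitute for the placeholder paths in the initialization example.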

Observations & Limitations

Tokenizer Dependency: It is essential to use the swapped tokenizer provided in this repository. Using the original F5-TTS default tokenizer will result in character misalignments and degradation of speech quality.
Reference Audio: As with standard F5-TTS, the quality of zero-shot voice cloning depends heavily on the clarity, noise levels, and language matching of the 3-to-10-second reference audio prompt.
Scope: The model can still attempt zero-shot generation in other languages, but its training distribution is heavily focused on Oromiffa, so quality outside Oromiffa may be noticeably lower.
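
The first two limitations can be checked before running inference. The sketch below is a hedged pre-flight check, not part of F5-TTS. It assumes the tokenizer ships as a vocab file with one token per line and that tokens are single characters. If the custom vocab contains multi-character tokens, `missing_characters` would need adjusting. It also assumes the reference prompt is an uncompressed WAV file. All function names are hypothetical.

```python
import wave


def load_vocab(vocab_path):
    """Read a vocab file with one token per line (assumed layout)."""
    with open(vocab_path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}


def missing_characters(text, vocab):
    """Return characters of text that have no vocab entry, in first-seen order."""
    missing = []
    for ch in text:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


def reference_duration_ok(wav_path, min_s=3.0, max_s=10.0):
    """Check that a WAV file lasts between min_s and max_s seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    return min_s <= seconds <= max_s


# Typical use (paths are placeholders):
# vocab = load_vocab("path/to/vocab.txt")
# print(missing_characters("Lafti ganna kana", vocab))
# print(reference_duration_ok("path/to/reference.wav"))
```

A non-empty result from `missing_characters` suggests the text and the vocab file do not match, which is the situation the Tokenizer Dependency note warns about.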

Credits & Acknowledgments

Base Architecture: The F5-TTS team for their work on "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching".
Fine-Tuning & Adaptation: Developed and trained by Nekodimos.