Nekodimos/CT_orotts
Nekodimos/CT_orotts is a fine-tuned version of the F5-TTS (Flow Matching with Diffusion Transformer) model, specifically optimized for Oromiffa (Afaan Oromo) speech synthesis.
By building on the non-autoregressive flow matching architecture of F5-TTS, this model aims to generate natural-sounding speech. This release uses a custom tokenizer in place of the default one and was fine-tuned on a curated local speech dataset to improve synthesis quality and pronunciation in Oromiffa.
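To make the flow matching idea concrete, here is a toy sketch in plain Python. It is not F5-TTS code: the vectors, the `oracle_velocity` function, and the step count are illustrative assumptions. In a real model a trained Diffusion Transformer predicts the velocity. Here an oracle that already knows the target stands in for it, so the sketch shows only the sampling loop, which integrates an ODE from noise to the target in a fixed number of steps and updates every dimension at once instead of emitting outputs one at a time.

```python
import random


def oracle_velocity(x, t, target):
    """Stand-in for a trained network: velocity pointing from x toward the target.

    On the straight-line path x_t = (1 - t) * x0 + t * x1 this equals x1 - x0.
    """
    return [(x1_i - x_i) / (1.0 - t) for x_i, x1_i in zip(x, target)]


def euler_sample(noise, target, steps=8):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division in oracle_velocity is safe
        v = oracle_velocity(x, t, target)
        x = [x_i + dt * v_i for x_i, v_i in zip(x, v)]
    return x


random.seed(0)
target = [0.5, -1.0, 2.0]  # pretend "data" values, purely illustrative
noise = [random.gauss(0.0, 1.0) for _ in target]  # starting point drawn from N(0, 1)
sample = euler_sample(noise, target)
print([round(v, 6) for v in sample])
```

Because the oracle velocity is constant along the straight path, the Euler loop lands on the target regardless of the step count. A learned velocity field is only approximate, which is why the number of integration steps affects quality in practice.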
Key Features
- Swapped Tokenizer: The default text tokenizer has been replaced with a custom tokenizer built for Oromiffa orthography (Qubee). This reduces token inflation and avoids the alignment errors common with generic multilingual tokenizers.
- Targeted Training Data: Fine-tuned on approximately 80 hours of Oromiffa speech data to capture regional accents, native cadence, and accurate phonetic pronunciations.
- F5-TTS Architecture: Uses a Diffusion Transformer (DiT) with flow matching, which removes the need for a separate duration model or phoneme alignment step.
- Zero-Shot Voice Cloning: Retains the core F5-TTS capability to adapt to reference voices using a short audio prompt.
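The swapped-tokenizer idea from the feature list can be illustrated with a small sketch. This is a hypothetical example and not the tokenizer shipped with the model: the digraph list, the function names, and the vocabulary layout are all assumptions made for illustration. It shows how treating Qubee digraphs as single symbols shortens the token sequence compared with splitting every character.

```python
# Hypothetical illustration only: NOT the tokenizer shipped with this model.
QUBEE_DIGRAPHS = ("ch", "dh", "ny", "ph", "sh")  # assumed digraph inventory


def tokenize_qubee(text: str) -> list[str]:
    """Greedy tokenizer that keeps Qubee digraphs together as single tokens."""
    tokens = []
    lowered = text.lower()
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if pair in QUBEE_DIGRAPHS:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(lowered[i])
            i += 1
    return tokens


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Map each symbol seen in texts to an index; 0 is reserved for unknowns."""
    symbols = sorted({tok for text in texts for tok in tokenize_qubee(text)})
    return {sym: idx + 1 for idx, sym in enumerate(symbols)}


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Convert text to vocabulary indices, using 0 for unseen symbols."""
    return [vocab.get(tok, 0) for tok in tokenize_qubee(text)]


vocab = build_vocab(["dhugaa", "nyaata", "shan"])
print(tokenize_qubee("dhugaa"))  # ['dh', 'u', 'g', 'a', 'a']
print(len("dhugaa"), len(encode("dhugaa", vocab)))  # 6 characters, 5 tokens
```

A symbol missing from the vocabulary maps to index 0 in this sketch. That is one way a mismatched vocabulary can quietly degrade output instead of raising an error.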
| Input Text (Afaan Oromo) | Generated Speech |
|---|---|
| Sample 1: "Lafti ganna kana naamusaan, reejistariidhaan, tiraaktaraan akka kootamu, Hogganaan Biiroo Qonnaa Oromiyaa" | |
Technical Specifications
- Base Model: F5-TTS (Flow-Matching-based)
- Hardware Used: NVIDIA A100 GPU
- Training Duration: ~15 Hours
- Dataset Size: ~80 Hours of speech data
- Repository: Nekodimos/CT_orotts
Setup & Inference
To run inference with this model, you will need to clone the official F5-TTS repository and load this specific checkpoint alongside its custom tokenizer.
1. Installation
Clone the repository and install the dependencies:
git clone ...
cd F5-TTS
pip install -e .
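After installing, a quick check from Python can confirm that the package is importable. This is a small sketch using only the standard library. The `is_installed` helper is a hypothetical name, and `f5_tts` is the package name used by the import statements in this card.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("f5_tts"):
    print("f5_tts is importable")
else:
    print("f5_tts not found; re-run `pip install -e .` inside the cloned repository")
```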
2. Basic Usage
When running inference, ensure you point the script to the custom tokenizer and model weights associated with the Nekodimos/CT_orotts repository.
# Example initialization structure (adjust paths as necessary)
from f5_tts.model import DiT
from f5_tts.infer.utils_infer import load_model
# Load the fine-tuned checkpoint together with the custom tokenizer's vocabulary file.
# model_cfg holds the DiT hyperparameters and must match the checkpoint; both paths are placeholders.
# model = load_model(DiT, model_cfg, "path/to/CT_orotts_checkpoint.pt", vocab_file="path/to/vocab.txt")
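A fuller end-to-end sketch might look like the following. It relies on the helper functions in `f5_tts.infer.utils_infer` as they exist at the time of writing, and signatures may differ between F5-TTS releases. It makes several assumptions: the checkpoint keeps the F5-TTS Base architecture (the `model_cfg` values), the custom tokenizer is supplied as a vocabulary file, and all paths are placeholders. The `synthesize` wrapper is a hypothetical name, not part of F5-TTS.

```python
def synthesize(ckpt_path, vocab_path, ref_audio_path, ref_text, gen_text,
               out_path="output.wav"):
    """Generate speech for gen_text in the voice of the reference clip."""
    # Imported lazily so this file can be loaded without F5-TTS installed.
    import soundfile as sf
    from f5_tts.model import DiT
    from f5_tts.infer.utils_infer import (
        infer_process,
        load_model,
        load_vocoder,
        preprocess_ref_audio_text,
    )

    # Assumption: the fine-tuned checkpoint keeps the F5-TTS Base hyperparameters.
    model_cfg = dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)
    model = load_model(DiT, model_cfg, ckpt_path, vocab_file=vocab_path)
    vocoder = load_vocoder()  # library default vocoder

    # Normalise the reference clip and its transcript before synthesis.
    ref_audio, ref_text = preprocess_ref_audio_text(ref_audio_path, ref_text)
    wave, sample_rate, _ = infer_process(ref_audio, ref_text, gen_text, model, vocoder)

    sf.write(out_path, wave, sample_rate)
    return out_path


# Example call (all paths and the reference transcript are placeholders):
# synthesize("path/to/CT_orotts_checkpoint.pt", "path/to/vocab.txt",
#            "path/to/reference.wav", "transcript of the reference clip",
#            "Hogganaan Biiroo Qonnaa Oromiyaa")
```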
Observations & Limitations
- Tokenizer Dependency: It is essential to use the swapped tokenizer provided in this repository. Using the original F5-TTS default tokenizer will result in character misalignments and degradation of speech quality.
- Reference Audio: As with standard F5-TTS, the quality of zero-shot voice cloning depends heavily on the clarity, noise levels, and language matching of the 3-to-10-second reference audio prompt.
- Scope: While the model is capable of zero-shot generation in other languages, its training distribution is highly focused on Oromiffa, so quality outside Oromiffa is likely to be lower.
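The reference-audio guidance above can be checked before running inference. The sketch below uses only the Python standard library and assumes the reference clip is an uncompressed PCM WAV file. The helper names are hypothetical, and the demo writes silent clips to a temporary directory purely to exercise the check.

```python
import os
import tempfile
import wave

MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # range quoted in the limitations above


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the recommended range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS


def write_silence(path: str, seconds: float, rate: int = 16000) -> None:
    """Demo helper: write a mono 16-bit WAV containing only silence."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "ref.wav")
    write_silence(demo_path, 5.0)
    demo_ok = is_usable_reference(demo_path)      # 5 s clip: inside the range
    write_silence(demo_path, 1.0)
    demo_short = is_usable_reference(demo_path)   # 1 s clip: too short

print(demo_ok, demo_short)
```

Duration is only one of the factors listed. This check says nothing about noise level or whether the clip's language matches the target text.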
Credits & Acknowledgments
- Base Architecture: The F5-TTS team for their work on "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching".
- Fine-Tuning & Adaptation: Developed and trained by Nekodimos.