Nekodimos/ZPv2_amtts

Nekodimos/ZPv2_amtts is a lightweight, high-performance Amharic (አማርኛ) text-to-speech model. Unlike models fine-tuned from massive pre-trained checkpoints, it was trained from scratch, with custom modifications that prioritize inference speed, a lower computational footprint, and high-fidelity audio output.

By swapping in a 48kHz decoder and optimizing the underlying architecture, ZPv2_amtts aims to deliver competitive, natural-sounding speech synthesis while remaining efficient enough to run on edge and resource-constrained devices.

Key Features

Trained From Scratch: Initialized and trained entirely on custom Amharic data, so the model's acoustic representations are fitted to the nuances of the language without biases carried over from other languages.
Upgraded 48kHz Decoder: Swaps in a 48kHz neural vocoder/decoder, allowing the model to synthesize higher-resolution, crisper audio than standard 24kHz setups.
Edge-Optimized & Lightweight: Designed with architectural modifications that reduce parameter count and latency. The model is lightweight enough to be deployed on consumer edge devices, mobile platforms, or local servers.
Competitive Audio Quality: Despite its smaller size and faster inference times, the model maintains a high standard of intelligibility and natural cadence in Amharic.

Input Text (Amharic)	Generated Speech
Sample 1: "ሴቶችም ወንዶችም ህፃናት በእኩል ደረጃ ለአእምሮ እድገት መዛባት ኦቲዝም ሊጋለጡ እንደሚችሉ ጥናቶች ይጠቁማሉ።"
Sample 2: "ሴቶችም ወንዶችም ህፃናት በእኩል ደረጃ ለአእምሮ እድገት መዛባት ኦቲዝም ሊጋለጡ እንደሚችሉ ጥናቶች ይጠቁማሉ።"
Sample 3: "ሱዌዝ ካናል ቀይ ባህርን ከሜዲትራኒያን ጋር ያገናኛል። በእስያና በአውሮፓ መካከል የሚደረገውን ጉዞ ቢያንስ በአስር ቀናት ይቀንሳል። ይህ የባህር መተላለፊያ በመቶ የሚሆነውን የአለም የባህር ላይ ንግድ ያስተናግዳል። በመቶ የሚሆነውን የኮንቴይነር ጉዞ በመቶ የመኪና ጭነት እና በመቶ የድፍድፍ ነዳጅ በሱዌዝ ካናል ይተላለፋል።"

Specifications

Language: Amharic (አማርኛ)
Training Method: From scratch (no pre-trained model fine-tuning)
Output Sample Rate: 48,000 Hz (48 kHz)
Optimization Target: Low latency, high fidelity, edge-device compatibility

Architecture Modifications

To achieve its lightweight profile, this model features several adjustments to its internal components:

Decoder Swap: Replaced the default audio synthesis decoder with an optimized 48kHz unit to support high-fidelity playback.
Layer/Parameter Optimization: Streamlined layer configurations to minimize computational overhead, reducing CPU and RAM/VRAM usage during generation.
Optimized Tokenizer: Configured to work efficiently with the Ge'ez script, keeping the text front-end stage of the pipeline fast.
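
The tokenizer itself is not published in this card. As a rough illustration of what working with Ge'ez script involves on the caller's side, the sketch below flags runs of letters or digits that fall outside the Unicode Ethiopic blocks, so mixed-script input can be caught before it reaches the text front end. The function names are hypothetical and are not part of the model's code.

```python
from __future__ import annotations

# Unicode blocks that hold Ethiopic (Ge'ez script) characters.
ETHIOPIC_RANGES = (
    (0x1200, 0x137F),  # Ethiopic
    (0x1380, 0x139F),  # Ethiopic Supplement
    (0x2D80, 0x2DDF),  # Ethiopic Extended
    (0xAB00, 0xAB2F),  # Ethiopic Extended-A
)


def is_ethiopic(ch: str) -> bool:
    """True if the character's code point lies in an Ethiopic block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ETHIOPIC_RANGES)


def non_ethiopic_spans(text: str) -> list[str]:
    """Return maximal runs of letters/digits that are not Ethiopic.

    Hypothetical helper: whitespace and punctuation end a run and
    are never reported themselves.
    """
    spans: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalnum() and not is_ethiopic(ch):
            current.append(ch)
        elif current:
            spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans
```

An empty result means the text is entirely in Ethiopic script (plus punctuation and spaces); any returned spans are candidates for transliteration or number expansion before synthesis.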

Inference & Deployment

Because the model uses a modified, lightweight structure, inference scripts must be configured to handle the custom 48kHz decoder output.

Basic Usage Flow

Ensure your inference code accommodates the custom architecture and sample rate:

# Inference Script Yet to come...
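
The official inference script has not been published, so no model call is shown here. The one step that does not depend on the model's API is saving the waveform at the correct rate. The sketch below assumes the decoder returns mono float samples in the range [-1.0, 1.0] (an assumption, not something this card states) and writes them as 16-bit PCM at 48,000 Hz using only the Python standard library. `write_wav_48k` is a hypothetical helper name.

```python
import struct
import wave

SAMPLE_RATE = 48_000  # output rate stated in this model card


def write_wav_48k(path, samples):
    """Write mono float samples as a 16-bit PCM WAV file at 48 kHz.

    Assumes samples are floats nominally in [-1.0, 1.0]; values
    outside that range are clipped rather than wrapped.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to avoid int16 overflow
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)             # mono
        wav.setsampwidth(2)             # 2 bytes per sample = 16-bit
        wav.setframerate(SAMPLE_RATE)   # header must say 48000, not 24000
        wav.writeframes(bytes(frames))
```

If the file header declares a different rate than the decoder actually produced (for example 24,000 Hz), playback will be slowed down and pitched lower, so the rate is set explicitly rather than left to a library default.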

Performance & Limitations

Resource Consumption: Designed to be significantly faster and less resource-heavy than larger, non-optimized TTS models. Well suited to real-time applications in CPU-bound or low-VRAM environments.
Data-Specific Nuances: Since the model was trained from scratch, its vocabulary and pronunciation are tightly bound to the training distribution. Text that mixes in a lot of foreign-language material may require pre-processing or transliteration.
High-Fidelity Requirements: To get the full benefit of the 48kHz output, ensure that reference prompts (if using zero-shot cloning) are clean, high-resolution, and free of background noise.
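
Part of the reference-prompt advice above can be checked mechanically. The sketch below reads the header of a PCM WAV file with the standard library and reports its sample rate, channel count, bit depth, and duration. It cannot detect background noise, and the card does not say that prompts must be exactly 48kHz; the `below_target_rate` flag is only an illustrative heuristic, and `describe_prompt` is a hypothetical helper name.

```python
import wave

TARGET_RATE = 48_000  # the model's stated output rate


def describe_prompt(path):
    """Return basic header properties of a PCM WAV reference prompt."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "seconds": wav.getnframes() / rate,
            # Heuristic only: a prompt recorded below the output rate
            # carries no content in the upper part of the 48 kHz band.
            "below_target_rate": rate < TARGET_RATE,
        }
```

A prompt recorded at 16,000 Hz, for instance, would be reported with `below_target_rate` set to `True`, which is a hint to look for a higher-resolution recording.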

Credits & Acknowledgments

Model Design & Training: Customized, modified, and trained from scratch by Nekodimos.
