
Leva-TTS — Levantine Arabic ⇄ English Text-to-Speech

🌿 Leva-TTS — Low-Latency Code-Switching TTS (Levantine Arabic ⇄ English)

A production-oriented Levantine Text-to-Speech model — a fine-tuned XTTS-v2 optimized for real-time conversational agents.

| 🎯 KPI | Target | Measured | Status |
|---|---|---|---|
| Peak VRAM (inference) | ≤ 3 GB | 2.13 GB | ✅ |
| Time-to-First-Audio (p50) | < 300 ms | 565 ms | ⚠️ |
| Real-Time Factor (RTF) | < 0.3 | 0.21 | ✅ |
| Streaming output | required | chunked PCM + WebSocket | ✅ |

Leva-TTS is a text-to-speech model for Levantine Arabic / English code-switching, built by fine-tuning XTTS-v2 on 50,000 synthetic utterances generated with Lahgtna-OmniVoice v2. It handles natural intra-sentence switching between Levantine dialect and English, supports 10 built-in speakers and zero-shot voice cloning, and offers a streaming generator for low-latency conversational use.

Base model: coqui/XTTS-v2 (GPT autoregressive backbone + HiFi-GAN decoder)
Languages: Levantine Arabic (ar), English (en), and code-switch mixes
Sample rate: 24 kHz
Speakers: Badr, Mohamed, Saad, Rami, Fadi (M) · Amina, Fatma, Lamyaa, Mona, Haneen (F)

✨ Key Features

| Feature | Details |
|---|---|
| 🗣️ Natural code-switching | Intra-sentence Arabic ↔ English |
| ⚡ Streaming output | Chunked audio; first-chunk target < 300 ms (~565 ms measured) |
| 💾 Low VRAM | ≤ 3 GB at inference |
| 🌿 Levantine dialect | ق → /ʔ/ (glottal stop), ج → /ʒ/, il- definite article, b- verb prefix |
| 🔤 Smart text front-end | Partial diacritics on homographs + Levantine lexicon |
| 👥 10 speakers | 5 male + 5 female, diverse Levantine accents |
| 📡 WebSocket streaming | FastAPI server with real-time chunked PCM |
| 🔌 Pipecat-ready | Drop-in `TTSService` for voice agents |

🚀 Quick start (pip)

```bash
conda create -n leva-tts python=3.10 -y && conda activate leva-tts
sudo apt-get update && sudo apt-get install -y espeak-ng ffmpeg libsndfile1

# Install PyTorch first so pip locks a CUDA build matching your GPU driver.
# (torch >= 2.9 ships CUDA-13 wheels that fail on common CUDA-12.x drivers.)
pip install torch==2.3.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121

pip install leva-tts
```

Leva-TTS depends on the maintained `coqui-tts` fork, which provides the same `TTS` / XTTS modules. The original, unmaintained `TTS` package pins `numpy==1.22.0` and cannot be resolved on modern Python versions; with the fork, a plain `pip install leva-tts` resolves cleanly.

```python
from leva_tts import LevaTTS, SPEAKERS
import soundfile as sf

# Auto-downloads this checkpoint + the 10 reference speakers on first use.
tts = LevaTTS(device="cuda", preprocess_text=True, verbose=False)

# 1) Built-in speaker (speaker must be one of SPEAKERS, else ValueError)
wav, sr = tts.synthesize("هَلَّق أنا عم أشتغل على the project",
                         speaker="Badr", temperature=0.65)
sf.write("out.wav", wav, sr)            # sr == 24000

# 2) Zero-shot voice cloning (your own 3–10 s clip)
wav, sr = tts.zero_shot_synthesize("والله the meeting كانت important كتير",
                                   "my_voice.wav")
sf.write("clone.wav", wav, sr)

# 3) Streaming generators
for chunk in tts.stream("بِدِّي أحكيلك عن the new feature", speaker="Amina"):
    ...                                  # play / forward each chunk
for chunk in tts.zero_shot_stream("هلق عم نشتغل", "my_voice.wav"):
    ...
```
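The zero-shot calls expect a 3–10 s reference clip. A minimal pre-flight check, assuming the clip is an uncompressed PCM WAV file, can read its duration with Python's standard `wave` module before calling `zero_shot_synthesize` or `zero_shot_stream`. `check_reference_clip` and the bounds below are hypothetical helpers for illustration, not part of `leva_tts`.

```python
import wave

# Hypothetical helper (not part of leva_tts): bounds follow the
# "3–10 s clip" guidance for zero-shot cloning.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def reference_clip_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference_clip(path: str) -> float:
    """Return the clip duration, or raise ValueError if outside 3-10 s."""
    seconds = reference_clip_seconds(path)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"{path}: reference clip is {seconds:.2f} s; "
            f"expected {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s"
        )
    return seconds
```

Calling `check_reference_clip("my_voice.wav")` first turns a too-short or too-long clip into an explicit error instead of a poor clone.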

Generation parameters (optional, set per call on every method): `temperature`, `length_penalty`, `repetition_penalty`, `top_k`, `top_p`, `speed`.
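`temperature`, `top_k`, `top_p` and `repetition_penalty` are the usual knobs for sampling the next token from an autoregressive GPT. The sketch below shows what they conventionally do to a vector of logits (a Hugging Face-style repetition penalty, then temperature, top-k and nucleus filtering). It is a generic plain-Python illustration under those assumptions, not the XTTS-v2 or `leva_tts` code, and `sample_next_token` and its default values are hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    previous_tokens: Sequence[int] = (),
    temperature: float = 0.65,       # illustrative defaults, not Leva-TTS's
    top_k: int = 50,
    top_p: float = 0.85,
    repetition_penalty: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one token id from raw logits (generic sketch, not XTTS code)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty: make already-generated tokens less likely.
    for tok in set(previous_tokens):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: < 1 sharpens the distribution, > 1 flattens it.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring candidates.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = ranked[: max(1, top_k)]

    # Softmax over the survivors (subtract the max for numerical stability).
    m = scores[ranked[0]]
    weights = [math.exp(scores[i] - m) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p (nucleus): smallest prefix whose cumulative probability >= top_p.
    kept, cum = [], 0.0
    for i, p in zip(ranked, probs):
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break

    # Sample from the kept candidates (weights need not sum to 1).
    ids = [i for i, _ in kept]
    ps = [p for _, p in kept]
    return rng.choices(ids, weights=ps, k=1)[0]
```

Lower `temperature`, `top_k` or `top_p` make output more deterministic (and usually more stable); raising them adds prosodic variety at the risk of artifacts.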

For the FastAPI streaming server, Pipecat integration, the Gradio demo, evaluation and fine-tuning, clone the repo: 👉 https://github.com/MohammedAly22/Leva-TTS
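The server streams chunked PCM over a WebSocket. A minimal sketch of the conversion step, assuming each streamed chunk is a sequence of float samples in [-1, 1] at 24 kHz mono (check the repo for the actual chunk type and wire format): clip, scale to signed 16-bit, and pack little-endian. `float_chunk_to_pcm16`, `pcm_frames` and `chunk_duration_ms` are hypothetical helpers.

```python
import struct
from typing import Iterable, Iterator, Sequence

SAMPLE_RATE = 24_000  # Leva-TTS output rate


def float_chunk_to_pcm16(chunk: Sequence[float]) -> bytes:
    """Convert float samples in [-1, 1] to little-endian 16-bit PCM bytes."""
    ints = []
    for x in chunk:
        x = max(-1.0, min(1.0, float(x)))  # clip out-of-range samples
        ints.append(int(round(x * 32767)))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_frames(chunks: Iterable[Sequence[float]]) -> Iterator[bytes]:
    """Yield one PCM frame per streamed chunk, skipping empty chunks."""
    for chunk in chunks:
        if len(chunk):
            yield float_chunk_to_pcm16(chunk)


def chunk_duration_ms(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Audio duration carried by one 16-bit mono PCM frame."""
    return 1000.0 * (len(pcm) // 2) / sample_rate
```

In a FastAPI WebSocket endpoint, each frame from `pcm_frames(tts.stream(text, speaker="Amina"))` could then be sent with `await websocket.send_bytes(frame)`; the repo's own server may frame messages differently.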

📦 Files in this repo

| File | Description |
|---|---|
| `best_model.pth` | Fine-tuned XTTS-v2 checkpoint (GPT + decoder) |
| `config.json` | XTTS-v2 config |
| `reference_audios/` | The 10 built-in speaker reference clips + `references.json` |
| `sample_wavs/` | Audio sample comparisons (Base XTTS-v2 vs Lahgtna v2 vs Leva-TTS) |

Manual download: `huggingface-cli download mohammedaly22/leva-tts`

🎵 Audio samples — Model comparison

Each sentence below was synthesized by three models, in order of progression: Base XTTS-v2 → Lahgtna v2 (Levantine fine-tune) → Leva-TTS (this model). The clips are in `sample_wavs/`.

🔀 Code-switching (Levantine + English)

هَلَّق أنا عم أشتغل على the new project اللي حكيتلك عنه — Badr (M)

English: "Right now I'm working on the new project I told you about."


والله the weather today كتير حلو بدي أطلع برا — Fatma (F)

English: "Honestly, the weather today is really nice; I want to go outside."


بِدِّي أحكيلك عن the meeting اللي كان مهم كتير اليوم — Mona (F)

English: "I want to tell you about the meeting today that was really important."


Pure Levantine Arabic

كيفك اليوم؟ إنت شو عم تعمل هَلَّق؟ — Badr (M)

English: "How are you today? What are you doing right now?"


هَلَّق رح أروح على البيت وبكرا برجع — Amina (F)

English: "I'm going home now, and I'll be back tomorrow."


شو رأيك نطلع نتمشى شوي بعد الشغل إذا الجو كان منيح؟ — Rami (M)

English: "What do you say we go for a little walk after work if the weather is nice?"


🇬🇧 Pure English

Hello, how are you doing today? — Lamyaa (F)


The project deadline is next Friday. — Mohamed (M)


📊 Evaluation

Evaluation setup: speaker Mohamed · NVIDIA H100 · intelligibility via a Whisper large-v3 ASR round-trip (CER / WER) · naturalness via UTMOS (reference-free MOS prediction).

| Metric | Value |
|---|---|
| Peak VRAM (inference) | 2.13 GB |
| RTF p50 / p95 | 0.36 / 0.53 |
| TTFA p50 / p95 (batch) | 1194 / 1743 ms |
| TTFA streaming (first chunk) | ~565 ms |
| CER (mean) | 0.255 |
| WER (mean) | 0.496 |
| UTMOS | 3.13 / 5.0 |
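RTF is synthesis wall-clock time divided by the duration of the audio produced (below 1 means faster than real time), and TTFA is the delay until the first audio chunk arrives. A minimal sketch of how such numbers could be collected and summarized as p50 / p95, assuming a streaming generator that yields float sample chunks at 24 kHz; `measure_stream` and the nearest-rank percentile are illustrative choices, not the repo's `scripts/evaluate.py`.

```python
import math
import time
from typing import Callable, Iterable, Sequence, Tuple

SAMPLE_RATE = 24_000


def measure_stream(
    make_stream: Callable[[], Iterable[Sequence[float]]],
    sample_rate: int = SAMPLE_RATE,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (ttfa_ms, rtf) for one streamed utterance."""
    start = clock()
    first = None
    n_samples = 0
    for chunk in make_stream():
        if first is None:
            first = clock()  # time-to-first-audio
        n_samples += len(chunk)
    end = clock()
    if first is None or n_samples == 0:
        raise ValueError("stream produced no audio")
    audio_seconds = n_samples / sample_rate
    return (first - start) * 1000.0, (end - start) / audio_seconds


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]
```

As a worked example, RTF p50 = 0.36 means a typical 5 s utterance takes about 0.36 × 5 = 1.8 s to synthesize.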

| Category | CER ↓ | WER ↓ | UTMOS ↑ |
|---|---|---|---|
| Pure English | 0.144 | 0.190 | 3.35 |
| Pure Levantine Arabic | 0.236 | 0.544 | 2.97 |
| Code-Switching | 0.330 | 0.602 | 3.19 |
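CER and WER come from the ASR round-trip: Whisper transcribes the synthesized audio and the transcript is compared with the input text. The standard definition is Levenshtein edit distance (substitutions + deletions + insertions) divided by the reference length, over characters for CER and whitespace-separated words for WER. A minimal sketch, assuming no normalization beyond whitespace splitting (the repo's evaluation may normalize punctuation or diacritics first, which changes the numbers):

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated words."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / max(1, len(ref))


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters (spaces included)."""
    ref = list(reference)
    return edit_distance(ref, list(hypothesis)) / max(1, len(ref))
```

Because a single wrong character makes the whole word wrong, WER sits above CER in every row of the table.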

An optimized inference path (TF32 + `torch.compile` on the GPT backbone) lowers RTF p95 by ~6% and also reduces TTFA, while slightly improving UTMOS (3.24 vs. 3.13). See `scripts/evaluate.py --optimize` in the repo.

🏗️ How it was built

Text collection: 50K Levantine, code-switching, and English sentences.
Synthesis: audio generated with Lahgtna-OmniVoice v2 using the `apc` (Levantine Arabic) language code.
Data prep: audio at 24 kHz, paired with transcripts normalized by the Levantine text front-end (number/date/currency verbalization, partial diacritics on homographs, dialect lexicon).
Fine-tuning: the XTTS-v2 GPT backbone fine-tuned on the synthetic corpus.

A text front-end runs before synthesis (enabled via `preprocess_text=True`): language-aware normalization of numbers, floats, dates, times, currency, percentages, URLs, emails, phone numbers and codes, plus partial diacritics and a Levantine lexicon.
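Language-aware normalization means deciding, for each span of a code-switched sentence, whether Arabic or English rules apply. One simple way to make that decision, shown here as a hypothetical sketch rather than the Leva-TTS front-end, is to split the text into runs by script: tokens containing Arabic-block characters are tagged `ar`, tokens with Latin letters `en`, and script-neutral tokens (ASCII digits, punctuation) join the surrounding run. `split_by_script` is an illustrative name.

```python
import re
from typing import List, Tuple

# Arabic block U+0600-U+06FF: letters, harakat (diacritics), Arabic-Indic
# digits and Arabic punctuation such as the question mark.
ARABIC = re.compile(r"[\u0600-\u06FF]")
LATIN = re.compile(r"[A-Za-z]")


def token_script(token: str) -> str:
    """Classify one whitespace token as 'ar', 'en', or '' (neutral)."""
    if ARABIC.search(token):
        return "ar"
    if LATIN.search(token):
        return "en"
    return ""


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Group consecutive tokens into (language, text) runs."""
    runs: List[Tuple[str, List[str]]] = []
    for token in text.split():
        lang = token_script(token)
        if not runs:
            runs.append((lang, [token]))
        elif lang == "" or lang == runs[-1][0]:
            runs[-1][1].append(token)  # same script, or a neutral token
        elif runs[-1][0] == "":
            # A leading neutral run adopts the first real script it meets.
            runs[-1] = (lang, runs[-1][1] + [token])
        else:
            runs.append((lang, [token]))
    return [(lang, " ".join(tokens)) for lang, tokens in runs]
```

Each run can then be handed to the matching language's number, date and currency rules.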

⚠️ Limitations & intended use

Optimized for Levantine dialect + English code-switching; other Arabic dialects (Egyptian, Gulf, MSA) are out of distribution.
Trained on synthetic speech — voices reflect the Lahgtna v2 generator.
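License: CC-BY-NC-4.0, for research / non-commercial use. The XTTS-v2 base model is itself released under the Coqui Public Model License, which likewise restricts use to non-commercial purposes.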

📜 Citation

```bibtex
@software{leva_tts_2026,
  author = {Mohammed Aly},
  title  = {Leva-TTS: Low-Latency Code-Switching TTS for Levantine Arabic and English},
  year   = {2026},
  url    = {https://github.com/MohammedAly22/Leva-TTS}
}
```

Built on Coqui XTTS-v2 and Lahgtna-OmniVoice v2.
