Great Work
#1
by Toro112 - opened
First, thank you so much for open-sourcing VoiceTut-TTS! The model's expressiveness in colloquial Egyptian Arabic is incredibly impressive: it captures the natural cadence and podcast prosody better than any other open model I've tested.
I completely understand that the underlying source audio may be bound by hosting-platform restrictions or copyright sensitivities. Even so, is there any way you could share the dataset, or any part of its metadata or processing pipeline?
Specifically:
- The text transcripts and their respective duration alignments.
- A list of source links / IDs (e.g., specific YouTube/podcast episode references) so I can rebuild an equivalent dataset locally.
- The preprocessing/filtering scripts you used to clean out background music or multi-speaker overlap.
Any insights into your training configuration or data-curation methodology would be incredibly valuable for reproducing this work or expanding dialect representation in the open-source community.
Thank you again for your incredible contribution to Arabic speech AI!