Nick021402/PodXplain · Apply for community grant: Personal project (gpu)

PROJECT: PodXplain - AI-Powered Podcast Generation with Nari DIA

Space URL:
https://huggingface.co/spaces/Nick021402/PodXplain

Project Description:
"PodXplain" is an innovative Hugging Face Space designed to automate the creation of conversational audio podcasts from text. Users can input a script, and our application intelligently segments the text, assigns different speakers, and then generates an engaging multi-speaker audio podcast.
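The segment-and-assign step described above can be sketched roughly as follows. This is a minimal illustration, not the Space's actual code: the helper names `segment_script` and `assign_speakers`, the `max_chars` limit, and the round-robin speaker assignment are all hypothetical, and the `[S1]`/`[S2]` tags are an assumption about the speaker-tag format the dialogue model expects.

```python
import re


def segment_script(script: str, max_chars: int = 200) -> list[str]:
    """Group sentences into segments, starting a new one when max_chars would be exceeded."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s.strip()]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the segment, so close it.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def assign_speakers(segments: list[str], speakers: tuple[str, ...] = ("[S1]", "[S2]")) -> str:
    """Prefix each segment with a speaker tag, alternating round-robin (assumed tag format)."""
    return " ".join(f"{speakers[i % len(speakers)]} {seg}" for i, seg in enumerate(segments))


script = "Welcome to the show. Today we talk about text to speech. It is a fast-moving field!"
dialogue = assign_speakers(segment_script(script, max_chars=40))
print(dialogue)
```

The tagged string would then be passed to the TTS model. A real script may already carry speaker labels, in which case the round-robin assignment would be replaced by parsing those labels.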

Our goal is to make podcast creation accessible and efficient, enabling creators to transform written content into dynamic audio experiences with minimal effort. This project demonstrates the power of cutting-edge dialogue-focused Text-to-Speech (TTS) models in a practical, user-friendly application.

Reason for GPU Grant Request:

We are currently integrating the Nari DIA 1.6B model for high-quality, multi-speaker dialogue synthesis. Nari DIA is an advanced, open-weights text-to-dialogue model that offers significant improvements in naturalness and conversational flow compared to generic TTS models.

However, the Nari DIA 1.6B model is computationally intensive and requires a GPU for efficient inference. Our Space currently runs on a CPU, where generation takes minutes per short segment, which makes the application impractical to use. The Nari DIA developers themselves state that CPU inference is not officially supported at full performance and strongly recommend using a GPU.

To make PodXplain responsive enough for practical use, a GPU upgrade is essential. Specifically, an Nvidia T4 small or a comparable GPU instance would let us run the Nari DIA 1.6B model at usable speeds.
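As a rough sanity check on the hardware request, the memory needed for the model weights alone can be estimated from the parameter count. This is a back-of-the-envelope sketch: the helper `weight_memory_gib` is hypothetical, the precisions are assumptions, and the figures are a lower bound because activations, the audio codec, and runtime overhead come on top.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory taken by the weights alone, in GiB (excludes activations and overhead)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 1.6e9  # parameter count taken from the model name (1.6B)

# Assumed storage precisions; the precision actually used may differ.
for name, nbytes in (("float32", 4), ("float16", 2)):
    print(f"{name}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the weights take roughly 3 to 6 GiB, which leaves headroom within the 16 GB of memory on a T4.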

Impact and Community Benefit:
GPU access through this grant would enable:

Practical Demonstration of Nari DIA: Our Space would serve as a high-fidelity, accessible demo for the Nari DIA 1.6B model, showcasing its capabilities in a real-world dialogue generation scenario to a broader audience on the Hugging Face Hub.
Empowering Content Creators: Users, including podcasters, educators, and content creators, could rapidly prototype and generate audio content, significantly lowering the barrier to entry for audio production.
Contribution to the Hugging Face Ecosystem: A well-performing PodXplain Space would be a valuable addition to the Hugging Face Spaces library, highlighting the utility of advanced open-source AI models.

We are committed to maintaining and improving PodXplain as an open-source project and believe that access to GPU resources will be instrumental in achieving its full potential and benefiting the wider community.

Thank you for considering our application.