---
license: apache-2.0
base_model:
- coqui/XTTS-v2
---

# Auralis

## Model Details

**Model Name:** Auralis
**Model Architecture:** Based on Coqui XTTS-v2
**License:**
- Auralis: Apache 2.0
- XTTS-v2 components: Coqui AI License

**Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi
**Developed by:** AstraMind.ai
**GitHub:** AstraMind AI
**Primary Use Case:** Text-to-speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks.

---

## Model Description

Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by Coqui XTTS-v2 and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs such as long-text processing, voice cloning, and concurrent request handling.

### Key Features

- **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes.
- **Hardware Friendly:** Requires <10 GB VRAM on a single NVIDIA RTX 3090.
- **Scalable:** Handles multiple requests simultaneously.
- **Streaming:** Seamlessly processes long texts in a streaming format.
- **Custom Voices:** Enables voice cloning from short reference audio.

---

## Quick Start

---

## Ebook Generation

Auralis converts ebooks into audio formats at high speed. For a ready-made Python script, see ebook_audio_generator.py.

---

## Intended Use

Auralis is designed for:

- **Content Creators:** Generate audiobooks, podcasts, or voiceovers.
- **Developers:** Integrate TTS into applications via a simple Python API.
- **Accessibility:** Provide audio versions of digital content for people with visual or reading difficulties.
- **Multilingual Scenarios:** Convert text to speech in multiple supported languages.
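The long-text workflow described above (splitting an ebook into chunks and batching them for synthesis) can be sketched in plain Python. This is a minimal illustration only: `split_into_chunks`, `batch_chunks`, and `synthesize_chunk` are hypothetical helpers written for this example, not part of the Auralis API.

```python
import re


def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split text at sentence boundaries into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def batch_chunks(chunks: list[str], char_budget: int = 2000) -> list[list[str]]:
    """Greedily group chunks into batches whose total size stays under a budget."""
    batches, current, used = [], [], 0
    for chunk in chunks:
        if current and used + len(chunk) > char_budget:
            batches.append(current)
            current, used = [], 0
        current.append(chunk)
        used += len(chunk)
    if current:
        batches.append(current)
    return batches


def synthesize_chunk(chunk: str) -> bytes:
    """Hypothetical stand-in for the real TTS call; returns placeholder bytes."""
    return chunk.encode("utf-8")


book = "First sentence. Second sentence! A third, slightly longer sentence?"
audio = b"".join(
    synthesize_chunk(chunk)
    for batch in batch_chunks(split_into_chunks(book, max_chars=40))
    for chunk in batch
)
```

Chunking at sentence boundaries keeps each synthesis unit prosodically coherent, and a character budget per batch is one simple way to keep VRAM usage bounded on consumer GPUs.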
---

## Performance

**Benchmarks on an NVIDIA RTX 3090:**

- Short phrases (<100 characters): ~1 second
- Medium texts (<1,000 characters): ~5-10 seconds
- Full books (~100,000 characters): ~10 minutes

**Memory Usage:**

- Base VRAM: ~4 GB
- Peak VRAM: ~10 GB

---

## Model Features

1. **Speed & Efficiency:**
   - Smart batching for rapid processing of long texts.
   - Memory-optimized for consumer GPUs.
2. **Easy Integration:**
   - Python API with support for synchronous and asynchronous workflows.
   - Streaming mode for continuous playback during generation.
3. **Audio Quality Enhancements:**
   - Background noise reduction.
   - Voice clarity and volume normalization.
   - Customizable audio preprocessing.
4. **Multilingual Support:**
   - Automatic language detection.
   - High-quality speech in 15+ languages.
5. **Customization:**
   - Voice cloning using short reference clips.
   - Adjustable parameters for tone, pacing, and language.

---

## Limitations & Ethical Considerations

- **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use it responsibly and ensure proper consent from the voice owner.
- **Accent Limitations:** While robust across many languages, accents and intonation may vary with the input.

---

## Citation

If you use Auralis in your research or projects, please cite: