---
title: MiloMusic - AI Music Generation
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
python_version: '3.10'
license: mit
short_description: AI-powered voice-to-song generation using YuE model
---
# MiloMusic 🎵 - Hugging Face Spaces
## AI-Powered Music Creation for Everyone
MiloMusic is an innovative platform that leverages multiple AI models to democratize music creation. Whether you're a seasoned musician or have zero musical training, MiloMusic enables you to create high-quality, lyrics-focused music through natural language conversation.
*A platform for everyone, regardless of musical training, at the intersection of AI and creative expression.*
## Features
- Natural Language Interface - Just start talking to generate song lyrics
- Genre & Mood Selection - Customize your music with different genres and moods
- Iterative Creation Process - Refine your lyrics through conversation
- High-Quality Music Generation - Transform lyrics into professional-sounding music
- User-Friendly Interface - Intuitive UI built with Gradio
## Architecture
MiloMusic employs a sophisticated multi-model pipeline to deliver a seamless music creation experience:
### Phase 1: Lyrics Generation

- **Speech-to-Text** - User voice input is transcribed using `whisper-large-v3-turbo` (via Groq API)
- **Conversation & Refinement** - `llama-4-scout-17b-16e-instruct` handles the creative conversation, generates lyrics based on user requests, and allows for iterative refinement (a minimal sketch follows this list)
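As an illustration only, the two Phase 1 steps might be wired up with the Groq Python client roughly as below; the prompt text, helper names, and error handling are assumptions rather than MiloMusic's actual `app.py` code:

```python
# Hypothetical Phase 1 sketch using the Groq Python SDK; prompt wording and
# helper names are illustrative, not taken from the MiloMusic source.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def transcribe(audio_path: str) -> str:
    """Turn the user's recorded voice input into text with Whisper on Groq."""
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=(audio_path, f.read()),
            model="whisper-large-v3-turbo",
        )
    return result.text

def chat_about_lyrics(history: list[dict], user_text: str) -> str:
    """Continue the lyric-writing conversation with the Llama 4 Scout model."""
    messages = [{"role": "system", "content": "You are a collaborative songwriter."}]
    messages += history
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=messages,
    )
    return response.choices[0].message.content
```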
### Phase 2: Music Generation

- **Lyrics Structuring** - `Gemini 2.0 Flash` processes the conversation history and structures the final lyrics for music generation
- **Music Synthesis** - `YuE` (乐) transforms the structured lyrics into complete songs with vocals and instrumentation (see the sketch after this list)
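A hedged sketch of the Phase 2 hand-off is shown below; the prompt, file layout, and the way the YuE inference script is invoked are assumptions based on YuE's public repository, not a copy of this Space's code:

```python
# Hypothetical Phase 2 sketch; the Gemini prompt and the YuE command line are
# assumptions (the flags mirror the YuE repo's documented infer.py interface).
import os
import subprocess
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

def structure_lyrics(conversation_text: str) -> str:
    """Ask Gemini 2.0 Flash to extract and tag the final lyrics."""
    prompt = (
        "Extract the final song lyrics from this conversation and format them "
        "with [verse] and [chorus] section tags:\n\n" + conversation_text
    )
    model = genai.GenerativeModel("gemini-2.0-flash")
    return model.generate_content(prompt).text

def synthesize_song(genre: str, lyrics: str, output_dir: str = "output") -> None:
    """Run YuE on the structured lyrics (paths and flags are illustrative)."""
    with open("genre.txt", "w") as f:
        f.write(genre)
    with open("lyrics.txt", "w") as f:
        f.write(lyrics)
    subprocess.run(
        ["python", "infer.py",
         "--genre_txt", "genre.txt",
         "--lyrics_txt", "lyrics.txt",
         "--output_dir", output_dir],
        check=True,
    )
```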
## Technical Stack

- **AI Models:**
  - `whisper-large-v3-turbo` (via Groq) - For speech-to-text conversion
  - `llama-4-scout-17b-16e-instruct` - For creative conversation and lyrics generation
  - `Gemini 2.0 Flash` - For lyrics structuring
  - `YuE` - For music generation
- **UI:** Gradio 5.25.0 (a layout sketch follows this list)
- **Backend:** Python 3.10
- **Deployment:** Hugging Face Spaces with GPU support
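For orientation, a Gradio 5.x interface with these pieces could look roughly like the sketch below; the component choices, labels, and event wiring are assumptions about `app.py`, not an excerpt from it:

```python
# Minimal Gradio layout sketch; placeholder handlers stand in for the real
# Phase 1/Phase 2 pipeline, and the exact wiring in app.py may differ.
import gradio as gr

def handle_voice(audio_path, history, genre, mood):
    # Placeholder: the real app transcribes the audio and chats about lyrics.
    history = (history or []) + [
        {"role": "assistant", "content": f"(draft lyrics for a {mood} {genre} song)"}
    ]
    return history

def generate_music(history):
    # Placeholder: the real app structures the lyrics, runs YuE, and returns
    # the path to the generated audio file.
    return None

with gr.Blocks(title="MiloMusic") as demo:
    genre = gr.Dropdown(["Pop", "Rock", "Hip-Hop", "Jazz"], label="Genre")
    mood = gr.Dropdown(["Happy", "Melancholic", "Energetic"], label="Mood")
    chat = gr.Chatbot(type="messages", label="Songwriting assistant")
    mic = gr.Audio(sources=["microphone"], type="filepath", label="Talk about your song")
    generate = gr.Button("Generate Music from Lyrics")
    song = gr.Audio(label="Generated song")

    mic.stop_recording(handle_voice, inputs=[mic, chat, genre, mood], outputs=chat)
    generate.click(generate_music, inputs=chat, outputs=song)

demo.launch()
```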
## System Requirements

- **Python:** 3.10 (strict requirement for YuE model compatibility)
- **CUDA:** 12.4+ for GPU acceleration
- **Memory:** 32GB+ RAM for model operations
- **GPU:** A10G/T4 or better with 24GB+ VRAM (a quick environment check is sketched below)
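An illustrative way to verify a runtime against these requirements; the thresholds simply restate the list above:

```python
# Illustrative environment check; thresholds mirror the requirements above.
import sys
import torch

assert sys.version_info[:2] == (3, 10), "YuE compatibility expects Python 3.10"
assert torch.cuda.is_available(), "A CUDA-capable GPU is required"

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name} | VRAM: {vram_gb:.1f} GB | CUDA: {torch.version.cuda}")
assert vram_gb >= 24, "Music generation expects 24GB+ of VRAM"
```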
## Usage

### Using the Interface

1. Select your genre, mood, and theme preferences
2. Start talking about your song ideas
3. The assistant will create lyrics based on your selections
4. Give feedback to refine the lyrics
5. When you're happy with the lyrics, click "Generate Music from Lyrics"
6. Listen to your generated song!
## Performance

- **Generation time:** ~5-10 minutes per song with GPU acceleration
- **Quality:** Professional-grade vocals and instrumentation
- **Format:** High-quality audio output
## Development Notes

### Spaces-Specific Configuration

- Custom PyTorch build with CUDA 12.4 support
- Flash Attention compiled from source for optimal performance
- Specialized audio processing pipeline for cloud deployment

### Key Components

- `requirements_space.txt` - Dependencies with CUDA-specific PyTorch
- `packages.txt` - System packages for audio and compilation
- Pre-built flash-attn installation for compatibility (see the startup sketch below)
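On Spaces, one common way to handle the flash-attn build is to install it at application startup before the model imports; the snippet below is a hedged sketch of that pattern, not necessarily how this repository does it:

```python
# Hypothetical startup-time install of flash-attn, a pattern sometimes used on
# Spaces when a prebuilt wheel cannot be pinned in requirements; app.py may differ.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "flash-attn", "--no-build-isolation"],
    check=True,
)
```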
## Important Notes

- First run may take longer as models are downloaded and cached (a warm-up sketch follows this list)
- Flash Attention compilation happens during startup (may take 10-15 minutes on first build)
- Memory usage is high during music generation - please be patient
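If you want to pre-populate the cache before the first request, something like the snippet below works; the repo IDs shown are the publicly released YuE checkpoints and are an assumption about which ones this Space loads:

```python
# Optional warm-up: pre-download model weights into the local Hugging Face
# cache. The repo IDs are assumptions (public YuE checkpoints), not a
# statement of what this Space actually uses.
from huggingface_hub import snapshot_download

for repo_id in ("m-a-p/YuE-s1-7B-anneal-en-cot", "m-a-p/YuE-s2-1B-general"):
    snapshot_download(repo_id)
```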
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request to the main repository.
## Team
- Norton Gu
- Anakin Huang
- Erik Wasmosy
## License
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Made with ❤️ and 🦙 (LLaMA) | Deployed on 🤗 Hugging Face Spaces