MiloMusic_YuEGP

Sleeping

App Files Files Community

MiloMusic_YuEGP / README.md

futurespyhi

Complete MiloMusic implementation with voice-to-song generation

658e790 about 2 months ago

preview code

raw

history blame contribute delete

4.36 kB

A newer version of the Gradio SDK is available: 5.49.1

Upgrade

metadata

title: MiloMusic - AI Music Generation
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
python_version: '3.10'
license: mit
short_description: AI-powered voice-to-song generation using YuE model

MiloMusic 🎵 - Hugging Face Spaces

🦙 AI-Powered Music Creation for Everyone

MiloMusic is an innovative platform that leverages multiple AI models to democratize music creation. Whether you're a seasoned musician or have zero musical training, MiloMusic enables you to create high-quality, lyrics-focused music through natural language conversation.

A platform for everyone - regardless of musical training at the intersection of AI and creative expression.

🚀 Features

Natural Language Interface - Just start talking to generate song lyrics
Genre & Mood Selection - Customize your music with different genres and moods
Iterative Creation Process - Refine your lyrics through conversation
High-Quality Music Generation - Transform lyrics into professional-sounding music
User-Friendly Interface - Intuitive UI built with Gradio

🔧 Architecture

MiloMusic employs a sophisticated multi-model pipeline to deliver a seamless music creation experience:

Phase 1: Lyrics Generation

Speech-to-Text - User voice input is transcribed using whisper-large-v3-turbo (via Groq API)
Conversation & Refinement - llama-4-scout-17b-16e-instruct handles the creative conversation, generates lyrics based on user requests, and allows for iterative refinement

Phase 2: Music Generation

Lyrics Structuring - Gemini flash 2.0 processes the conversation history and structures the final lyrics for music generation
Music Synthesis - YuE (乐) transforms the structured lyrics into complete songs with vocals and instrumentation

💻 Technical Stack

LLM Models:
- whisper-large-v3-turbo (via Groq) - For speech-to-text conversion
- llama-4-scout-17b-16e-instruct - For creative conversation and lyrics generation
- Gemini flash 2.0 - For lyrics structuring
- YuE - For music generation
UI: Gradio 5.25.0
Backend: Python 3.10
Deployment: Hugging Face Spaces with GPU support

📋 System Requirements

Python: 3.10 (strict requirement for YuE model compatibility)
CUDA: 12.4+ for GPU acceleration
Memory: 32GB+ RAM for model operations
GPU: A10G/T4 or better with 24GB+ VRAM

🔍 Usage

Using the Interface:

Select your genre, mood, and theme preferences
Start talking about your song ideas
The assistant will create lyrics based on your selections
Give feedback to refine the lyrics
When you're happy with the lyrics, click "Generate Music from Lyrics"
Listen to your generated song!

🔬 Performance

Music generation typically takes:

GPU-accelerated: ~5-10 minutes per song
Quality: Professional-grade vocals and instrumentation
Format: High-quality audio output

🛠️ Development Notes

Spaces-Specific Configuration:

Custom PyTorch build with CUDA 12.4 support
Flash Attention compiled from source for optimal performance
Specialized audio processing pipeline for cloud deployment

Key Components:

requirements_space.txt - Dependencies with CUDA-specific PyTorch
packages.txt - System packages for audio and compilation
Pre-build flash-attn installation for compatibility

🚨 Important Notes

First run may take longer as models are downloaded and cached
Flash Attention compilation happens during startup (may take 10-15 minutes on first build)
Memory usage is high during music generation - please be patient

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request to the main repository.

👥 Team

Norton Gu
Anakin Huang
Erik Wasmosy

📝 License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Made with ❤️ and 🦙 (LLaMA) | Deployed on 🤗 Hugging Face Spaces