---
license: apache-2.0
---

# LocalSong

LocalSong is a 700M-parameter audio generation model for melodic instrumental music, conditioned on tags. It was trained from scratch in 3 days on a single H100, reusing the ACE-Step VAE.

## Installation

### Prerequisites

- Python 3.10 or higher
- A CUDA-capable GPU with at least 8 GB of VRAM is recommended

### Setup

```bash
hf download Localsong/LocalSong --local-dir LocalSong
cd LocalSong
python3 -m venv venv
source venv/bin/activate
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --extra-index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
```

## Run

```bash
python gradio_app.py
```

The interface will be available at http://localhost:7860.

## Generation Advice

Every generation should include one of the `soundtrack`, `soundtrack1`, or `soundtrack2` tags, plus at least one other tag. Up to 8 tags can be used; try combining genres and instruments.
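The rules above can be sketched as a small validation helper. This is illustrative only (the function and constant names are not part of the repo):

```python
# Tag rules from the advice above: one root "soundtrack" tag,
# at least one other tag, at most 8 tags total.
ROOT_TAGS = {"soundtrack", "soundtrack1", "soundtrack2"}
MAX_TAGS = 8

def check_tags(tags):
    """Return True if a tag list follows the generation advice."""
    tags = [t.strip().lower() for t in tags]
    has_root = any(t in ROOT_TAGS for t in tags)
    has_other = any(t not in ROOT_TAGS for t in tags)
    return has_root and has_other and len(tags) <= MAX_TAGS
```

For example, `["soundtrack", "piano", "orchestral"]` passes, while `["piano", "guitar"]` fails because no root tag is present.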

The default settings (CFG 3.5, 200 steps) were found optimal in testing.

If generation is too slow on your system, try lowering the step count to 100.
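The step count trades speed for quality because the sampler takes one model call per step (two with CFG). A minimal sketch of classifier-free-guided Euler sampling under flow matching, assuming a velocity-prediction model as in minRF (the `model(x, t, cond)` signature here is illustrative, not the repo's actual API):

```python
import numpy as np

def sample(model, x, tags, cfg_scale=3.5, steps=200):
    """Minimal CFG-guided Euler sampler for a flow-matching model.

    `model(x, t, cond)` is assumed to predict velocity;
    `cond=None` means the unconditional branch.
    """
    dt = 1.0 / steps
    for i in range(steps):
        t = i / steps
        v_cond = model(x, t, tags)
        v_uncond = model(x, t, None)
        # Classifier-free guidance: push toward the conditional direction.
        v = v_uncond + cfg_scale * (v_cond - v_uncond)
        x = x + dt * v  # one Euler step along the learned flow
    return x
```

Halving `steps` halves the number of model evaluations, which is why 100 steps runs roughly twice as fast as 200.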

The first generation will be slower because torch.compile warms up; subsequent generations will be faster.

The model was trained on vocals but not lyrics, so vocals will not contain recognizable words.

## LoRA Training

- Prepare a folder of `.mp3` files.
- Run `python train_lora_encode_latents.py --audio-dir=/path/to/your/mp3s --output-dir=latents` to save the latents.
- Run `python train_lora.py --latents_dir=latents` to train the LoRA. You may need to adjust the learning rate, step count, or batch size depending on your dataset.
- Run `python merge_lora.py --lora-checkpoint=lora_step1000.safetensors --output-checkpoint=merged.safetensors` to merge the LoRA checkpoint into the base model for inference.
- Run `python gradio_app.py --checkpoint=merged.safetensors` to run inference with the merged checkpoint.
- Test inference with the tag `soundtrack`; LoRA training uses this tag. Additional tags may also work.
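The merge step folds the low-rank update back into each base weight matrix, so inference needs no extra code paths. A minimal sketch of the underlying arithmetic (illustrative only; `merge_lora.py` handles the real checkpoint format):

```python
import numpy as np

def merge_lora(base_w, lora_a, lora_b, alpha=1.0):
    """Fold a LoRA update into a base weight matrix.

    W' = W + (alpha / rank) * B @ A, where A is (rank, in_features)
    and B is (out_features, rank). Names and scaling convention are
    assumptions, not the repo's exact implementation.
    """
    rank = lora_a.shape[0]
    return base_w + (alpha / rank) * (lora_b @ lora_a)
```

After merging, the checkpoint has the same shapes as the base model, which is why the merged `.safetensors` file can be passed straight to `gradio_app.py`.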

## Credits

This project builds upon the following open-source projects:

- Model architecture: adapted from DDT
- Flow matching: adapted from minRF
- Audio VAE: ACE-Step

## License

This project is licensed under the Apache License 2.0.