Localsong/LocalSong

Localsong committed 20 days ago

Commit d7a7e7a (verified) · 1 Parent(s): 7b25443

Upload 2 files

Files changed (2)

README.md +7 -4
requirements.txt +3 -4

README.md CHANGED

@@ -4,22 +4,22 @@ license: apache-2.0
 # LocalSong
-LocalSong is an audio generation model focused on melodic instrumental music that uses tag-based conditioning to generate audio.
 ## Installation
 ### Prerequisites
 - Python 3.10 or higher
-- CUDA-capable GPU recommended
 ### Setup
 git clone https://huggingface.co/Localsong/LocalSong
 cd localsong
-python3 -m venv venv
 source venv/bin/activate
-pip install -r requirements.txt
 ### Run
@@ -30,7 +30,10 @@ The interface will be available at `http://localhost:7860`
 ### Generation Advice
 Generations should use one of the soundtrack, soundtrack1 or soundtrack2 tags, as well as at least one other tag. They can use up to 8 tags; try combining genres and instruments.
 The default settings (CFG 3.5, steps 200) have been tested as optimal.
 The first generation will be slower due to torch.compile, then speed will increase.
 The model was trained on vocals but not lyrics. Vocals will not have recognizable words.

 # LocalSong
+LocalSong is a 700M parameter audio generation model focused on melodic instrumental music that uses tag-based conditioning.
 ## Installation
 ### Prerequisites
 - Python 3.10 or higher
+- CUDA-capable GPU recommended with 8GB of VRAM
 ### Setup
 git clone https://huggingface.co/Localsong/LocalSong
 cd localsong
+python3.10 -m venv venv
 source venv/bin/activate
+pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --extra-index-url https://download.pytorch.org/whl/cu128
 ### Run
 ### Generation Advice
 Generations should use one of the soundtrack, soundtrack1 or soundtrack2 tags, as well as at least one other tag. They can use up to 8 tags; try combining genres and instruments.
 The default settings (CFG 3.5, steps 200) have been tested as optimal.
+If generation is too slow on your system, try lowering steps to 100.
 The first generation will be slower due to torch.compile, then speed will increase.
 The model was trained on vocals but not lyrics. Vocals will not have recognizable words.
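
The tag rules in the Generation Advice section can be checked before starting a generation. The sketch below is illustrative only: `check_tags` is a hypothetical helper, not part of the LocalSong repository, and it assumes that the 8-tag limit counts the soundtrack tag and that tags are compared exactly as written.

```python
# Hypothetical helper, not part of the LocalSong repository.
# Assumptions: the 8-tag limit includes the soundtrack tag, and tags are
# matched exactly (no case folding).
SOUNDTRACK_TAGS = {"soundtrack", "soundtrack1", "soundtrack2"}
MAX_TAGS = 8


def check_tags(tags):
    """Return a list of problems; an empty list means the tags follow the advice."""
    cleaned = [t.strip() for t in tags if t.strip()]
    has_soundtrack = any(t in SOUNDTRACK_TAGS for t in cleaned)
    has_other = any(t not in SOUNDTRACK_TAGS for t in cleaned)

    problems = []
    if not has_soundtrack:
        problems.append("add one of: soundtrack, soundtrack1, soundtrack2")
    if not has_other:
        problems.append("add at least one other tag, such as a genre or instrument")
    if len(cleaned) > MAX_TAGS:
        problems.append(f"use at most {MAX_TAGS} tags, got {len(cleaned)}")
    return problems


if __name__ == "__main__":
    print(check_tags(["soundtrack", "piano", "ambient"]))
    print(check_tags(["piano"]))
```

The tag names `piano` and `ambient` are placeholders; the README does not list the supported tag vocabulary.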

requirements.txt CHANGED

@@ -1,7 +1,6 @@
-torch>=2.8.0
-torchaudio>=2.8.0
-torchvision>=0.23.0
-torchcodec>=0.8.0
 accelerate>=1.9.0
 diffusers>=0.34.0
 einops>=0.8.1

+torch==2.7.1
+torchaudio==2.7.1
+torchvision==0.22.1
 accelerate>=1.9.0
 diffusers>=0.34.0
 einops>=0.8.1
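
Because this commit switches the three PyTorch packages from minimum versions to exact pins, it can help to confirm that an existing environment matches them. The following is a minimal sketch, not something shipped with the repository: `pin_mismatches` is a hypothetical name, and it assumes that wheels installed from the PyTorch index report a local version suffix such as `+cu128`, which is ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins introduced by this commit's requirements.txt.
PINS = {"torch": "2.7.1", "torchaudio": "2.7.1", "torchvision": "0.22.1"}


def pin_mismatches(pins, installed_version=version):
    """Map each package whose installed version differs from its pin to what was found.

    A value of None means the package is not installed. `installed_version`
    is injectable so the check can be exercised without PyTorch present.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except PackageNotFoundError:
            found = None
        # Assumption: ignore a local build suffix such as "+cu128".
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    print(pin_mismatches(PINS) or "all pins satisfied")
```

An empty result means the environment already matches the pinned versions; anything else lists the packages to reinstall.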