# VampNet

This repository contains recipes for training generative music models on top of the Lyrebird Audio Codec.

# Setting up

Install AudioTools:

```bash
git clone https://github.com/hugofloresgarcia/audiotools.git
pip install -e ./audiotools
```

Install the LAC library:

```bash
git clone https://github.com/hugofloresgarcia/lac.git
pip install -e ./lac
```

Install VampNet:

```bash
git clone https://github.com/hugofloresgarcia/vampnet2.git
pip install -e ./vampnet2
```

## A note on argbind
This repository relies on [argbind](https://github.com/pseeth/argbind) to manage CLIs and config files. 
Config files are stored in the `conf/` folder. 
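
In practice, a script loads a YAML config via `--args.load`, and any bound argument can also be overridden on the command line. A minimal sketch, reusing the same patterns that appear in the commands later in this README (the available argument names depend on the script and config you load):

```bash
# load a config file, then override one bound argument from the CLI
python demo.py --args.load conf/interface/spotdl.yml --Interface.device cpu
```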

## Getting the Pretrained Models

Download the pretrained models from [this link](https://drive.google.com/file/d/1ZIBMJMt8QRE8MYYGjg4lH7v7BLbZneq2/view?usp=sharing). Then, extract the models to the `models/` folder.
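
For example, assuming the download is a zip archive saved as `models.zip` (adjust the filename to whatever you actually downloaded):

```bash
# extract the downloaded archive into the models/ folder
unzip models.zip -d models/
```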

# How the code is structured

This code was written quickly to meet a publication deadline, so it can be messy and redundant in places. We are currently working on cleaning it up. 

```
β”œβ”€β”€ conf         <- (conf files for training, finetuning, etc)
β”œβ”€β”€ demo.py      <- (gradio UI for playing with vampnet)
β”œβ”€β”€ env          <- (environment variables)
β”‚   └── env.sh
β”œβ”€β”€ models       <- (extract pretrained models)
β”‚   β”œβ”€β”€ spotdl
β”‚   β”‚   β”œβ”€β”€ c2f.pth     <- (coarse2fine checkpoint)
β”‚   β”‚   β”œβ”€β”€ coarse.pth  <- (coarse checkpoint)
β”‚   β”‚   └── codec.pth   <- (codec checkpoint)
β”‚   └── wavebeat.pth
β”œβ”€β”€ README.md
β”œβ”€β”€ scripts
β”‚   β”œβ”€β”€ exp
β”‚   β”‚   β”œβ”€β”€ eval.py     <- (eval script)
β”‚   β”‚   └── train.py    <- (training/finetuning script)
β”‚   └── utils
β”œβ”€β”€ vampnet
β”‚   β”œβ”€β”€ beats.py        <- (beat tracking logic)
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ interface.py    <- (high-level programmatic interface)
β”‚   β”œβ”€β”€ mask.py
β”‚   β”œβ”€β”€ modules
β”‚   β”‚   β”œβ”€β”€ activations.py
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ layers.py
β”‚   β”‚   └── transformer.py  <- (architecture + sampling code)
β”‚   β”œβ”€β”€ scheduler.py
β”‚   └── util.py
```

# Usage

First, you'll want to set up your environment:
```bash
source ./env/env.sh
```

## Staging a Run

Staging a run makes a copy of all the git-tracked files in the codebase and saves them to a folder for reproducibility. You can then run the training script from the staged folder. 

```bash
stage --name my_run --run_dir /path/to/staging/folder
```
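
Once the run is staged, you can launch training from inside the staged copy so that the exact code used for a run is preserved. A sketch of that workflow (the paths are illustrative; the exact layout of the staged folder depends on the `stage` command):

```bash
# run training from the staged copy of the codebase
cd /path/to/staging/folder/my_run
python scripts/exp/train.py --args.load conf/vampnet.yml --save_path /path/to/checkpoints
```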

## Training a model

```bash
python scripts/exp/train.py --args.load conf/vampnet.yml --save_path /path/to/checkpoints
```

## Fine-tuning
To fine-tune a model, see the configuration files under `conf/lora/`. 
You just need to provide a list of audio files or folders to fine-tune on in the config, then launch the training job as usual.
```bash
python scripts/exp/train.py --args.load conf/lora/birds.yml --save_path /path/to/checkpoints
```
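
One way to fine-tune on your own data is to start from an existing config and point it at your audio. This is a sketch (the relevant dataset keys live inside the YAML file; copy them from a shipped config such as `conf/lora/birds.yml`, and `conf/lora/my_data.yml` is a hypothetical name):

```bash
# start from an existing fine-tuning config and point it at your own audio files/folders
cp conf/lora/birds.yml conf/lora/my_data.yml
# edit conf/lora/my_data.yml so its audio source list points at your data,
# then launch training with the new config
python scripts/exp/train.py --args.load conf/lora/my_data.yml --save_path /path/to/checkpoints
```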



## Launching the Gradio Interface
```bash
python demo.py --args.load conf/interface/spotdl.yml --Interface.device cuda
```