mattricesound committed
Commit c6d02ea • 1 Parent(s): 32701af

Update README, fix minor instance checks

README.md CHANGED
@@ -1,7 +1,20 @@
 # General Purpose Audio Effect Removal
-Removing multiple audio effects from multiple sources using compositional audio effect removal and source separation and speech enhancement models.
+Removing multiple audio effects from multiple sources with compositional audio effect removal using source separation and speech enhancement models.
 
-This repo contains the code for the paper [General Purpose Audio Effect Removal](https://arxiv.org/abs/2110.00484). (Todo: Link broken, Add video, Add img, citation, license)
+This repo contains the code for the paper [General Purpose Audio Effect Removal](https://arxiv.org/abs/2110.00484). (Todo: Paper link broken, Arxiv badge broken, citation, license)
+
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LoLgL1YHzIQfILEayDmRUZzDZzJpD6rD)
+[![arXiv](https://img.shields.io/badge/arXiv-1234.56789-b31b1b.svg)](https://arxiv.org/abs/1234.56789)
+
+![RemFX Block diagram](remfx-headline.jpg?raw=true "Title")
+
+Listening examples can be found [here](https://csteinmetz1.github.io/RemFX/).
+
+## Abstract
+
+Although the design and application of audio effects is well understood, the inverse problem of removing these effects is significantly more challenging and far less studied. Recently, deep learning has been applied to audio effect removal; however, existing approaches have focused on narrow formulations considering only one effect or source type at a time. In realistic scenarios, multiple effects are applied with varying source content. This motivates a more general task, which we refer to as general purpose audio effect removal. We developed a dataset for this task using five audio effects across four different sources and used it to train and evaluate a set of existing architectures. We found that no single model performed optimally on all effect types and sources. To address this, we introduced <b>RemFX</b>, an approach designed to mirror the compositionality of applied effects. We first trained a set of the best-performing effect-specific
+removal models and then leveraged an audio effect classification model to dynamically construct a graph of our models at inference. We found our approach to outperform single model baselines, although examples with many effects present remain challenging.
 
 
 # Setup
@@ -13,29 +26,38 @@ pip install -e . ./umx
 pip install --no-deps hearbaseline
 ```
 Due to incompatibilities with hearbaseline's dependencies (namely numpy/numba) and our other packages, we need to install hearbaseline with no dependencies.
+<b>Please run the setup code before running any scripts.</b>
+All scripts should be launched from the top level after installing.
+
 # Usage
-This repo can be used for many different tasks. Here are some examples.
+This repo can be used for many different tasks. Here are some examples. Ensure you have run the setup code before running any scripts.
 ## Run RemFX Detect on a single file
-First, need to download the checkpoints from [zenodo](https://zenodo.org/record/8179396)
+Here we will attempt to detect, then remove effects that are present in an audio file. For the best results, use a file from our [evaluation dataset](https://zenodo.org/record/8187288). We support detection and removal of the following effects: chorus, delay, distortion, dynamic range compression, and reverb.
+
+First, we need to download the PyTorch checkpoints from [zenodo](https://zenodo.org/record/8218621)
 ```
-scripts/download_checkpoints.sh
+scripts/download_ckpts.sh
 ```
 Then run the detect script. This repo contains an example file `example.wav` from our test dataset which contains 2 effects (chorus and delay) applied to a guitar.
 ```
 scripts/remfx_detect.sh example.wav -o dry.wav
 ```
 ## Download the [General Purpose Audio Effect Removal evaluation datasets](https://zenodo.org/record/8187288)
+We provide a script to download and unzip the datasets used in Table 4 of the paper.
 ```
 scripts/download_eval_datasets.sh
 ```
 
 ## Download the starter datasets
+
+If you'd like to train your own model and/or generate a dataset, you can download the starter datasets using the following command:
+
 ```
 python scripts/download.py vocalset guitarset dsd100 idmt-smt-drums
 ```
 By default, the starter datasets are downloaded to `./data/remfx-data`. To change this, pass `--output_dir={path/to/datasets}` to `download.py`
 
-Then set the dataset root :
+Then set the dataset root:
 ```
 export DATASET_ROOT={path/to/datasets}
 ```
@@ -47,6 +69,13 @@ This project uses the [pytorch-lightning](https://www.pytorchlightning.ai/index.
 python scripts/train.py +exp={experiment_name}
 ```
 
+At the end of training, the train script will automatically evaluate the test set using the best checkpoint (by validation loss). If epoch 0 is not finished, it will throw an error. To evaluate a specific checkpoint, run
+
+```
+python scripts/test.py +exp={experiment_name} +ckpt_path="{path/to/checkpoint}" render_files=False
+```
+
+### Experiments
 Here are some selected experiment types from the paper, which use different datasets and configurations. See `cfg/exp/` for a full list of experiments and parameters.
 
 | Experiment Type | Config Name | Example |
@@ -57,42 +86,39 @@ Here are some selected experiment types from the paper, which use different data
 | Monolithic (<=5 FX) | 5-5_full | +exp=5-5_full |
 | Classifier | 5-5_full_cls | +exp=5-5_full_cls |
 
-To change the configuration, simply edit the experiment file, or override the configuration on the command line. A description of some of these variables is in the Misc. section below.
+To change the configuration, simply edit the experiment file, or override the configuration on the command line. A description of some of these variables is in the Experimental parameters section below.
 You can also create a custom experiment by creating a new experiment file in `cfg/exp/` and overriding the default parameters in `config.yaml`.
 
-At the end of training, the train script will automatically evaluate the test set using the best checkpoint (by validation loss). If epoch 0 is not finished, it will throw an error. To evaluate a specific checkpoint, run
+### Logging
+By default, training uses the PyTorch Lightning CSV logger.
+Metrics and hyperparams will be logged in `./lightning_logs/{timestamp}`
+
+[Weights and Biases](https://wandb.ai/) logging can also be used, and will log audio during training and testing. To use Weights and Biases, set `logger=wandb` in the config or command-line. Make sure you have an account and are logged in.
 
+Then set the project and entity:
 ```
-python scripts/test.py +exp={experiment_name} +ckpt_path="{path/to/checkpoint}" render_files=False
+export WANDB_PROJECT={desired_wandb_project}
+export WANDB_ENTITY={your_wandb_username}
 ```
 
 The checkpoints will be saved in `./logs/ckpts/{timestamp}`
-Metrics and hyperparams will be logged in `./lightning_logs/{timestamp}`
 
-By default, the dataset needed for the experiment is generated before training.
+### Misc.
+- By default, the dataset needed for the experiment is generated before training.
 If you have generated the dataset separately (see Generate datasets used in the paper), be sure to set `render_files=False` in the config or command-line, and set `render_root={path/to/dataset}` if it is in a custom location.
 
-Also note that the training assumes you have a GPU. To train on CPU, set `accelerator=null` in the config or command-line.
-
-### Logging
-Default CSV logger
-To use WANDB logger:
+- Training assumes you have a CUDA GPU. To train on CPU, set `accelerator=null` in the config or command-line.
 
-export WANDB_PROJECT={desired_wandb_project}
-export WANDB_ENTITY={your_wandb_username}
+- If training with the pretrained PANNs model, download the pretrained model from [here](https://zenodo.org/record/6332525) or run: `wget https://zenodo.org/record/6332525/files/hear2021-panns_hear.pth`. Place this in the root of the repo.
 
-## Panns pretrianed
-```
-wget https://zenodo.org/record/6332525/files/hear2021-panns_hear.pth
-```
 
 ## Evaluate models on the General Purpose Audio Effect Removal evaluation datasets (Table 4 from the paper)
-First download the General Purpose Audio Effect Removal evaluation datasets (see above).
-To use the pretrained RemFX model, download the checkpoints
+We provide a way to replicate the results of Table 4 from our paper. First download the <b>General Purpose Audio Effect Removal evaluation datasets</b> (see above).
+To use the pretrained RemFX model, download the checkpoints:
 ```
-scripts/download_checkpoints.sh
+scripts/download_ckpts.sh
 ```
-Then run the evaluation script, select the RemFX configuration, between `remfx_oracle`, `remfx_detect`, and `remfx_all`. Then select N, the number of effects to remove.
+Then run the evaluation script. First select the RemFX configuration, between `remfx_oracle`, `remfx_detect`, and `remfx_all`. As a reminder, `remfx_oracle` uses the ground truth labels of the present effects to determine which removal models to apply, `remfx_detect` detects which effects are present, and `remfx_all` assumes all effects are present.
 ```
 scripts/eval.sh remfx_detect 0-0
 scripts/eval.sh remfx_detect 1-1
@@ -100,15 +126,17 @@ scripts/eval.sh remfx_detect 2-2
 scripts/eval.sh remfx_detect 3-3
 scripts/eval.sh remfx_detect 4-4
 scripts/eval.sh remfx_detect 5-5
-
 ```
+In this case the `N-N` refers to the number of effects present for each example in the dataset.
+
+
 To eval a custom monolithic model, first train a model (see Training).
 Then run the evaluation script, with the config used and checkpoint_path.
 ```
-scripts/eval.sh distortion_aug 0-0 -ckpt "logs/ckpts/2023-07-26-10-10-27/epoch\=05-valid_loss\=8.623.ckpt"
+scripts/eval.sh distortion_aug 0-0 -ckpt "{path/to/checkpoint}"
 ```
 
-To eval a custom effect-specific model as part of the inference chain, first train a model (see Training), then edit `cfg/exp/remfx_{desired_configuration}.yaml -> ckpts -> {effect}`.
+To eval a custom effect-specific model as part of the inference chain, first train a model (see Training), then edit `cfg/exp/remfx_{desired_configuration}.yaml -> ckpts -> {effect}`. Select between `remfx_detect`, `remfx_oracle`, and `remfx_all`.
 Then run the evaluation script.
 ```
 scripts/eval.sh remfx_detect 0-0
@@ -128,40 +156,61 @@ For example, to generate the `chorus` FXAug dataset, which includes files with 5
 python scripts/generate_dataset.py +exp=chorus_aug
 ```
 
-See the Misc. section below for a description of the parameters.
+See the Experimental parameters section below for a description of the parameters.
 By default, files are rendered to `{render_root} / processed / {string_of_effects} / {train|val|test}`.
 
-If training, this process will be done automatically at the start of training. To disable this, set `render_files=False` in the config or command-line, and set `render_root={path/to/dataset}` if it is in a custom location.
-
-# Misc.
-## Experimental parameters
+The dataset that is generated contains 8000 train examples, 1000 validation examples, and 1000 test examples. Each example is contained in a folder labeled by its id number (ex. 0-7999 for train examples) with 4 files like so:
+```
+.
+└── train
+    ├── 0
+    │   ├── dry_effects.pt
+    │   ├── input.wav
+    │   ├── target.wav
+    │   └── wet_effects.pt
+    ├── 1
+    │   └── ...
+    ├── ...
+    ├── 7999
+    │   └── ...
+```
+### File descriptions
+- dry_effects.pt = serialized PyTorch file that contains a list of the effects applied to the dry audio file
+- input.wav = the wet audio file
+- target.wav = the dry audio file
+- wet_effects.pt = serialized PyTorch file that contains a list of the effects applied to the wet audio file
+
+Note: if training, this process will be done automatically at the start of training. To disable this, set `render_files=False` in the config or command-line, and set `render_root={path/to/dataset}` if it is in a custom location.
+
+# Experimental parameters
 Some relevant dataset/training parameter descriptions:
 - `num_kept_effects={[min, max]}` range of <b> Kept </b> effects to apply to each file. Inclusive.
 - `num_removed_effects={[min, max]}` range of <b> Removed </b> effects to apply to each file. Inclusive.
-- `model={model}` architecture to use (see 'Effect Removal Models/Effect Classification Models')
+- `model={model}` architecture to use (see 'Effect Removal Models/Effect Classification Models').
 - `effects_to_keep={[effect]}` Effects to apply but not remove (see 'Effects'). Used for FXAug.
-- `effects_to_remove={[effect]}` Effects to remove (see 'Effects')
-- `accelerator=null/'gpu'` Use GPU (1 device) (default: null)
-- `render_files=True/False` Render files. Disable to skip rendering stage (default: True)
-- `render_root={path/to/dir}`. Root directory to render files to (default: ./data)
-- `datamodule.train_batch_size={batch_size}`. Change batch size (default: varies)
-
-### Effect Removal Models
+- `effects_to_remove={[effect]}` Effects to remove (see 'Effects').
+- `accelerator=null/'gpu'` Use GPU (1 device) (default: null).
+- `render_files=True/False` Render files. Disable to skip rendering stage (default: True).
+- `render_root={path/to/dir}`. Root directory to render files to (default: ./data).
+- `datamodule.train_batch_size={batch_size}`. Change batch size (default: varies).
+- `logger=wandb`. Use Weights and Biases logger (default: csv). Ensure you set the wandb environment variables (see training section).
+
+## Effect Removal Models
 - `umx`
 - `demucs`
 - `tcn`
 - `dcunet`
 - `dptnet`
 
-### Effect Classification Models
+## Effect Classification Models
 - `cls_vggish`
 - `cls_panns_pt`
 - `cls_wav2vec2`
 - `cls_wav2clip`
 
-### Effects
+## Effects
+- `delay`
+- `distortion`
 - `chorus`
 - `compressor`
-- `distortion`
 - `reverb`
-- `delay`
remfx-headline.jpg ADDED
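The README's new "File descriptions" section above documents the four files in each generated example folder. As a rough illustration of how one such folder might be loaded, here is a minimal sketch; the folder path is hypothetical (following the `{render_root}/processed/{effects}/{split}` layout with the default `render_root=./data`), and the use of torchaudio and the exact contents of the `.pt` files are assumptions, not shown in this diff.

```python
from pathlib import Path

import torch
import torchaudio  # assumed to be available; not specified in this diff

# Hypothetical example folder, following the layout documented in the README
example_dir = Path("data/processed/chorus/train/0")

# Wet (effected) input and dry target, per the "File descriptions" list
input_audio, sr = torchaudio.load(example_dir / "input.wav")
target_audio, _ = torchaudio.load(example_dir / "target.wav")

# Serialized lists of the effects applied to the dry and wet files
dry_effects = torch.load(example_dir / "dry_effects.pt")
wet_effects = torch.load(example_dir / "wet_effects.pt")

print(input_audio.shape, sr, dry_effects, wet_effects)
```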
remfx/callbacks.py CHANGED
@@ -90,7 +90,7 @@ def log_wandb_audio_batch(
     caption: str = "",
     max_items: int = 10,
 ):
-    if type(logger) != pl.loggers.WandbLogger:
+    if not isinstance(logger, pl.loggers.WandbLogger):
         return
     num_items = samples.shape[0]
     samples = rearrange(samples, "b c t -> b t c")
remfx/utils.py CHANGED
@@ -72,7 +72,7 @@ def log_hyperparameters(
     if "callbacks" in config:
         hparams["callbacks"] = config["callbacks"]
 
-    if type(trainer.logger) == pl.loggers.CSVLogger:
+    if isinstance(logger, pl.loggers.CSVLogger):
         logger.log_hyperparams(hparams)
     else:
         logger.experiment.config.update(hparams)
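Both Python changes in this commit (remfx/callbacks.py above and remfx/utils.py here) replace exact-type comparisons with `isinstance` checks. A tiny self-contained illustration of why that matters, using made-up classes rather than the repo's actual Lightning loggers:

```python
class Logger:
    pass

class CSVLogger(Logger):
    pass

class BufferedCSVLogger(CSVLogger):
    """Hypothetical subclass a user might plug in."""

logger = BufferedCSVLogger()

# Exact-type comparison rejects subclasses, so a customized logger would be skipped:
print(type(logger) == CSVLogger)      # False
# isinstance() accepts subclasses, which is the behaviour the fix restores:
print(isinstance(logger, CSVLogger))  # True
```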
scripts/download_ckpts.sh CHANGED
@@ -4,9 +4,9 @@
 mkdir -p ckpts
 
 # download ckpts and save to ckpts directory
-wget https://zenodo.org/record/8179396/files/classifier.ckpt?download=1 -O ckpts/classifier.ckpt
-wget https://zenodo.org/record/8179396/files/dcunet_chorus_aug.ckpt?download=1 -O ckpts/dcunet_chorus_aug.ckpt
-wget https://zenodo.org/record/8179396/files/dcunet_delay_aug.ckpt?download=1 -O ckpts/dcunet_delay_aug.ckpt
-wget https://zenodo.org/record/8179396/files/dcunet_reverb_aug.ckpt?download=1 -O ckpts/dcunet_reverb_aug.ckpt
-wget https://zenodo.org/record/8179396/files/demucs_compressor_aug.ckpt?download=1 -O ckpts/demucs_compressor_aug.ckpt
-wget https://zenodo.org/record/8179396/files/demucs_distortion_aug.ckpt?download=1 -O ckpts/demucs_distortion_aug.ckpt
+wget https://zenodo.org/record/8218621/files/classifier.ckpt?download=1 -O ckpts/classifier.ckpt
+wget https://zenodo.org/record/8218621/files/dcunet_chorus_aug.ckpt?download=1 -O ckpts/dcunet_chorus_aug.ckpt
+wget https://zenodo.org/record/8218621/files/dcunet_delay_aug.ckpt?download=1 -O ckpts/dcunet_delay_aug.ckpt
+wget https://zenodo.org/record/8218621/files/dcunet_reverb_aug.ckpt?download=1 -O ckpts/dcunet_reverb_aug.ckpt
+wget https://zenodo.org/record/8218621/files/demucs_compressor_aug.ckpt?download=1 -O ckpts/demucs_compressor_aug.ckpt
+wget https://zenodo.org/record/8218621/files/demucs_distortion_aug.ckpt?download=1 -O ckpts/demucs_distortion_aug.ckpt
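The six checkpoints above differ only in filename, so the same downloads can be expressed compactly. A minimal Python sketch of the equivalent fetch loop (same Zenodo record and filenames as in the updated script; `urllib` is used purely for illustration):

```python
import urllib.request
from pathlib import Path

RECORD = "https://zenodo.org/record/8218621/files"
CKPTS = [
    "classifier.ckpt",
    "dcunet_chorus_aug.ckpt",
    "dcunet_delay_aug.ckpt",
    "dcunet_reverb_aug.ckpt",
    "demucs_compressor_aug.ckpt",
    "demucs_distortion_aug.ckpt",
]

Path("ckpts").mkdir(exist_ok=True)
for name in CKPTS:
    # Mirrors `wget {RECORD}/{name}?download=1 -O ckpts/{name}` from the shell script
    urllib.request.urlretrieve(f"{RECORD}/{name}?download=1", f"ckpts/{name}")
```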
scripts/generate_dataset.py CHANGED
@@ -8,7 +8,7 @@ def main(cfg: DictConfig):
     # Apply seed for reproducibility
     if cfg.seed:
         pl.seed_everything(cfg.seed)
-    datamodule = hydra.utils.instantiate(cfg.datamodule, _convert_="partial")
+    _ = hydra.utils.instantiate(cfg.datamodule, _convert_="partial")
 
 
 if __name__ == "__main__":
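The scripts/generate_dataset.py change keeps the Hydra instantiation but discards the result. Presumably the datamodule renders the dataset as a side effect of being constructed (the README notes the dataset is generated before training), so binding it to `_` simply signals that the object itself is unused. A toy illustration of that pattern, not the repo's actual datamodule:

```python
class ToyDatamodule:
    """Stand-in for the real datamodule: construction does the work."""

    def __init__(self, render_root: str):
        # Side effect happens at construction time
        # (in the real repo: rendering the dataset to disk).
        print(f"Rendering dataset under {render_root} ...")

# The object is only needed for its construction-time side effect,
# so it is bound to `_`, mirroring the updated script.
_ = ToyDatamodule(render_root="./data")
```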