mattricesound committed
Commit a559a3b • 1 Parent(s): f65f2ca

Add new scripts/configs for eval and easy usage. Rename configs.

README.md CHANGED
@@ -1,56 +1,80 @@
1
-
 
 
2
  # Setup
3
 
4
- ## Install Packages
5
- 1. `python3 -m venv env`
6
- 2. `source env/bin/activate`
7
- 3. `pip install -e .`
8
- 4. `git submodule update --init --recursive`
9
- 5. `pip install -e umx`
10
-
11
- ## Download [VocalSet Dataset](https://zenodo.org/record/1193957)
12
- 1. `wget https://zenodo.org/record/1442513/files/VocalSet1-2.zip?download=1`
13
- 2. `mv VocalSet.zip?download=1 VocalSet.zip`
14
- 3. `unzip VocalSet.zip`
15
-
16
- # Training
17
- ## Steps
18
- 1. Change Wandb and data root variables in `shell_vars.sh` and `source shell_vars.sh`
19
- 2. `python scripts/train.py +exp=default`
20
-
21
- ## Experiments
22
- Training parameters can be configured in `cfg/exp/default.yaml`. Here are some descriptions
23
- - `num_kept_effects={[min, max]}` range of <b> Kept </b> effects to apply to each file. Inclusive.
24
- - `num_removed_effects={[min, max]}` range of <b> Removed </b> effects to apply to each file. Inclusive.
25
- - `model={model}` architecture to use (see 'Models')
26
- - `effects_to_keep={[effect]}` Effects to apply but not remove (see 'Effects')
27
- - `effects_to_remove={[effect]}` Effects to remove (see 'Effects')
28
- - `accelerator=null/'gpu'` Use GPU (1 device) (default: null)
29
- - `render_files=True/False` Render files. Disable to skip rendering stage (default: True)
30
- - `render_root={path/to/dir}`. Root directory to render files to (default: DATASET_ROOT)
31
 
32
- These can also be specified on the command line.
33
- see `cfg/exp/default.yaml` for an example.
34
 
 
 
 
 
 
 
 
 
 
 
35
 
36
- ## Models
37
- - `umx`
38
- - `demucs`
39
- - `tcn`
40
- - `dcunet`
41
- - `dptnet`
42
 
43
- ## Effects
44
- - `chorus`
45
- - `compressor`
46
- - `distortion`
47
- - `reverb`
48
- - `delay`
 
49
 
50
- ## Chain Inference
51
- `python scripts/chain_inference.py +exp=chain_inference`
52
 
53
- ## Run inference on directory
54
  Assumes directory is structured as
55
  - root
56
  - clean
@@ -66,45 +90,41 @@ Change root path in `shell_vars.sh` and `source shell_vars.sh`
66
 
67
  `python scripts/chain_inference.py +exp=chain_inference_custom`
68
 
 
 
 
 
 
 
 
 
 
 
 
69
 
 
 
 
 
 
 
70
 
71
- ## Misc.
72
- By default, files are rendered to `input_dir / processed / {string_of_effects} / {train|val|test}`.
73
-
 
 
74
 
75
- Download datasets:
 
 
 
 
 
76
 
77
- ```
78
- python scripts/download.py vocalset guitarset idmt-smt-guitar idmt-smt-bass idmt-smt-drums
79
- ```
80
 
81
- To run audio effects classifiction:
82
- ```
83
- python scripts/train.py model=classifier "effects_to_use=[compressor, distortion, reverb, chorus, delay]" "effects_to_remove=[]" max_kept_effects=5 max_removed_effects=0 shuffle_kept_effects=True shuffle_removed_effects=True accelerator='gpu' render_root=/scratch/RemFX render_files=True
84
- ```
85
 
86
- ```
87
- srun --comment harmonai --partition=g40 --gpus=1 --cpus-per-gpu=12 --job-name=harmonai --pty bash -i
88
- source env/bin/activate
89
- rsync -aP /fsx/home-csteinmetz1/data/EffectSet_cjs.tar /scratch
90
- tar -xvf EffectSet_cjs.tar
91
- mv scratch/EffectSet_cjs ./EffectSet_cjs
92
 
93
- export DATASET_ROOT="/admin/home-csteinmetz1/data/remfx-data"
94
- export WANDB_PROJECT="RemFX"
95
- export WANDB_ENTITY="cjstein"
96
 
97
- python scripts/train.py +exp=5-5.yaml model=cls_vggish render_files=False logs_dir=/scratch/cjs-log datamodule.batch_size=64
98
- python scripts/train.py +exp=5-5.yaml model=cls_panns_pt render_files=False logs_dir=/scratch/cjs-log datamodule.batch_size=64
99
- python scripts/train.py +exp=5-5.yaml model=cls_wav2vec2 render_files=False logs_dir=/scratch/cjs-log datamodule.batch_size=64
100
- python scripts/train.py +exp=5-5.yaml model=cls_wav2clip render_files=False logs_dir=/scratch/cjs-log datamodule.batch_size=64
101
- ```
102
 
103
- ### Installing HEAR models
104
 
105
- wav2clip
106
- ```
107
- pip install hearbaseline
108
- pip install git+https://github.com/hohsiangwu/wav2clip-hear.git
109
- pip install git+https://github.com/qiuqiangkong/HEAR2021_Challenge_PANNs
110
- wget https://zenodo.org/record/6332525/files/hear2021-panns_hear.pth
 
1
+ # General Purpose Audio Effect Removal
2
+ # About
3
+ TBD. Add photo. Add paper link.
4
  # Setup
5
+ ```
6
+ git clone https://github.com/mhrice/RemFx.git
7
+ git submodule update --init --recursive
8
+ pip install . umx
9
+ ```
10
+ # Usage
11
+ ## Run RemFX Detect on a single file
12
+ ```
13
+ ./download_checkpoints.sh
14
+ ./remfx_detect.sh wet.wav -o dry.wav
15
+ ```
16
+ ## Download the [General Purpose Audio Effect Removal evaluation dataset](https://zenodo.org/record/8183649/)
17
+ ```
18
+ wget https://zenodo.org/record/8183649/files/RemFX_eval_dataset.zip?download=1 -O RemFX_eval_dataset.zip
19
+ unzip RemFX_eval_dataset.zip
20
+ ```
21
 
22
+ ## Download the datasets used in the paper
23
+ ```
24
+ python scripts/download.py vocalset guitarset idmt-smt-guitar idmt-smt-bass idmt-smt-drums
25
+ ```
26
 
 
 
27
 
28
+ ## Training
29
+ Before training, it is important that you have downloaded the datasets (see above).
30
+ This project uses [hydra](https://hydra.cc/) for configuration management. All experiments are defined in `cfg/exp/`. To train with an existing experiment, first run
31
+ ```
32
+ export DATASET_ROOT={path/to/datasets}
33
+ ```
34
+ Then:
35
+ ```
36
+ python scripts/train.py +exp={experiment_name}
37
+ ```
38
 
39
+ Here are some selected experiment types from the paper, which use different datasets and configurations. See `cfg/exp/` for a full list of experiments and parameters.
 
 
 
 
 
40
 
41
+ | Experiment Type | config name | example |
42
+ | ----------------------- | ------------ | ---------------- |
43
+ | Effect-specific | {effect} | +exp=chorus |
44
+ | Effect-specific + FXAug | {effect}_aug | +exp=chorus_aug |
45
+ | Monolithic (1 FX) | 5-1 | +exp=5-1 |
46
+ | Monolithic (<=5 FX) | 5-5 | +exp=5-5 |
47
+ | Classifier | 5-5_cls | +exp=5-5_cls |
48
 
49
+ To change the configuration, simply edit the experiment file, or override the configuration on the command line. A description of some of these variables is in the Misc. section below.
50
+ You can also create a custom experiment by creating a new experiment file in `cfg/exp/` and overriding the default parameters in `config.yaml`.
51
 
52
+ ## Evaluate models on the General Purpose Audio Effect Removal evaluation dataset
53
+ First download the dataset (see above).
54
+ To use the pretrained RemFX model, download the checkpoints
55
+ ```
56
+ ./download_checkpoints.sh
57
+ ```
58
+ Then run the evaluation script, selecting one of the RemFX configurations: `remfx_oracle`, `remfx_detect`, or `remfx_all`.
59
+ ```
60
+ ./eval.sh remfx_detect
61
+ ```
62
+ To use a custom-trained model, first train a model (see Training).
63
+ Then run the evaluation script with the config used for training.
64
+ ```
65
+ ./eval.sh {experiment_name}
66
+ ```
67
+
68
+ ## Checkpoints
69
+ Download checkpoints from [here](https://zenodo.org/record/8179396), or see the ./download_checkpoints.sh script.
70
+
71
+
72
+ ## Generate datasets used in the paper
73
+ ```
74
+ ```
75
+ Note that by default, files are rendered to `input_dir / processed / {string_of_effects} / {train|val|test}`.
76
+
77
+ ## Evaluate with a custom directory
78
  Assumes directory is structured as
79
  - root
80
  - clean
 
90
 
91
  `python scripts/chain_inference.py +exp=chain_inference_custom`
92
 
93
+ # Misc.
94
+ ## Experimental parameters
95
+ Descriptions of some relevant training parameters:
96
+ - `num_kept_effects={[min, max]}` range of <b> Kept </b> effects to apply to each file. Inclusive.
97
+ - `num_removed_effects={[min, max]}` range of <b> Removed </b> effects to apply to each file. Inclusive.
98
+ - `model={model}` architecture to use (see 'Models')
99
+ - `effects_to_keep={[effect]}` Effects to apply but not remove (see 'Effects')
100
+ - `effects_to_remove={[effect]}` Effects to remove (see 'Effects')
101
+ - `accelerator=null/'gpu'` Use GPU (1 device) (default: null)
102
+ - `render_files=True/False` Render files. Disable to skip rendering stage (default: True)
103
+ - `render_root={path/to/dir}`. Root directory to render files to (default: DATASET_ROOT)
104
 
105
+ ### Effect Removal Models
106
+ - `umx`
107
+ - `demucs`
108
+ - `tcn`
109
+ - `dcunet`
110
+ - `dptnet`
111
 
112
+ ### Effect Classification Models
113
+ - `cls_vggish`
114
+ - `cls_panns_pt`
115
+ - `cls_wav2vec2`
116
+ - `cls_wav2clip`
117
 
118
+ ### Effects
119
+ - `chorus`
120
+ - `compressor`
121
+ - `distortion`
122
+ - `reverb`
123
+ - `delay`
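The experiment table in the README above pairs each experiment type with a config name that is passed to `scripts/train.py` via `+exp=`. A minimal Python sketch of that mapping follows. The helpers `experiment_name` and `train_command` are hypothetical and not part of the repository; the sketch only assembles the command line and does not run it. Note that, going by the file names in this commit, the compressor config is `cfg/exp/compression.yaml`, so the effect name and the config stem differ in that one case.

```python
"""Sketch: build the `+exp=` argument for scripts/train.py from the README table.

All names in this block are illustrative; they are not part of RemFX.
"""

# Effect name -> config file stem, as the files appear in this commit.
CONFIG_STEM = {
    "chorus": "chorus",
    "compressor": "compression",  # the file is cfg/exp/compression.yaml
    "distortion": "distortion",
    "reverb": "reverb",
    "delay": "delay",
}

# Experiment types whose config name does not depend on an effect.
FIXED_CONFIGS = {
    "monolithic_1": "5-1",
    "monolithic_5": "5-5",
    "classifier": "5-5_cls",
}


def experiment_name(kind, effect=None):
    """Return the config name for an experiment type from the README table."""
    if kind in FIXED_CONFIGS:
        return FIXED_CONFIGS[kind]
    if kind not in ("effect", "effect_aug"):
        raise ValueError(f"unknown experiment type: {kind!r}")
    if effect not in CONFIG_STEM:
        raise ValueError(f"unknown effect: {effect!r}")
    stem = CONFIG_STEM[effect]
    return f"{stem}_aug" if kind == "effect_aug" else stem


def train_command(kind, effect=None, overrides=()):
    """Assemble the argument list for a training run (nothing is executed)."""
    return ["python", "scripts/train.py", f"+exp={experiment_name(kind, effect)}", *overrides]


if __name__ == "__main__":
    # Hydra-style overrides are appended after the experiment selection.
    print(" ".join(train_command("effect_aug", "chorus", ["render_files=False"])))
```

Returning a list instead of a single string keeps the result usable with `subprocess.run` without shell quoting concerns.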
cfg/config.yaml CHANGED
@@ -97,20 +97,24 @@ datamodule:
97
  render_files: ${render_files}
98
  render_root: ${render_root}
99
 
100
- batch_size: 16
 
101
  num_workers: 8
102
  pin_memory: True
103
  persistent_workers: True
104
 
 
 
 
 
 
 
 
 
 
105
  logger:
106
- _target_: pytorch_lightning.loggers.WandbLogger
107
- project: ${oc.env:WANDB_PROJECT}
108
- entity: ${oc.env:WANDB_ENTITY}
109
- # offline: False # set True to store all logs only locally
110
- job_type: "train"
111
- group: ""
112
  save_dir: "."
113
- log_model: True
114
 
115
  trainer:
116
  _target_: pytorch_lightning.Trainer
 
97
  render_files: ${render_files}
98
  render_root: ${render_root}
99
 
100
+ train_batch_size: 16
101
+ test_batch_size: 1
102
  num_workers: 8
103
  pin_memory: True
104
  persistent_workers: True
105
 
106
+ # logger:
107
+ # _target_: pytorch_lightning.loggers.WandbLogger
108
+ # project: ${oc.env:WANDB_PROJECT}
109
+ # entity: ${oc.env:WANDB_ENTITY}
110
+ # # offline: False # set True to store all logs only locally
111
+ # job_type: "train"
112
+ # group: ""
113
+ # save_dir: "."
114
+ # log_model: True
115
  logger:
116
+ _target_: pytorch_lightning.loggers.CSVLogger
 
 
 
 
 
117
  save_dir: "."
 
118
 
119
  trainer:
120
  _target_: pytorch_lightning.Trainer
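The hunk above replaces the single `datamodule.batch_size` key with `train_batch_size` and `test_batch_size`, and the experiment configs later in this commit follow the same pattern. For a config written against the old layout, the rename can be sketched in plain Python as below. The function `migrate_batch_size` is hypothetical, and defaulting `test_batch_size` to 1 is an assumption that mirrors the values this commit writes, not a rule stated by the authors.

```python
import copy


def migrate_batch_size(cfg, test_batch_size=1):
    """Return a copy of `cfg` with `datamodule.batch_size` split in two.

    The old value becomes `train_batch_size`; `test_batch_size` is set to the
    given default (assumed to be 1, as in the configs of this commit).
    Configs without the old key are returned unchanged.
    """
    new_cfg = copy.deepcopy(cfg)  # never mutate the caller's config
    datamodule = new_cfg.get("datamodule")
    if not isinstance(datamodule, dict) or "batch_size" not in datamodule:
        return new_cfg
    old_value = datamodule.pop("batch_size")
    # setdefault keeps any value that was already migrated by hand.
    datamodule.setdefault("train_batch_size", old_value)
    datamodule.setdefault("test_batch_size", test_batch_size)
    return new_cfg


if __name__ == "__main__":
    old = {"datamodule": {"batch_size": 16, "num_workers": 8}}
    print(migrate_batch_size(old))
```

Any code that still reads `datamodule.batch_size` would need the matching change, which this sketch does not cover.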
cfg/exp/1-1.yaml CHANGED
@@ -24,5 +24,6 @@ effects_to_remove:
24
  - chorus
25
  - delay
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
24
  - chorus
25
  - delay
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/2-2.yaml CHANGED
@@ -24,5 +24,6 @@ effects_to_remove:
24
  - chorus
25
  - delay
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
24
  - chorus
25
  - delay
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/3-3.yaml CHANGED
@@ -24,5 +24,6 @@ effects_to_remove:
24
  - chorus
25
  - delay
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
24
  - chorus
25
  - delay
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/4-4.yaml CHANGED
@@ -24,5 +24,6 @@ effects_to_remove:
24
  - chorus
25
  - delay
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
24
  - chorus
25
  - delay
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/5-1.yaml CHANGED
@@ -24,5 +24,6 @@ effects_to_remove:
24
  - chorus
25
  - delay
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
24
  - chorus
25
  - delay
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/5-5.yaml CHANGED
@@ -24,5 +24,6 @@ effects_to_remove:
24
  - chorus
25
  - delay
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
24
  - chorus
25
  - delay
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/5-5_cls.yaml CHANGED
@@ -1,6 +1,6 @@
1
  # @package _global_
2
  defaults:
3
- - override /model: demucs
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
 
1
  # @package _global_
2
  defaults:
3
+ - override /model: cls_panns_48k
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
cfg/exp/chain_inference.yaml CHANGED
@@ -23,7 +23,8 @@ effects_to_remove:
23
  - chorus
24
  - delay
25
  datamodule:
26
- batch_size: 16
 
27
  num_workers: 8
28
 
29
  dcunet:
 
23
  - chorus
24
  - delay
25
  datamodule:
26
+ train_batch_size: 16
27
+ test_batch_size: 1
28
  num_workers: 8
29
 
30
  dcunet:
cfg/exp/chain_inference_aug.yaml CHANGED
@@ -23,7 +23,8 @@ effects_to_remove:
23
  - chorus
24
  - delay
25
  datamodule:
26
- batch_size: 16
 
27
  num_workers: 8
28
 
29
  dcunet:
 
23
  - chorus
24
  - delay
25
  datamodule:
26
+ train_batch_size: 16
27
+ test_batch_size: 1
28
  num_workers: 8
29
 
30
  dcunet:
cfg/exp/chain_inference_aug_classifier.yaml CHANGED
@@ -23,7 +23,8 @@ effects_to_remove:
23
  - chorus
24
  - delay
25
  datamodule:
26
- batch_size: 16
 
27
  num_workers: 8
28
 
29
  dcunet:
@@ -56,7 +57,7 @@ classifier:
56
  n_mels: 128
57
  sample_rate: ${sample_rate}
58
  model_sample_rate: ${sample_rate}
59
- specaugment: False
60
  classifier_ckpt: "ckpts/classifier.ckpt"
61
 
62
  ckpts:
 
23
  - chorus
24
  - delay
25
  datamodule:
26
+ train_batch_size: 16
27
+ test_batch_size: 1
28
  num_workers: 8
29
 
30
  dcunet:
 
57
  n_mels: 128
58
  sample_rate: ${sample_rate}
59
  model_sample_rate: ${sample_rate}
60
+ specaugment: True
61
  classifier_ckpt: "ckpts/classifier.ckpt"
62
 
63
  ckpts:
cfg/exp/chain_inference_custom.yaml CHANGED
@@ -23,7 +23,8 @@ effects_to_remove:
23
  - chorus
24
  - delay
25
  datamodule:
26
- batch_size: 1
 
27
  num_workers: 8
28
  train_dataset: None
29
  val_dataset: None
 
23
  - chorus
24
  - delay
25
  datamodule:
26
+ train_batch_size: 1
27
+ test_batch_size: 1
28
  num_workers: 8
29
  train_dataset: None
30
  val_dataset: None
cfg/exp/chorus.yaml CHANGED
@@ -1,6 +1,6 @@
1
  # @package _global_
2
  defaults:
3
- - override /model: demucs
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
@@ -11,18 +11,15 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 5
19
  effects_to_keep:
20
- - compressor
21
- - distortion
22
- - delay
23
- - reverb
24
  effects_to_remove:
25
  - chorus
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
1
  # @package _global_
2
  defaults:
3
+ - override /model: dcunet
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 1
19
  effects_to_keep:
 
 
 
 
20
  effects_to_remove:
21
  - chorus
22
  datamodule:
23
+ train_batch_size: 16
24
+ test_batch_size: 1
25
  num_workers: 8
cfg/exp/{chorus_only.yaml → chorus_aug.yaml} RENAMED
@@ -1,6 +1,6 @@
1
  # @package _global_
2
  defaults:
3
- - override /model: demucs
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
@@ -11,14 +11,19 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 1
19
  effects_to_keep:
 
 
 
 
20
  effects_to_remove:
21
  - chorus
22
  datamodule:
23
- batch_size: 16
 
24
  num_workers: 8
 
1
  # @package _global_
2
  defaults:
3
+ - override /model: dcunet
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 5
19
  effects_to_keep:
20
+ - compressor
21
+ - distortion
22
+ - delay
23
+ - reverb
24
  effects_to_remove:
25
  - chorus
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
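Comparing `chorus.yaml` with `chorus_aug.yaml` shows the pattern this commit applies to the effect-specific configs: the plain variant keeps no extra effects (`num_kept_effects: [0,0]`, `num_classes: 1`), while the `_aug` variant keeps up to four other effects (`num_kept_effects: [0,4]`, `num_classes: 5`). A minimal sketch of that pattern as a Python function is given below. The function `effect_specific_config` is hypothetical, it uses an empty list where the YAML leaves `effects_to_keep` empty, and the order of kept effects follows `chorus_aug.yaml`; other files in this commit list them in a different order.

```python
# Effect names as they appear in the configs of this commit.
ALL_EFFECTS = ["compressor", "distortion", "chorus", "delay", "reverb"]


def effect_specific_config(effect, fx_aug):
    """Build the effect-related keys of an effect-specific experiment.

    fx_aug=False mirrors `{effect}.yaml`; fx_aug=True mirrors `{effect}_aug.yaml`.
    """
    if effect not in ALL_EFFECTS:
        raise ValueError(f"unknown effect: {effect!r}")
    # Every effect except the one being removed may be kept as augmentation.
    others = [name for name in ALL_EFFECTS if name != effect]
    return {
        "num_kept_effects": [0, len(others)] if fx_aug else [0, 0],
        "num_removed_effects": [1, 1],
        "shuffle_kept_effects": True,
        "shuffle_removed_effects": False,
        "num_classes": 5 if fx_aug else 1,
        "effects_to_keep": others if fx_aug else [],
        "effects_to_remove": [effect],
    }


if __name__ == "__main__":
    print(effect_specific_config("chorus", fx_aug=True))
```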
cfg/exp/compression.yaml CHANGED
@@ -11,18 +11,15 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 5
19
  effects_to_keep:
20
- - distortion
21
- - chorus
22
- - delay
23
- - reverb
24
  effects_to_remove:
25
  - compressor
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 1
19
  effects_to_keep:
 
 
 
 
20
  effects_to_remove:
21
  - compressor
22
  datamodule:
23
+ train_batch_size: 16
24
+ test_batch_size: 1
25
  num_workers: 8
cfg/exp/{compression_only.yaml → compression_aug.yaml} RENAMED
@@ -11,14 +11,19 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 1
19
  effects_to_keep:
 
 
 
 
20
  effects_to_remove:
21
  - compressor
22
  datamodule:
23
- batch_size: 16
 
24
  num_workers: 8
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 5
19
  effects_to_keep:
20
+ - distortion
21
+ - chorus
22
+ - delay
23
+ - reverb
24
  effects_to_remove:
25
  - compressor
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/default.yaml CHANGED
@@ -24,5 +24,6 @@ effects_to_remove:
24
  - delay
25
  - distortion
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
24
  - delay
25
  - distortion
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/delay.yaml CHANGED
@@ -1,6 +1,6 @@
1
  # @package _global_
2
  defaults:
3
- - override /model: demucs
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
@@ -11,18 +11,15 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 5
19
  effects_to_keep:
20
- - compressor
21
- - distortion
22
- - chorus
23
- - reverb
24
  effects_to_remove:
25
  - delay
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
1
  # @package _global_
2
  defaults:
3
+ - override /model: dcunet
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 1
19
  effects_to_keep:
 
 
 
 
20
  effects_to_remove:
21
  - delay
22
  datamodule:
23
+ train_batch_size: 16
24
+ test_batch_size: 1
25
  num_workers: 8
cfg/exp/{reverb_only.yaml → delay_aug.yaml} RENAMED
@@ -1,6 +1,6 @@
1
  # @package _global_
2
  defaults:
3
- - override /model: demucs
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
@@ -11,14 +11,19 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 1
19
  effects_to_keep:
20
- effects_to_remove:
 
 
21
  - reverb
 
 
22
  datamodule:
23
- batch_size: 16
 
24
  num_workers: 8
 
1
  # @package _global_
2
  defaults:
3
+ - override /model: dcunet
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 5
19
  effects_to_keep:
20
+ - compressor
21
+ - distortion
22
+ - chorus
23
  - reverb
24
+ effects_to_remove:
25
+ - delay
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/delay_only.yaml DELETED
@@ -1,24 +0,0 @@
1
- # @package _global_
2
- defaults:
3
- - override /model: demucs
4
- - override /effects: all
5
- seed: 12345
6
- sample_rate: 48000
7
- chunk_size: 262144 # 5.5s
8
- logs_dir: "./logs"
9
- render_files: True
10
- render_root: "/scratch/EffectSet"
11
- accelerator: "gpu"
12
- log_audio: True
13
- # Effects
14
- num_kept_effects: [0,0] # [min, max]
15
- num_removed_effects: [1,1] # [min, max]
16
- shuffle_kept_effects: True
17
- shuffle_removed_effects: False
18
- num_classes: 1
19
- effects_to_keep:
20
- effects_to_remove:
21
- - delay
22
- datamodule:
23
- batch_size: 16
24
- num_workers: 8
cfg/exp/distortion.yaml CHANGED
@@ -11,18 +11,15 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 5
19
  effects_to_keep:
20
- - compressor
21
- - reverb
22
- - chorus
23
- - delay
24
  effects_to_remove:
25
  - distortion
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 1
19
  effects_to_keep:
 
 
 
 
20
  effects_to_remove:
21
  - distortion
22
  datamodule:
23
+ train_batch_size: 16
24
+ test_batch_size: 1
25
  num_workers: 8
cfg/exp/{distortion_only.yaml → distortion_aug.yaml} RENAMED
@@ -11,14 +11,19 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 1
19
  effects_to_keep:
 
 
 
 
20
  effects_to_remove:
21
  - distortion
22
  datamodule:
23
- batch_size: 16
 
24
  num_workers: 8
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 5
19
  effects_to_keep:
20
+ - compressor
21
+ - reverb
22
+ - chorus
23
+ - delay
24
  effects_to_remove:
25
  - distortion
26
  datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
  num_workers: 8
cfg/exp/remfx_all.yaml ADDED
@@ -0,0 +1,90 @@
1
+ # @package _global_
2
+ defaults:
3
+ - override /model: demucs
4
+ - override /effects: all
5
+ seed: 12345
6
+ sample_rate: 48000
7
+ chunk_size: 262144 # 5.5s
8
+ logs_dir: "./logs"
9
+ accelerator: "cpu"
10
+ log_audio: True
11
+
12
+ # Effects
13
+ num_kept_effects: [0,0] # [min, max]
14
+ num_removed_effects: [0,5] # [min, max]
15
+ shuffle_kept_effects: True
16
+ shuffle_removed_effects: True
17
+ num_classes: 5
18
+ effects_to_keep:
19
+ effects_to_remove:
20
+ - compressor
21
+ - reverb
22
+ - chorus
23
+ - delay
24
+ - distortion
25
+ datamodule:
26
+ train_batch_size: 16
27
+ test_batch_size: 1
28
+ num_workers: 8
29
+
30
+ dcunet:
31
+ _target_: remfx.models.RemFX
32
+ lr: 1e-4
33
+ lr_beta1: 0.95
34
+ lr_beta2: 0.999
35
+ lr_eps: 1e-6
36
+ lr_weight_decay: 1e-3
37
+ sample_rate: ${sample_rate}
38
+ network:
39
+ _target_: remfx.models.DCUNetModel
40
+ architecture: "Large-DCUNet-20"
41
+ stft_kernel_size: 512
42
+ fix_length_mode: "pad"
43
+ sample_rate: ${sample_rate}
44
+ num_bins: 1025
45
+
46
+ classifier:
47
+ _target_: remfx.models.FXClassifier
48
+ lr: 3e-4
49
+ lr_weight_decay: 1e-3
50
+ sample_rate: ${sample_rate}
51
+ mixup: False
52
+ network:
53
+ _target_: remfx.classifier.Cnn14
54
+ num_classes: ${num_classes}
55
+ n_fft: 2048
56
+ hop_length: 512
57
+ n_mels: 128
58
+ sample_rate: ${sample_rate}
59
+ model_sample_rate: ${sample_rate}
60
+ specaugment: True
61
+ classifier_ckpt: "ckpts/classifier.ckpt"
62
+
63
+ ckpts:
64
+ RandomPedalboardDistortion:
65
+ model: ${model}
66
+ ckpt_path: "ckpts/demucs_distortion_aug.ckpt"
67
+ RandomPedalboardCompressor:
68
+ model: ${model}
69
+ ckpt_path: "ckpts/demucs_compressor_aug.ckpt"
70
+ RandomPedalboardReverb:
71
+ model: ${dcunet}
72
+ ckpt_path: "ckpts/dcunet_reverb_aug.ckpt"
73
+ RandomPedalboardChorus:
74
+ model: ${dcunet}
75
+ ckpt_path: "ckpts/dcunet_chorus_aug.ckpt"
76
+ RandomPedalboardDelay:
77
+ model: ${dcunet}
78
+ ckpt_path: "ckpts/dcunet_delay_aug.ckpt"
79
+
80
+ inference_effects_ordering:
81
+ - "RandomPedalboardDistortion"
82
+ - "RandomPedalboardCompressor"
83
+ - "RandomPedalboardReverb"
84
+ - "RandomPedalboardChorus"
85
+ - "RandomPedalboardDelay"
86
+ num_bins: 1025
87
+ inference_effects_shuffle: True
88
+ inference_use_all_effect_models: True
89
+ audio_input: ""
90
+ output_path: "./output.wav"
cfg/exp/remfx_detect.yaml ADDED
@@ -0,0 +1,90 @@
1
+ # @package _global_
2
+ defaults:
3
+ - override /model: demucs
4
+ - override /effects: all
5
+ seed: 12345
6
+ sample_rate: 48000
7
+ chunk_size: 262144 # 5.5s
8
+ logs_dir: "./logs"
9
+ accelerator: "cpu"
10
+ log_audio: True
11
+
12
+ # Effects
13
+ num_kept_effects: [0,0] # [min, max]
14
+ num_removed_effects: [0,5] # [min, max]
15
+ shuffle_kept_effects: True
16
+ shuffle_removed_effects: True
17
+ num_classes: 5
18
+ effects_to_keep:
19
+ effects_to_remove:
20
+ - compressor
21
+ - reverb
22
+ - chorus
23
+ - delay
24
+ - distortion
25
+ datamodule:
26
+ train_batch_size: 16
27
+ test_batch_size: 1
28
+ num_workers: 8
29
+
30
+ dcunet:
31
+ _target_: remfx.models.RemFX
32
+ lr: 1e-4
33
+ lr_beta1: 0.95
34
+ lr_beta2: 0.999
35
+ lr_eps: 1e-6
36
+ lr_weight_decay: 1e-3
37
+ sample_rate: ${sample_rate}
38
+ network:
39
+ _target_: remfx.models.DCUNetModel
40
+ architecture: "Large-DCUNet-20"
41
+ stft_kernel_size: 512
42
+ fix_length_mode: "pad"
43
+ sample_rate: ${sample_rate}
44
+ num_bins: 1025
45
+
46
+ classifier:
47
+ _target_: remfx.models.FXClassifier
48
+ lr: 3e-4
49
+ lr_weight_decay: 1e-3
50
+ sample_rate: ${sample_rate}
51
+ mixup: False
52
+ network:
53
+ _target_: remfx.classifier.Cnn14
54
+ num_classes: ${num_classes}
55
+ n_fft: 2048
56
+ hop_length: 512
57
+ n_mels: 128
58
+ sample_rate: ${sample_rate}
59
+ model_sample_rate: ${sample_rate}
60
+ specaugment: True
61
+ classifier_ckpt: "ckpts/classifier.ckpt"
62
+
63
+ ckpts:
64
+ RandomPedalboardDistortion:
65
+ model: ${model}
66
+ ckpt_path: "ckpts/demucs_distortion_aug.ckpt"
67
+ RandomPedalboardCompressor:
68
+ model: ${model}
69
+ ckpt_path: "ckpts/demucs_compressor_aug.ckpt"
70
+ RandomPedalboardReverb:
71
+ model: ${dcunet}
72
+ ckpt_path: "ckpts/dcunet_reverb_aug.ckpt"
73
+ RandomPedalboardChorus:
74
+ model: ${dcunet}
75
+ ckpt_path: "ckpts/dcunet_chorus_aug.ckpt"
76
+ RandomPedalboardDelay:
77
+ model: ${dcunet}
78
+ ckpt_path: "ckpts/dcunet_delay_aug.ckpt"
79
+
80
+ inference_effects_ordering:
81
+ - "RandomPedalboardDistortion"
82
+ - "RandomPedalboardCompressor"
83
+ - "RandomPedalboardReverb"
84
+ - "RandomPedalboardChorus"
85
+ - "RandomPedalboardDelay"
86
+ num_bins: 1025
87
+ inference_effects_shuffle: True
88
+ inference_use_all_effect_models: False
89
+ audio_input: ""
90
+ output_path: "./output.wav"
cfg/exp/remfx_oracle.yaml ADDED
@@ -0,0 +1,74 @@
1
+ # @package _global_
2
+ defaults:
3
+ - override /model: demucs
4
+ - override /effects: all
5
+ seed: 12345
6
+ sample_rate: 48000
7
+ chunk_size: 262144 # 5.5s
8
+ logs_dir: "./logs"
9
+ accelerator: "cpu"
10
+ log_audio: True
11
+
12
+ # Effects
13
+ num_kept_effects: [0,0] # [min, max]
14
+ num_removed_effects: [0,5] # [min, max]
15
+ shuffle_kept_effects: True
16
+ shuffle_removed_effects: True
17
+ num_classes: 5
18
+ effects_to_keep:
19
+ effects_to_remove:
20
+ - compressor
21
+ - reverb
22
+ - chorus
23
+ - delay
24
+ - distortion
25
+ datamodule:
26
+ train_batch_size: 16
27
+ test_batch_size: 1
28
+ num_workers: 8
29
+
30
+ dcunet:
31
+ _target_: remfx.models.RemFX
32
+ lr: 1e-4
33
+ lr_beta1: 0.95
34
+ lr_beta2: 0.999
35
+ lr_eps: 1e-6
36
+ lr_weight_decay: 1e-3
37
+ sample_rate: ${sample_rate}
38
+ network:
39
+ _target_: remfx.models.DCUNetModel
40
+ architecture: "Large-DCUNet-20"
41
+ stft_kernel_size: 512
42
+ fix_length_mode: "pad"
43
+ sample_rate: ${sample_rate}
44
+ num_bins: 1025
45
+
46
+
47
+ ckpts:
48
+ RandomPedalboardDistortion:
49
+ model: ${model}
50
+ ckpt_path: "ckpts/demucs_distortion_aug.ckpt"
51
+ RandomPedalboardCompressor:
52
+ model: ${model}
53
+ ckpt_path: "ckpts/demucs_compressor_aug.ckpt"
54
+ RandomPedalboardReverb:
55
+ model: ${dcunet}
56
+ ckpt_path: "ckpts/dcunet_reverb_aug.ckpt"
57
+ RandomPedalboardChorus:
58
+ model: ${dcunet}
59
+ ckpt_path: "ckpts/dcunet_chorus_aug.ckpt"
60
+ RandomPedalboardDelay:
61
+ model: ${dcunet}
62
+ ckpt_path: "ckpts/dcunet_delay_aug.ckpt"
63
+
64
+ inference_effects_ordering:
65
+ - "RandomPedalboardDistortion"
66
+ - "RandomPedalboardCompressor"
67
+ - "RandomPedalboardReverb"
68
+ - "RandomPedalboardChorus"
69
+ - "RandomPedalboardDelay"
70
+ num_bins: 1025
71
+ inference_effects_shuffle: True
72
+ inference_use_all_effect_models: False
73
+ audio_input: ""
74
+ output_path: "./output.wav"
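The three `remfx_*` configs share the same `ckpts` table and `inference_effects_ordering`. They differ in `inference_use_all_effect_models` (True only in `remfx_all`) and in whether a `classifier` block is present (absent in `remfx_oracle`). One plausible reading, which is an assumption and not confirmed by this diff, is that `remfx_all` runs every removal model while the other two run only the models for effects that are flagged as present. The sketch below illustrates that reading. The function `select_effect_models` and its `detected` argument are hypothetical, and the real inference code may behave differently.

```python
import random

# Order copied from `inference_effects_ordering` in the configs above.
ORDERING = [
    "RandomPedalboardDistortion",
    "RandomPedalboardCompressor",
    "RandomPedalboardReverb",
    "RandomPedalboardChorus",
    "RandomPedalboardDelay",
]


def select_effect_models(ordering, detected, use_all, shuffle=False, seed=None):
    """Pick which removal models to run, under the assumptions stated above.

    ordering: effect names in the configured order
    detected: set of effect names flagged as present (hypothetical input)
    use_all:  stands in for `inference_use_all_effect_models`
    shuffle:  stands in for `inference_effects_shuffle`
    """
    chosen = [name for name in ordering if use_all or name in detected]
    if shuffle:
        # A seeded Random instance keeps the shuffle reproducible.
        random.Random(seed).shuffle(chosen)
    return chosen


if __name__ == "__main__":
    flagged = {"RandomPedalboardReverb", "RandomPedalboardDistortion"}
    print(select_effect_models(ORDERING, flagged, use_all=False))
    print(select_effect_models(ORDERING, flagged, use_all=True))
```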
cfg/exp/reverb.yaml CHANGED
@@ -1,6 +1,6 @@
1
  # @package _global_
2
  defaults:
3
- - override /model: demucs
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
@@ -11,18 +11,15 @@ render_root: "/scratch/EffectSet"
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
- num_kept_effects: [0,4] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
- num_classes: 5
19
  effects_to_keep:
20
- - compressor
21
- - distortion
22
- - chorus
23
- - delay
24
  effects_to_remove:
25
  - reverb
26
  datamodule:
27
- batch_size: 16
 
28
  num_workers: 8
 
1
  # @package _global_
2
  defaults:
3
+ - override /model: dcunet
4
  - override /effects: all
5
  seed: 12345
6
  sample_rate: 48000
 
11
  accelerator: "gpu"
12
  log_audio: True
13
  # Effects
14
+ num_kept_effects: [0,0] # [min, max]
15
  num_removed_effects: [1,1] # [min, max]
16
  shuffle_kept_effects: True
17
  shuffle_removed_effects: False
18
+ num_classes: 1
19
  effects_to_keep:
 
 
 
 
20
  effects_to_remove:
21
  - reverb
22
  datamodule:
23
+ train_batch_size: 16
24
+ test_batch_size: 1
25
  num_workers: 8
cfg/exp/reverb_aug.yaml ADDED
@@ -0,0 +1,29 @@
1
+ # @package _global_
2
+ defaults:
3
+ - override /model: dcunet
4
+ - override /effects: all
5
+ seed: 12345
6
+ sample_rate: 48000
7
+ chunk_size: 262144 # 5.5s
8
+ logs_dir: "./logs"
9
+ render_files: True
10
+ render_root: "/scratch/EffectSet"
11
+ accelerator: "gpu"
12
+ log_audio: True
13
+ # Effects
14
+ num_kept_effects: [0,4] # [min, max]
15
+ num_removed_effects: [1,1] # [min, max]
16
+ shuffle_kept_effects: True
17
+ shuffle_removed_effects: False
18
+ num_classes: 5
19
+ effects_to_keep:
20
+ - compressor
21
+ - distortion
22
+ - chorus
23
+ - delay
24
+ effects_to_remove:
25
+ - reverb
26
+ datamodule:
27
+ train_batch_size: 16
28
+ test_batch_size: 1
29
+ num_workers: 8
cfg/model/audio_diffusion.yaml DELETED
@@ -1,16 +0,0 @@
1
- # @package _global_
2
- model:
3
- _target_: remfx.models.RemFX
4
- lr: 1e-4
5
- lr_beta1: 0.95
6
- lr_beta2: 0.999
7
- lr_eps: 1e-6
8
- lr_weight_decay: 1e-3
9
- sample_rate: ${sample_rate}
10
- network:
11
- _target_: remfx.models.DiffusionGenerationModel
12
- n_channels: 1
13
- datamodule:
14
- dataset:
15
- effect_types: ["Clean"]
16
- batch_size: 2
diffusion_test2.ipynb DELETED
@@ -1,188 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "code",
5
- "execution_count": 27,
6
- "id": "4c52cc1c-91f1-4b79-924b-041d2929ef7b",
7
- "metadata": {},
8
- "outputs": [],
9
- "source": [
10
- "from audio_diffusion_pytorch import AudioDiffusionModel\n",
11
- "import torch\n",
12
- "from IPython.display import Audio\n",
13
- "import matplotlib.pyplot as plt\n",
14
- "from tqdm import tqdm\n",
15
- "import numpy as np"
16
- ]
17
- },
18
- {
19
- "cell_type": "code",
20
- "execution_count": 28,
21
- "id": "a005011f-3019-4d34-bdf2-9a00e5480282",
22
- "metadata": {},
23
- "outputs": [],
24
- "source": [
25
- "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")"
26
- ]
27
- },
28
- {
29
- "cell_type": "code",
30
- "execution_count": 29,
31
- "id": "1b689f18-375f-4b40-9ddc-a4ced6a5e5e4",
32
- "metadata": {},
33
- "outputs": [],
34
- "source": [
35
- "model = AudioDiffusionModel(in_channels=1, \n",
36
- " patch_size=1,\n",
37
- " multipliers=[1, 2, 4, 4, 4, 4, 4],\n",
38
- " factors=[2, 2, 2, 2, 2, 2],\n",
39
- " num_blocks=[2, 2, 2, 2, 2, 2],\n",
40
- " attentions=[0, 0, 0, 0, 0, 0]\n",
41
- " )\n",
42
- "model = model.to(device)"
43
- ]
44
- },
45
- {
46
- "cell_type": "code",
47
- "execution_count": 30,
48
- "id": "bd8a1cb4-42b5-43bc-9a12-f594ce069b33",
49
- "metadata": {},
50
- "outputs": [
51
- {
52
- "name": "stdout",
53
- "output_type": "stream",
54
- "text": [
55
- "torch.Size([1, 32768])\n"
56
- ]
57
- }
58
- ],
59
- "source": [
60
- "fs = 22050\n",
61
- "t = 32768\n",
62
- "fc_min = 220\n",
63
- "fc_max = 440\n",
64
- "batch_size = 8\n",
65
- "samples = torch.arange(t) / fs\n",
66
- "n_iters = 1000\n",
67
- "\n",
68
- "samples = samples.view(1, -1)\n",
69
- "print(samples.shape)\n",
70
- "\n",
71
- "lr = 1e-4\n",
72
- "optimizer = torch.optim.Adam(model.parameters(), lr, betas=(0.95, 0.999), eps=1e-6, weight_decay=1e-3)"
73
- ]
74
- },
75
- {
76
- "cell_type": "code",
77
- "execution_count": 31,
78
- "id": "01265072",
79
- "metadata": {
80
- "scrolled": true
81
- },
82
- "outputs": [
83
- {
84
- "name": "stderr",
85
- "output_type": "stream",
86
- "text": [
87
- "999 - loss step: 0.0457 loss mean: 0.1161: 100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [09:38<00:00, 1.73it/s]\n"
88
- ]
89
- }
90
- ],
91
- "source": [
92
- "losses = []\n",
93
- "pbar = tqdm(range(n_iters))\n",
94
- "for i in pbar:\n",
95
- " \n",
96
- " optimizer.zero_grad()\n",
97
- " \n",
98
- " # create a batch of random sine waves\n",
99
- " f = torch.randint(fc_min, fc_max, [batch_size,1])\n",
100
- " signals = torch.sin(2 * torch.pi * f * samples)\n",
101
- " signals = signals.view(batch_size, 1, -1)\n",
102
- " signals = signals.to(device)\n",
103
- "\n",
104
- " loss = model(signals)\n",
105
- " loss.backward() \n",
106
- " optimizer.step()\n",
107
- " \n",
108
- " losses.append(loss.item())\n",
109
- " pbar.set_description(f\"{i} - loss step: {loss.item():0.4f} loss mean: {np.mean(losses):0.4f}\")"
110
- ]
111
- },
112
- {
113
- "cell_type": "code",
114
- "execution_count": 38,
115
- "id": "71d17c51-842c-40a1-81a1-a53bf358bc8a",
116
- "metadata": {},
117
- "outputs": [],
118
- "source": [
119
- "# Sample 2 sources given start noise\n",
120
- "noise = torch.randn(1, 1, t)\n",
121
- "noise = noise.to(device)\n",
122
- "sampled = model.sample(\n",
123
- " noise=noise,\n",
124
- " num_steps=50 # Suggested range: 2-50\n",
125
- ") # [2, 1, 2 ** 18]"
126
- ]
127
- },
128
- {
129
- "cell_type": "code",
130
- "execution_count": 39,
131
- "id": "59d71efa-05ac-4545-84da-8c09c033dfd7",
132
- "metadata": {},
133
- "outputs": [
134
- {
135
- "data": {
136
- "text/html": [
137
- "\n",
138
- " <audio controls=\"controls\" >\n",
139
- " <source src=\"data:audio/wav;base64,\" type=\"audio/wav\" />\n",
140
- " Your browser does not support the audio element.\n",
141
- " </audio>\n",
142
- " "
143
- ],
144
- "text/plain": [
145
- "<IPython.lib.display.Audio object>"
146
- ]
147
- },
148
- "execution_count": 39,
149
- "metadata": {},
150
- "output_type": "execute_result"
151
- }
152
- ],
153
- "source": [
154
- "z = sampled[0]\n",
155
- "Audio(z.cpu(), rate=22050)"
156
- ]
157
- },
158
- {
159
- "cell_type": "code",
160
- "execution_count": null,
161
- "id": "81eddd71-bba7-4c62-8d50-900b295bb2f8",
162
- "metadata": {},
163
- "outputs": [],
164
- "source": []
165
- }
166
- ],
167
- "metadata": {
168
- "kernelspec": {
169
- "display_name": "Python 3 (ipykernel)",
170
- "language": "python",
171
- "name": "python3"
172
- },
173
- "language_info": {
174
- "codemirror_mode": {
175
- "name": "ipython",
176
- "version": 3
177
- },
178
- "file_extension": ".py",
179
- "mimetype": "text/x-python",
180
- "name": "python",
181
- "nbconvert_exporter": "python",
182
- "pygments_lexer": "ipython3",
183
- "version": "3.9.5"
184
- }
185
- },
186
- "nbformat": 4,
187
- "nbformat_minor": 5
188
- }
download_ckpts.sh ADDED
@@ -0,0 +1,10 @@
1
+ # make ckpts directory if not exist
2
+ mkdir -p ckpts
3
+
4
+ # download ckpts and save to ckpts directory
5
+ wget https://zenodo.org/record/8179396/files/classifier.ckpt?download=1 -O ckpts/classifier.ckpt
6
+ wget https://zenodo.org/record/8179396/files/dcunet_chorus_aug.ckpt?download=1 -O ckpts/dcunet_chorus_aug.ckpt
7
+ wget https://zenodo.org/record/8179396/files/dcunet_delay_aug.ckpt?download=1 -O ckpts/dcunet_delay_aug.ckpt
8
+ wget https://zenodo.org/record/8179396/files/dcunet_reverb_aug.ckpt?download=1 -O ckpts/dcunet_reverb_aug.ckpt
9
+ wget https://zenodo.org/record/8179396/files/demucs_compressor_aug.ckpt?download=1 -O ckpts/demucs_compressor_aug.ckpt
10
+ wget https://zenodo.org/record/8179396/files/demucs_distortion_aug.ckpt?download=1 -O ckpts/demucs_distortion_aug.ckpt
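Before running inference it can help to check which of these checkpoints are already on disk. A minimal sketch, assuming the same Zenodo record and file names as the script above. The `checkpoint_targets` and `missing_checkpoints` helpers are hypothetical and perform no network access:

```python
from pathlib import Path
from typing import Dict, List

# Record URL and file names taken from download_ckpts.sh above.
ZENODO_FILES = "https://zenodo.org/record/8179396/files"
CKPT_NAMES = [
    "classifier.ckpt",
    "dcunet_chorus_aug.ckpt",
    "dcunet_delay_aug.ckpt",
    "dcunet_reverb_aug.ckpt",
    "demucs_compressor_aug.ckpt",
    "demucs_distortion_aug.ckpt",
]


def checkpoint_targets(out_dir: str = "ckpts") -> Dict[str, Path]:
    """Map each download URL to the local path the script would write."""
    return {
        f"{ZENODO_FILES}/{name}?download=1": Path(out_dir) / name
        for name in CKPT_NAMES
    }


def missing_checkpoints(out_dir: str = "ckpts") -> List[str]:
    """Return the local checkpoint paths that do not exist yet."""
    return [str(path) for path in checkpoint_targets(out_dir).values() if not path.exists()]


if __name__ == "__main__":
    for path in missing_checkpoints():
        print("missing:", path)
```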
eval.sh ADDED
@@ -0,0 +1,16 @@
1
+ #! /bin/bash
2
+
3
+ # Example usage:
4
+ # ./eval.sh remfx_detect
5
+
6
+ # Check if first argument is empty
7
+ if [ -z "$1" ]
8
+ then
9
+ echo "No experiment name or config path supplied"
10
+ exit 1
11
+ fi
12
+
13
+ python scripts/chain_inference.py +exp=$1 datamodule.train_dataset=None datamodule.val_dataset=None datamodule.test_dataset.render_root=./RemFX_eval_dataset/ render_files=False
14
+
15
+
16
+
remfx/callbacks.py CHANGED
@@ -92,6 +92,8 @@ def log_wandb_audio_batch(
92
  caption: str = "",
93
  max_items: int = 10,
94
  ):
 
 
95
  num_items = samples.shape[0]
96
  samples = rearrange(samples, "b c t -> b t c")
97
  for idx in range(num_items):
 
92
  caption: str = "",
93
  max_items: int = 10,
94
  ):
95
+ if type(logger) != pl.loggers.WandbLogger:
96
+ return
97
  num_items = samples.shape[0]
98
  samples = rearrange(samples, "b c t -> b t c")
99
  for idx in range(num_items):
remfx/classifier.py CHANGED
@@ -173,10 +173,10 @@ class Cnn14(nn.Module):
173
 
174
  self.fc1 = nn.Linear(2048, 2048, bias=True)
175
 
176
- # self.fc_audioset = nn.Linear(2048, num_classes, bias=True)
177
- self.heads = torch.nn.ModuleList()
178
- for _ in range(num_classes):
179
- self.heads.append(nn.Linear(2048, 1, bias=True))
180
 
181
  self.init_weight()
182
 
@@ -192,7 +192,7 @@ class Cnn14(nn.Module):
192
  def init_weight(self):
193
  init_bn(self.bn0)
194
  init_layer(self.fc1)
195
- # init_layer(self.fc_audioset)
196
 
197
  def forward(self, x: torch.Tensor, train: bool = False):
198
  """
@@ -212,12 +212,12 @@ class Cnn14(nn.Module):
212
  # axs[1].imshow(x[0, :, :, :].detach().squeeze().cpu().numpy())
213
  # plt.savefig("spec_augment.png", dpi=300)
214
 
215
- # x = x.permute(0, 2, 1, 3)
216
- # x = self.bn0(x)
217
- # x = x.permute(0, 2, 1, 3)
218
 
219
  # apply standardization
220
- x = (x - x.mean(dim=0, keepdim=True)) / x.std(dim=0, keepdim=True)
221
 
222
  x = self.conv_block1(x, pool_size=(2, 2), pool_type="avg")
223
  x = F.dropout(x, p=0.2, training=train)
@@ -239,13 +239,13 @@ class Cnn14(nn.Module):
239
  x = F.dropout(x, p=0.5, training=train)
240
  x = F.relu_(self.fc1(x))
241
 
242
- outputs = []
243
- for head in self.heads:
244
- outputs.append(torch.sigmoid(head(x)))
245
 
246
- # clipwise_output = self.fc_audioset(x)
247
-
248
- return outputs
249
 
250
 
251
  class ConvBlock(nn.Module):
 
173
 
174
  self.fc1 = nn.Linear(2048, 2048, bias=True)
175
 
176
+ self.fc_audioset = nn.Linear(2048, num_classes, bias=True)
177
+ # self.heads = torch.nn.ModuleList()
178
+ # for _ in range(num_classes):
179
+ # self.heads.append(nn.Linear(2048, 1, bias=True))
180
 
181
  self.init_weight()
182
 
 
192
  def init_weight(self):
193
  init_bn(self.bn0)
194
  init_layer(self.fc1)
195
+ init_layer(self.fc_audioset)
196
 
197
  def forward(self, x: torch.Tensor, train: bool = False):
198
  """
 
212
  # axs[1].imshow(x[0, :, :, :].detach().squeeze().cpu().numpy())
213
  # plt.savefig("spec_augment.png", dpi=300)
214
 
215
+ x = x.permute(0, 2, 1, 3)
216
+ x = self.bn0(x)
217
+ x = x.permute(0, 2, 1, 3)
218
 
219
  # apply standardization
220
+ # x = (x - x.mean(dim=0, keepdim=True)) / x.std(dim=0, keepdim=True)
221
 
222
  x = self.conv_block1(x, pool_size=(2, 2), pool_type="avg")
223
  x = F.dropout(x, p=0.2, training=train)
 
239
  x = F.dropout(x, p=0.5, training=train)
240
  x = F.relu_(self.fc1(x))
241
 
242
+ # outputs = []
243
+ # for head in self.heads:
244
+ # outputs.append(torch.sigmoid(head(x)))
245
 
246
+ clipwise_output = self.fc_audioset(x)
247
+ return clipwise_output
248
+ # return outputs
249
 
250
 
251
  class ConvBlock(nn.Module):
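After this change `Cnn14.forward` returns the raw output of a single `fc_audioset` layer instead of a list of per-class sigmoid outputs, so any caller that needs binary effect labels has to apply the sigmoid and a threshold itself. A minimal, framework-free sketch of that conversion. The `logits_to_labels` helper and the 0.5 threshold are assumptions for illustration, not code from the repository:

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_labels(logits: Sequence[float], threshold: float = 0.5) -> List[float]:
    """Hypothetical helper: turn per-class logits into 0.0/1.0 effect labels."""
    return [1.0 if sigmoid(z) >= threshold else 0.0 for z in logits]


if __name__ == "__main__":
    # One logit per effect class (five classes in the *_aug configs).
    print(logits_to_labels([2.0, -1.0, 0.0, -3.0, 4.0]))
```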
remfx/datasets.py CHANGED
@@ -18,7 +18,6 @@ from auraloss.freq import MultiResolutionSTFTLoss
18
 
19
  STFT_THRESH = 1e-3
20
  ALL_EFFECTS = effect_lib.Pedalboard_Effects
21
- # print(ALL_EFFECTS)
22
 
23
 
24
  vocalset_splits = {
 
18
 
19
  STFT_THRESH = 1e-3
20
  ALL_EFFECTS = effect_lib.Pedalboard_Effects
 
21
 
22
 
23
  vocalset_splits = {
remfx/models.py CHANGED
@@ -51,7 +51,7 @@ class RemFXChainInference(pl.LightningModule):
51
  self.output_str = "IN_SISDR,OUT_SISDR,IN_STFT,OUT_STFT\n"
52
  self.use_all_effect_models = use_all_effect_models
53
 
54
- def forward(self, batch, batch_idx, order=None):
55
  x, y, _, rem_fx_labels = batch
56
  # Use chain of effects defined in config
57
  if order:
@@ -79,25 +79,19 @@ class RemFXChainInference(pl.LightningModule):
79
  ]
80
  for effect_label in rem_fx_labels
81
  ]
82
 
83
  output = []
84
- # input_samples = rearrange(x, "b c t -> c (b t)").unsqueeze(0)
85
- # target_samples = rearrange(y, "b c t -> c (b t)").unsqueeze(0)
86
-
87
- # log_wandb_audio_batch(
88
- # logger=self.logger,
89
- # id="input_effected_audio",
90
- # samples=input_samples.cpu(),
91
- # sampling_rate=self.sample_rate,
92
- # caption="Input Data",
93
- # )
94
- # log_wandb_audio_batch(
95
- # logger=self.logger,
96
- # id="target_audio",
97
- # samples=target_samples.cpu(),
98
- # sampling_rate=self.sample_rate,
99
- # caption="Target Data",
100
- # )
101
  with torch.no_grad():
102
  for i, (elem, effects_list) in enumerate(zip(x, effects_present)):
103
  elem = elem.unsqueeze(0) # Add batch dim
@@ -107,40 +101,12 @@ class RemFXChainInference(pl.LightningModule):
107
  effect for effect in effects_order if effect in effect_list_names
108
  ]
109
 
110
- # log_wandb_audio_batch(
111
- # logger=self.logger,
112
- # id=f"{i}_Before",
113
- # samples=elem.cpu(),
114
- # sampling_rate=self.sample_rate,
115
- # caption=effects,
116
- # )
117
  for effect in effects:
118
  # Sample the model
119
  elem = self.model[effect].model.sample(elem)
120
- # log_wandb_audio_batch(
121
- # logger=self.logger,
122
- # id=f"{i}_{effect}",
123
- # samples=elem.cpu(),
124
- # sampling_rate=self.sample_rate,
125
- # caption=effects,
126
- # )
127
- # log_wandb_audio_batch(
128
- # logger=self.logger,
129
- # id=f"{i}_After",
130
- # samples=elem.cpu(),
131
- # sampling_rate=self.sample_rate,
132
- # caption=effects,
133
- # )
134
  output.append(elem.squeeze(0))
135
  output = torch.stack(output)
136
 
137
- # log_wandb_audio_batch(
138
- # logger=self.logger,
139
- # id="output_audio",
140
- # samples=output_samples.cpu(),
141
- # sampling_rate=self.sample_rate,
142
- # caption="Output Data",
143
- # )
144
  loss = self.mrstftloss(output, y) + self.l1loss(output, y) * 100
145
  return loss, output
146
 
 
51
  self.output_str = "IN_SISDR,OUT_SISDR,IN_STFT,OUT_STFT\n"
52
  self.use_all_effect_models = use_all_effect_models
53
 
54
+ def forward(self, batch, batch_idx, order=None, verbose=False):
55
  x, y, _, rem_fx_labels = batch
56
  # Use chain of effects defined in config
57
  if order:
 
79
  ]
80
  for effect_label in rem_fx_labels
81
  ]
82
+ effects_present_name = [
83
+ [
84
+ ALL_EFFECTS[i].__name__
85
+ for i, effect in enumerate(effect_label)
86
+ if effect == 1.0
87
+ ]
88
+ for effect_label in rem_fx_labels
89
+ ]
90
+ if verbose:
91
+ print("Detected effects:", effects_present_name[0])
92
+ print("Removing effects...")
93
 
94
  output = []
95
  with torch.no_grad():
96
  for i, (elem, effects_list) in enumerate(zip(x, effects_present)):
97
  elem = elem.unsqueeze(0) # Add batch dim
 
101
  effect for effect in effects_order if effect in effect_list_names
102
  ]
103
 
 
 
 
 
 
 
 
104
  for effect in effects:
105
  # Sample the model
106
  elem = self.model[effect].model.sample(elem)
107
  output.append(elem.squeeze(0))
108
  output = torch.stack(output)
109
 
 
 
 
 
 
 
 
110
  loss = self.mrstftloss(output, y) + self.l1loss(output, y) * 100
111
  return loss, output
112
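The new `effects_present_name` comprehension maps each multi-hot label row to the names of the effect classes it flags. A minimal sketch of the same mapping, using stand-in classes: the real list is `effect_lib.Pedalboard_Effects`, and the class order shown here is an assumption.

```python
from typing import List, Sequence


# Stand-ins for the effect classes; the real ones live in effect_lib.
class RandomPedalboardDistortion: ...
class RandomPedalboardCompressor: ...
class RandomPedalboardReverb: ...
class RandomPedalboardChorus: ...
class RandomPedalboardDelay: ...


# Assumed order; the repository's Pedalboard_Effects list may differ.
ALL_EFFECTS = [
    RandomPedalboardDistortion,
    RandomPedalboardCompressor,
    RandomPedalboardReverb,
    RandomPedalboardChorus,
    RandomPedalboardDelay,
]


def present_effect_names(labels: Sequence[Sequence[float]]) -> List[List[str]]:
    """For each label row, list the names of effects whose flag equals 1.0."""
    return [
        [ALL_EFFECTS[i].__name__ for i, flag in enumerate(row) if flag == 1.0]
        for row in labels
    ]


if __name__ == "__main__":
    print(present_effect_names([[1.0, 0.0, 1.0, 0.0, 0.0]]))
```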
 
remfx/utils.py CHANGED
@@ -52,6 +52,9 @@ def log_hyperparameters(
52
  if not trainer.logger:
53
  return
54
 
 
 
 
55
  hparams = {}
56
 
57
  # choose which parts of hydra config will be saved to loggers
 
52
  if not trainer.logger:
53
  return
54
 
55
+ if type(trainer.logger) == pl.loggers.CSVLogger:
56
+ return
57
+
58
  hparams = {}
59
 
60
  # choose which parts of hydra config will be saved to loggers
remfx_detect.sh ADDED
@@ -0,0 +1,39 @@
1
+ #! /bin/bash
2
+
3
+ # Example usage:
4
+ # ./remfx_detect.sh wet.wav -o examples/output.wav
5
+ # first argument is required, second argument is optional
6
+
7
+ # Check if first argument is empty
8
+ if [ -z "$1" ]
9
+ then
10
+ echo "No audio input path supplied"
11
+ exit 1
12
+ fi
13
+
14
+ audio_input=$1
15
+ # Shift first argument away
16
+ shift
17
+ output_path=""
18
+
19
+ while getopts ":o:" opt; do
20
+ case $opt in
21
+ o)
22
+ output_path=$OPTARG
23
+ ;;
24
+ \?)
25
+ echo "Invalid option: -$OPTARG" >&2
26
+ ;;
27
+ esac
28
+ done
29
+
30
+
31
+ # Run script
32
+ # If output path is blank, leave it blank
33
+
34
+ if [ -z "$output_path" ]
35
+ then
36
+ python scripts/remfx_detect.py +exp=remfx_detect audio_input=$audio_input
37
+ exit 0
38
+ fi
39
+ python scripts/remfx_detect.py +exp=remfx_detect audio_input=$audio_input output_path=$output_path
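The wrapper's intended interface (a required input path followed by an optional `-o` output path) can also be sketched in Python. This is an illustrative equivalent, not part of the commit. The `build_command` helper is hypothetical and only assembles the Hydra overrides the shell script passes to `scripts/remfx_detect.py`:

```python
import argparse
from typing import List, Sequence


def build_command(argv: Sequence[str]) -> List[str]:
    """Hypothetical helper: translate CLI arguments into the Hydra command line."""
    parser = argparse.ArgumentParser(prog="remfx_detect")
    parser.add_argument("audio_input", help="path to the effected (wet) audio file")
    parser.add_argument("-o", dest="output_path", default=None, help="optional output path")
    args = parser.parse_args(argv)

    cmd = [
        "python",
        "scripts/remfx_detect.py",
        "+exp=remfx_detect",
        f"audio_input={args.audio_input}",
    ]
    # Only override output_path when one was given, so the config default applies otherwise.
    if args.output_path:
        cmd.append(f"output_path={args.output_path}")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command(["wet.wav", "-o", "examples/output.wav"])))
```

The input path is captured by name before any option parsing, so it cannot be lost when the remaining arguments are processed.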
scripts/remfx_detect.py ADDED
@@ -0,0 +1,59 @@
1
+ import hydra
2
+ from omegaconf import DictConfig
3
+ import torch
4
+ from remfx.models import RemFXChainInference
5
+ import torchaudio
6
+
7
+
8
+ @hydra.main(
9
+ version_base=None,
10
+ config_path="../cfg",
11
+ config_name="config.yaml",
12
+ )
13
+ def main(cfg: DictConfig):
14
+ print("Loading models...")
15
+ models = {}
16
+ for effect in cfg.ckpts:
17
+ model = hydra.utils.instantiate(cfg.ckpts[effect].model, _convert_="partial")
18
+ ckpt_path = cfg.ckpts[effect].ckpt_path
19
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
20
+ state_dict = torch.load(ckpt_path, map_location=device)["state_dict"]
21
+ model.load_state_dict(state_dict)
22
+ model.to(device)
23
+ models[effect] = model
24
+
25
+ classifier = hydra.utils.instantiate(cfg.classifier, _convert_="partial")
26
+ ckpt_path = cfg.classifier_ckpt
27
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
28
+ state_dict = torch.load(ckpt_path, map_location=device)["state_dict"]
29
+ classifier.load_state_dict(state_dict)
30
+ classifier.to(device)
31
+
32
+ inference_model = RemFXChainInference(
33
+ models,
34
+ sample_rate=cfg.sample_rate,
35
+ num_bins=cfg.num_bins,
36
+ effect_order=cfg.inference_effects_ordering,
37
+ classifier=classifier,
38
+ shuffle_effect_order=cfg.inference_effects_shuffle,
39
+ use_all_effect_models=cfg.inference_use_all_effect_models,
40
+ )
41
+
42
+ audio_file = cfg.audio_input
43
+ print("Loading", audio_file)
44
+ audio, sr = torchaudio.load(audio_file)
45
+ # Resample
46
+ audio = torchaudio.transforms.Resample(sr, cfg.sample_rate)(audio)
47
+ # Convert to mono
48
+ audio = audio.mean(0, keepdim=True)
49
+ # Add dimension for batch
50
+ audio = audio.unsqueeze(0)
51
+ batch = [audio, audio, None, None]
52
+
53
+ _, y = inference_model(batch, 0, verbose=True)
54
+ print("Saving output to", cfg.output_path)
55
+ torchaudio.save(cfg.output_path, y[0], sample_rate=cfg.sample_rate)
56
+
57
+
58
+ if __name__ == "__main__":
59
+ main()
setup.py CHANGED
@@ -3,7 +3,7 @@ from setuptools import setup, find_packages
3
 
4
  NAME = "remfx"
5
  DESCRIPTION = "Universal audio effect removal"
6
- URL = ""
7
  EMAIL = "m.rice@se22.qmul.ac.uk"
8
  AUTHOR = "Matthew Rice"
9
  REQUIRES_PYTHON = ">=3.8.0"
 
3
 
4
  NAME = "remfx"
5
  DESCRIPTION = "Universal audio effect removal"
6
+ URL = "https://github.com/mhrice/RemFx"
7
  EMAIL = "m.rice@se22.qmul.ac.uk"
8
  AUTHOR = "Matthew Rice"
9
  REQUIRES_PYTHON = ">=3.8.0"