Hugo Flores Garcia committed on
Commit
4c17dbe
1 Parent(s): 9567041
Files changed (43)
  1. runs/boleros/c2f/args.yml +825 -0
  2. runs/boleros/c2f/latest/vampnet/weights.pth +3 -0
  3. runs/boleros/c2f/model.txt +76 -0
  4. runs/boleros/coarse/args.yml +825 -0
  5. runs/boleros/coarse/latest/vampnet/weights.pth +3 -0
  6. runs/boleros/coarse/model.txt +76 -0
  7. runs/choir/c2f/latest/vampnet/weights.pth +3 -0
  8. runs/choir/coarse/latest/vampnet/weights.pth +3 -0
  9. runs/knower/c2f/args.yml +824 -0
  10. runs/knower/c2f/best/vampnet/weights.pth +3 -0
  11. runs/knower/c2f/latest/vampnet/weights.pth +3 -0
  12. runs/knower/c2f/model.txt +76 -0
  13. runs/knower/coarse/args.yml +824 -0
  14. runs/knower/coarse/best/vampnet/weights.pth +3 -0
  15. runs/knower/coarse/latest/vampnet/weights.pth +3 -0
  16. runs/knower/coarse/model.txt +76 -0
  17. runs/n64/c2f/args.yml +129 -0
  18. runs/n64/c2f/latest/vampnet/weights.pth +3 -0
  19. runs/n64/c2f/model.txt +76 -0
  20. runs/n64/coarse/args.yml +129 -0
  21. runs/n64/coarse/latest/vampnet/weights.pth +3 -0
  22. runs/n64/coarse/model.txt +76 -0
  23. runs/n64/n64/c2f/vampnet/weights.pth +3 -0
  24. runs/n64/n64/coarse/latest/vampnet/weights.pth +3 -0
  25. runs/opera/coarse/latest/vampnet/weights.pth +3 -0
  26. runs/orchestral/c2f/args.yml +129 -0
  27. runs/orchestral/c2f/latest/vampnet/weights.pth +3 -0
  28. runs/orchestral/c2f/model.txt +76 -0
  29. runs/orchestral/coarse/args.yml +129 -0
  30. runs/orchestral/coarse/latest/vampnet/weights.pth +3 -0
  31. runs/orchestral/coarse/model.txt +76 -0
  32. runs/soundrangers-v2-v1/c2f/args.yml +851 -0
  33. runs/soundrangers-v2-v1/c2f/latest/vampnet/weights.pth +3 -0
  34. runs/soundrangers-v2-v1/c2f/model.txt +73 -0
  35. runs/soundrangers-v2-v1/coarse/args.yml +851 -0
  36. runs/soundrangers-v2-v1/coarse/latest/vampnet/weights.pth +3 -0
  37. runs/soundrangers-v2-v1/coarse/model.txt +73 -0
  38. runs/soundrangers-v2/c2f/args.yml +155 -0
  39. runs/soundrangers-v2/c2f/latest/vampnet/weights.pth +3 -0
  40. runs/soundrangers-v2/c2f/model.txt +76 -0
  41. runs/soundrangers-v2/coarse/args.yml +155 -0
  42. runs/soundrangers-v2/coarse/latest/vampnet/weights.pth +3 -0
  43. runs/soundrangers-v2/coarse/model.txt +76 -0
runs/boleros/c2f/args.yml ADDED
@@ -0,0 +1,825 @@
+ AdamW.amsgrad: false
+ AdamW.betas: !!python/tuple
+ - 0.9
+ - 0.999
+ AdamW.capturable: false
+ AdamW.differentiable: false
+ AdamW.eps: 1.0e-08
+ AdamW.lr: 0.0001
+ AdamW.maximize: false
+ AdamW.weight_decay: 0.01
+
+ AudioDataset.aligned: false
+ AudioDataset.duration: 3.0
+ AudioDataset.loudness_cutoff: -40.0
+ AudioDataset.n_examples: 1000
+ AudioDataset.num_channels: 1
+ AudioDataset.offset: null
+ AudioDataset.shuffle_loaders: false
+ AudioDataset.without_replacement: false
+
+ AudioLoader.ext:
+ - .wav
+ - .flac
+ - .mp3
+ - .mp4
+ AudioLoader.relative_path: ''
+ AudioLoader.shuffle: true
+ AudioLoader.shuffle_state: 0
+ AudioLoader.sources: null
+ AudioLoader.weights: null
+
+ BackgroundNoise.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ BackgroundNoise.loudness_cutoff: null
+ BackgroundNoise.n_bands: 3
+ BackgroundNoise.name: null
+ BackgroundNoise.prob: 1.0
+ BackgroundNoise.snr: !!python/tuple
+ - uniform
+ - 10.0
+ - 30.0
+ BackgroundNoise.sources: null
+ BackgroundNoise.weights: null
+
+ BaseTransform.keys: []
+ BaseTransform.name: null
+ BaseTransform.prob: 1.0
+
+ ClippingDistortion.name: null
+ ClippingDistortion.perc: !!python/tuple
+ - uniform
+ - 0.0
+ - 0.1
+ ClippingDistortion.prob: 1.0
+
+ CorruptPhase.name: null
+ CorruptPhase.prob: 1
+ CorruptPhase.scale: !!python/tuple
+ - uniform
+ - 0
+ - 3.141592653589793
+
+ CrossEntropyLoss.ignore_index: -100
+ CrossEntropyLoss.label_smoothing: 0.1
+ CrossEntropyLoss.reduce: null
+ CrossEntropyLoss.reduction: mean
+ CrossEntropyLoss.size_average: null
+
+ CrossTalk.loudness_cutoff: -40
+ CrossTalk.name: null
+ CrossTalk.prob: 1.0
+ CrossTalk.snr: !!python/tuple
+ - uniform
+ - 0.0
+ - 10.0
+ CrossTalk.sources: null
+ CrossTalk.weights: null
+
+ Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ Equalizer.n_bands: 6
+ Equalizer.name: null
+ Equalizer.prob: 1.0
+
+ FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ FrequencyMask.name: null
+ FrequencyMask.prob: 1
+
+ FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ FrequencyNoise.name: null
+ FrequencyNoise.prob: 1
+
+ GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ GlobalVolumeNorm.name: null
+ GlobalVolumeNorm.prob: 1.0
+
+ HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ HighPass.name: null
+ HighPass.prob: 1
+ HighPass.zeros: 51
+
+ InvertPhase.name: null
+ InvertPhase.prob: 1
+
+ LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ LowPass.name: null
+ LowPass.prob: 1
+ LowPass.zeros: 51
+
+ MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ MaskLowMagnitudes.name: null
+ MaskLowMagnitudes.prob: 1
+
+ MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ MuLawQuantization.name: null
+ MuLawQuantization.prob: 1.0
+
+ NoamScheduler.d_model: 512
+ NoamScheduler.factor: 2.0
+ NoamScheduler.warmup: 500
+
+ NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ NoiseFloor.name: null
+ NoiseFloor.prob: 1.0
+
+ Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ Quantization.name: null
+ Quantization.prob: 1.0
+
+ Repeat.n_repeat: 1
+ Repeat.name: null
+ Repeat.prob: 1.0
+
+ RepeatUpTo.max_repeat: 5
+ RepeatUpTo.name: null
+ RepeatUpTo.prob: 1.0
+ RepeatUpTo.weights: null
+
+ RescaleAudio.name: null
+ RescaleAudio.prob: 1
+ RescaleAudio.val: 1.0
+
+ RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ RoomImpulseResponse.duration: 1.0
+ RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ RoomImpulseResponse.n_bands: 6
+ RoomImpulseResponse.name: null
+ RoomImpulseResponse.offset: 0.0
+ RoomImpulseResponse.prob: 1.0
+ RoomImpulseResponse.sources: null
+ RoomImpulseResponse.use_original_phase: false
+ RoomImpulseResponse.weights: null
+
+ ShiftPhase.name: null
+ ShiftPhase.prob: 1
+ ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ Silence.name: null
+ Silence.prob: 0.1
+
+ Smoothing.name: null
+ Smoothing.prob: 1
+ Smoothing.window_length: !!python/tuple
+ - choice
+ - - 8
+   - 16
+   - 32
+   - 64
+   - 128
+   - 256
+   - 512
+ Smoothing.window_type: !!python/tuple
+ - const
+ - average
+
+ SpectralDenoising.denoise_amount: !!python/tuple
+ - uniform
+ - 0.8
+ - 1.0
+ SpectralDenoising.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ SpectralDenoising.n_bands: 6
+ SpectralDenoising.n_freq: 3
+ SpectralDenoising.n_time: 5
+ SpectralDenoising.name: null
+ SpectralDenoising.nz_volume: -40
+ SpectralDenoising.prob: 1
+
+ TimeMask.name: null
+ TimeMask.prob: 1
+ TimeMask.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ TimeMask.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ TimeNoise.name: null
+ TimeNoise.prob: 1
+ TimeNoise.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ TimeNoise.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ VampNet.dropout: 0.1
+ VampNet.embedding_dim: 1280
+ VampNet.flash_attn: false
+ VampNet.latent_dim: 8
+ VampNet.n_codebooks: 14
+ VampNet.n_conditioning_codebooks: 4
+ VampNet.n_heads: 20
+ VampNet.n_layers: 16
+ VampNet.noise_mode: mask
+ VampNet.r_cond_dim: 0
+ VampNet.vocab_size: 1024
+
+ VolumeChange.db: !!python/tuple
+ - uniform
+ - -12.0
+ - 0.0
+ VolumeChange.name: null
+ VolumeChange.prob: 1.0
+
+ VolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ VolumeNorm.name: null
+ VolumeNorm.prob: 1.0
+
+ amp: false
+
+ args.debug: true
+ args.load: conf/generated/boleros/c2f.yml
+ args.save: null
+
+ batch_size: 7
+
+ codec_ckpt: ./models/vampnet/codec.pth
+
+ fine_tune: true
+
+ fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+ grad_clip_val: 5.0
+
+ num_iters: 500000
+
+ num_workers: 7
+
+ resume: false
+
+ sample_freq: 1000
+
+ save_iters:
+ - 10000
+ - 20000
+ - 30000
+ - 40000
+ - 50000
+ - 100000
+
+ save_path: ./runs/boleros/c2f
+
+ seed: 0
+
+ tag: latest
+
+ train/AudioDataset.aligned: false
+ train/AudioDataset.duration: 3.0
+ train/AudioDataset.loudness_cutoff: -40.0
+ train/AudioDataset.n_examples: 100000000
+ train/AudioDataset.num_channels: 1
+ train/AudioDataset.offset: null
+ train/AudioDataset.shuffle_loaders: false
+ train/AudioDataset.without_replacement: false
+
+ train/AudioLoader.sources:
+ - /media/CHONK/hugo/loras/boleros
+
+ train/BackgroundNoise.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/BackgroundNoise.loudness_cutoff: null
+ train/BackgroundNoise.n_bands: 3
+ train/BackgroundNoise.name: null
+ train/BackgroundNoise.prob: 1.0
+ train/BackgroundNoise.snr: !!python/tuple
+ - uniform
+ - 10.0
+ - 30.0
+ train/BackgroundNoise.sources: null
+ train/BackgroundNoise.weights: null
+
+ train/BaseTransform.keys: []
+ train/BaseTransform.name: null
+ train/BaseTransform.prob: 1.0
+
+ train/ClippingDistortion.name: null
+ train/ClippingDistortion.perc: !!python/tuple
+ - uniform
+ - 0.0
+ - 0.1
+ train/ClippingDistortion.prob: 1.0
+
+ train/CorruptPhase.name: null
+ train/CorruptPhase.prob: 1
+ train/CorruptPhase.scale: !!python/tuple
+ - uniform
+ - 0
+ - 3.141592653589793
+
+ train/CrossTalk.loudness_cutoff: -40
+ train/CrossTalk.name: null
+ train/CrossTalk.prob: 1.0
+ train/CrossTalk.snr: !!python/tuple
+ - uniform
+ - 0.0
+ - 10.0
+ train/CrossTalk.sources: null
+ train/CrossTalk.weights: null
+
+ train/Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/Equalizer.n_bands: 6
+ train/Equalizer.name: null
+ train/Equalizer.prob: 1.0
+
+ train/FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ train/FrequencyMask.name: null
+ train/FrequencyMask.prob: 1
+
+ train/FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ train/FrequencyNoise.name: null
+ train/FrequencyNoise.prob: 1
+
+ train/GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ train/GlobalVolumeNorm.name: null
+ train/GlobalVolumeNorm.prob: 1.0
+
+ train/HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ train/HighPass.name: null
+ train/HighPass.prob: 1
+ train/HighPass.zeros: 51
+
+ train/InvertPhase.name: null
+ train/InvertPhase.prob: 1
+
+ train/LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ train/LowPass.name: null
+ train/LowPass.prob: 1
+ train/LowPass.zeros: 51
+
+ train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ train/MaskLowMagnitudes.name: null
+ train/MaskLowMagnitudes.prob: 1
+
+ train/MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ train/MuLawQuantization.name: null
+ train/MuLawQuantization.prob: 1.0
+
+ train/NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ train/NoiseFloor.name: null
+ train/NoiseFloor.prob: 1.0
+
+ train/Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ train/Quantization.name: null
+ train/Quantization.prob: 1.0
+
+ train/Repeat.n_repeat: 1
+ train/Repeat.name: null
+ train/Repeat.prob: 1.0
+
+ train/RepeatUpTo.max_repeat: 5
+ train/RepeatUpTo.name: null
+ train/RepeatUpTo.prob: 1.0
+ train/RepeatUpTo.weights: null
+
+ train/RescaleAudio.name: null
+ train/RescaleAudio.prob: 1
+ train/RescaleAudio.val: 1.0
+
+ train/RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ train/RoomImpulseResponse.duration: 1.0
+ train/RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/RoomImpulseResponse.n_bands: 6
+ train/RoomImpulseResponse.name: null
+ train/RoomImpulseResponse.offset: 0.0
+ train/RoomImpulseResponse.prob: 1.0
+ train/RoomImpulseResponse.sources: null
+ train/RoomImpulseResponse.use_original_phase: false
+ train/RoomImpulseResponse.weights: null
+
+ train/ShiftPhase.name: null
+ train/ShiftPhase.prob: 1
+ train/ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ train/Silence.name: null
+ train/Silence.prob: 0.1
+
+ train/Smoothing.name: null
+ train/Smoothing.prob: 1
+ train/Smoothing.window_length: !!python/tuple
+ - choice
+ - - 8
+   - 16
+   - 32
+   - 64
+   - 128
+   - 256
+   - 512
+ train/Smoothing.window_type: !!python/tuple
+ - const
+ - average
+
+ train/SpectralDenoising.denoise_amount: !!python/tuple
+ - uniform
+ - 0.8
+ - 1.0
+ train/SpectralDenoising.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/SpectralDenoising.n_bands: 6
+ train/SpectralDenoising.n_freq: 3
+ train/SpectralDenoising.n_time: 5
+ train/SpectralDenoising.name: null
+ train/SpectralDenoising.nz_volume: -40
+ train/SpectralDenoising.prob: 1
+
+ train/TimeMask.name: null
+ train/TimeMask.prob: 1
+ train/TimeMask.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/TimeMask.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ train/TimeNoise.name: null
+ train/TimeNoise.prob: 1
+ train/TimeNoise.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/TimeNoise.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ train/VolumeChange.db: !!python/tuple
+ - uniform
+ - -12.0
+ - 0.0
+ train/VolumeChange.name: null
+ train/VolumeChange.prob: 1.0
+
+ train/VolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ train/VolumeNorm.name: null
+ train/VolumeNorm.prob: 1.0
+
+ val/AudioDataset.aligned: false
+ val/AudioDataset.duration: 3.0
+ val/AudioDataset.loudness_cutoff: -40.0
+ val/AudioDataset.n_examples: 500
+ val/AudioDataset.num_channels: 1
+ val/AudioDataset.offset: null
+ val/AudioDataset.shuffle_loaders: false
+ val/AudioDataset.without_replacement: false
+
+ val/AudioLoader.sources:
+ - /media/CHONK/hugo/loras/boleros
+
+ val/BackgroundNoise.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/BackgroundNoise.loudness_cutoff: null
+ val/BackgroundNoise.n_bands: 3
+ val/BackgroundNoise.name: null
+ val/BackgroundNoise.prob: 1.0
+ val/BackgroundNoise.snr: !!python/tuple
+ - uniform
+ - 10.0
+ - 30.0
+ val/BackgroundNoise.sources: null
+ val/BackgroundNoise.weights: null
+
+ val/BaseTransform.keys: []
+ val/BaseTransform.name: null
+ val/BaseTransform.prob: 1.0
+
+ val/ClippingDistortion.name: null
+ val/ClippingDistortion.perc: !!python/tuple
+ - uniform
+ - 0.0
+ - 0.1
+ val/ClippingDistortion.prob: 1.0
+
+ val/CorruptPhase.name: null
+ val/CorruptPhase.prob: 1
+ val/CorruptPhase.scale: !!python/tuple
+ - uniform
+ - 0
+ - 3.141592653589793
+
+ val/CrossTalk.loudness_cutoff: -40
+ val/CrossTalk.name: null
+ val/CrossTalk.prob: 1.0
+ val/CrossTalk.snr: !!python/tuple
+ - uniform
+ - 0.0
+ - 10.0
+ val/CrossTalk.sources: null
+ val/CrossTalk.weights: null
+
+ val/Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/Equalizer.n_bands: 6
+ val/Equalizer.name: null
+ val/Equalizer.prob: 1.0
+
+ val/FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ val/FrequencyMask.name: null
+ val/FrequencyMask.prob: 1
+
+ val/FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ val/FrequencyNoise.name: null
+ val/FrequencyNoise.prob: 1
+
+ val/GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ val/GlobalVolumeNorm.name: null
+ val/GlobalVolumeNorm.prob: 1.0
+
+ val/HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ val/HighPass.name: null
+ val/HighPass.prob: 1
+ val/HighPass.zeros: 51
+
+ val/InvertPhase.name: null
+ val/InvertPhase.prob: 1
+
+ val/LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ val/LowPass.name: null
+ val/LowPass.prob: 1
+ val/LowPass.zeros: 51
+
+ val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ val/MaskLowMagnitudes.name: null
+ val/MaskLowMagnitudes.prob: 1
+
+ val/MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ val/MuLawQuantization.name: null
+ val/MuLawQuantization.prob: 1.0
+
+ val/NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ val/NoiseFloor.name: null
+ val/NoiseFloor.prob: 1.0
+
+ val/Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ val/Quantization.name: null
+ val/Quantization.prob: 1.0
+
+ val/Repeat.n_repeat: 1
+ val/Repeat.name: null
+ val/Repeat.prob: 1.0
+
+ val/RepeatUpTo.max_repeat: 5
+ val/RepeatUpTo.name: null
+ val/RepeatUpTo.prob: 1.0
+ val/RepeatUpTo.weights: null
+
+ val/RescaleAudio.name: null
+ val/RescaleAudio.prob: 1
+ val/RescaleAudio.val: 1.0
+
+ val/RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ val/RoomImpulseResponse.duration: 1.0
+ val/RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/RoomImpulseResponse.n_bands: 6
+ val/RoomImpulseResponse.name: null
+ val/RoomImpulseResponse.offset: 0.0
+ val/RoomImpulseResponse.prob: 1.0
+ val/RoomImpulseResponse.sources: null
+ val/RoomImpulseResponse.use_original_phase: false
+ val/RoomImpulseResponse.weights: null
+
+ val/ShiftPhase.name: null
+ val/ShiftPhase.prob: 1
+ val/ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ val/Silence.name: null
+ val/Silence.prob: 0.1
+
+ val/Smoothing.name: null
+ val/Smoothing.prob: 1
+ val/Smoothing.window_length: !!python/tuple
+ - choice
+ - - 8
+   - 16
+   - 32
+   - 64
+   - 128
+   - 256
+   - 512
+ val/Smoothing.window_type: !!python/tuple
+ - const
+ - average
+
+ val/SpectralDenoising.denoise_amount: !!python/tuple
+ - uniform
+ - 0.8
+ - 1.0
+ val/SpectralDenoising.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/SpectralDenoising.n_bands: 6
+ val/SpectralDenoising.n_freq: 3
+ val/SpectralDenoising.n_time: 5
+ val/SpectralDenoising.name: null
+ val/SpectralDenoising.nz_volume: -40
+ val/SpectralDenoising.prob: 1
+
+ val/TimeMask.name: null
+ val/TimeMask.prob: 1
+ val/TimeMask.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/TimeMask.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ val/TimeNoise.name: null
+ val/TimeNoise.prob: 1
+ val/TimeNoise.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/TimeNoise.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ val/VolumeChange.db: !!python/tuple
+ - uniform
+ - -12.0
+ - 0.0
+ val/VolumeChange.name: null
+ val/VolumeChange.prob: 1.0
+
+ val/VolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ val/VolumeNorm.name: null
+ val/VolumeNorm.prob: 1.0
+
+ val_freq: 500
+
+ val_idx:
+ - 0
+ - 1
+ - 2
+ - 3
+ - 4
+ - 5
+ - 6
+ - 7
+ - 8
+ - 9
+
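The keys in this file follow a `scope/Class.param` pattern: unprefixed entries such as `AudioDataset.n_examples: 1000` sit alongside `train/` and `val/` variants of the same key. Below is a minimal sketch of how such keys might be split and resolved. It assumes a scoped key takes precedence over the unscoped one; that fallback rule is an assumption, not something this commit states, and the helper names `split_key` and `resolve` are hypothetical.

```python
# Hypothetical helpers for illustration only.
# The values are copied from runs/boleros/c2f/args.yml in this commit.
CONFIG = {
    "AdamW.lr": 0.0001,
    "AudioDataset.n_examples": 1000,
    "train/AudioDataset.n_examples": 100000000,
    "val/AudioDataset.n_examples": 500,
    "batch_size": 7,
}


def split_key(key):
    """Split 'scope/Class.param' into (scope, target, param).

    Missing parts come back as empty strings, so a bare key such as
    'batch_size' yields ('', 'batch_size', '').
    """
    scope, _, rest = key.rpartition("/")
    target, _, param = rest.partition(".")
    return scope, target, param


def resolve(config, name, scope=""):
    """Return the scoped value if present, else the unscoped one.

    The scoped-first lookup order is an assumed rule.
    """
    scoped_name = f"{scope}/{name}" if scope else name
    if scoped_name in config:
        return config[scoped_name]
    return config[name]


if __name__ == "__main__":
    print(split_key("train/AudioDataset.n_examples"))
    print(resolve(CONFIG, "AudioDataset.n_examples", scope="val"))
    print(resolve(CONFIG, "AdamW.lr", scope="train"))
```

Under this reading, the training loader is configured for 100000000 examples and the validation loader for 500, while settings without a scoped variant (for example `AdamW.lr`) apply everywhere.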
runs/boleros/c2f/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8408ab94ce858360744e6c7f8fe708e48926fd26f5021c8d13506d529e12ac68
+ size 1111127537
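The three added lines are a Git LFS pointer, not the weights themselves: each line is a `key value` pair giving the spec version, the object hash, and the object size in bytes. A small sketch of reading those fields follows; the function name `parse_lfs_pointer` is hypothetical, and the parser only handles the three keys shown in this diff.

```python
# Pointer text copied from runs/boleros/c2f/latest/vampnet/weights.pth.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8408ab94ce858360744e6c7f8fe708e48926fd26f5021c8d13506d529e12ac68
size 1111127537
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict.

    Hypothetical helper: it expects the 'version', 'oid' and 'size'
    keys and raises KeyError if one of them is missing.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["oid_algo"], pointer["size"] / 1e9, "GB")
```

The `size` field puts the stored checkpoint at roughly 1.11 GB.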
runs/boleros/c2f/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   277.753M params.
+   (_orig_mod): VampNet(
+     277.753M params.
+     (embedding): CodebookEmbedding(
+       0.145M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+     )
+     (transformer): TransformerStack(
+       264.481M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-15): 15 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       13.128M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+       )
+     )
+   )
+ )
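The per-module counts in this dump can be cross-checked against the totals reported for their parents. The sketch below does that with plain arithmetic on the figures printed above (in millions of parameters, already rounded to three decimals, so sums may differ in the last digit). It also compares the total against the `size` of the `weights.pth` LFS pointer under the assumption that weights are stored as 4-byte floats; that storage format is an assumption, not something the commit states. All names are hypothetical.

```python
# Figures copied from runs/boleros/c2f/model.txt (millions of parameters).
LAYER_0 = 16.531          # layers[0], which also holds relative_attention_bias
LAYER_REST = 16.530       # each of layers[1..15]
N_REST = 15
FINAL_NORM = 0.001        # transformer.norm
EMBEDDING = 0.145
TRANSFORMER = 264.481
CLASSIFIER = 13.128
TOTAL = 277.753

# Size in bytes from the weights.pth Git LFS pointer in this commit.
LFS_SIZE_BYTES = 1111127537


def agrees(computed, reported, tol=0.005):
    """True if two rounded counts match within a small rounding tolerance."""
    return abs(computed - reported) <= tol


# Sum of the 16 layers plus the final norm should match the stack total.
transformer_sum = LAYER_0 + N_REST * LAYER_REST + FINAL_NORM

# Embedding + transformer + classifier should match the model total.
model_sum = EMBEDDING + TRANSFORMER + CLASSIFIER

# Attention sub-modules of layer 0: three 1.659M projections, one 1.638M
# projection, and the 0.001M relative attention bias.
attn_sum = 3 * 1.659 + 1.638 + 0.001

# Assumed float32 storage: 4 bytes per parameter.
float32_bytes = TOTAL * 1e6 * 4

if __name__ == "__main__":
    print("transformer:", agrees(transformer_sum, TRANSFORMER))
    print("model:", agrees(model_sum, TOTAL))
    print("self_attn:", agrees(attn_sum, 6.616))
    print("float32 size / LFS size:", float32_bytes / LFS_SIZE_BYTES)
```

If the float32 assumption holds, the estimated size lands within a fraction of a percent of the LFS object size, which would leave little room for anything but the model weights in the checkpoint.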
runs/boleros/coarse/args.yml ADDED
@@ -0,0 +1,825 @@
+ AdamW.amsgrad: false
+ AdamW.betas: !!python/tuple
+ - 0.9
+ - 0.999
+ AdamW.capturable: false
+ AdamW.differentiable: false
+ AdamW.eps: 1.0e-08
+ AdamW.lr: 0.0001
+ AdamW.maximize: false
+ AdamW.weight_decay: 0.01
+
+ AudioDataset.aligned: false
+ AudioDataset.duration: 10.0
+ AudioDataset.loudness_cutoff: -30.0
+ AudioDataset.n_examples: 1000
+ AudioDataset.num_channels: 1
+ AudioDataset.offset: null
+ AudioDataset.shuffle_loaders: false
+ AudioDataset.without_replacement: false
+
+ AudioLoader.ext:
+ - .wav
+ - .flac
+ - .mp3
+ - .mp4
+ AudioLoader.relative_path: ''
+ AudioLoader.shuffle: true
+ AudioLoader.shuffle_state: 0
+ AudioLoader.sources: null
+ AudioLoader.weights: null
+
+ BackgroundNoise.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ BackgroundNoise.loudness_cutoff: null
+ BackgroundNoise.n_bands: 3
+ BackgroundNoise.name: null
+ BackgroundNoise.prob: 1.0
+ BackgroundNoise.snr: !!python/tuple
+ - uniform
+ - 10.0
+ - 30.0
+ BackgroundNoise.sources: null
+ BackgroundNoise.weights: null
+
+ BaseTransform.keys: []
+ BaseTransform.name: null
+ BaseTransform.prob: 1.0
+
+ ClippingDistortion.name: null
+ ClippingDistortion.perc: !!python/tuple
+ - uniform
+ - 0.0
+ - 0.1
+ ClippingDistortion.prob: 1.0
+
+ CorruptPhase.name: null
+ CorruptPhase.prob: 1
+ CorruptPhase.scale: !!python/tuple
+ - uniform
+ - 0
+ - 3.141592653589793
+
+ CrossEntropyLoss.ignore_index: -100
+ CrossEntropyLoss.label_smoothing: 0.1
+ CrossEntropyLoss.reduce: null
+ CrossEntropyLoss.reduction: mean
+ CrossEntropyLoss.size_average: null
+
+ CrossTalk.loudness_cutoff: -40
+ CrossTalk.name: null
+ CrossTalk.prob: 1.0
+ CrossTalk.snr: !!python/tuple
+ - uniform
+ - 0.0
+ - 10.0
+ CrossTalk.sources: null
+ CrossTalk.weights: null
+
+ Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ Equalizer.n_bands: 6
+ Equalizer.name: null
+ Equalizer.prob: 1.0
+
+ FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ FrequencyMask.name: null
+ FrequencyMask.prob: 1
+
+ FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ FrequencyNoise.name: null
+ FrequencyNoise.prob: 1
+
+ GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ GlobalVolumeNorm.name: null
+ GlobalVolumeNorm.prob: 1.0
+
+ HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ HighPass.name: null
+ HighPass.prob: 1
+ HighPass.zeros: 51
+
+ InvertPhase.name: null
+ InvertPhase.prob: 1
+
+ LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ LowPass.name: null
+ LowPass.prob: 1
+ LowPass.zeros: 51
+
+ MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ MaskLowMagnitudes.name: null
+ MaskLowMagnitudes.prob: 1
+
+ MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ MuLawQuantization.name: null
+ MuLawQuantization.prob: 1.0
+
+ NoamScheduler.d_model: 512
+ NoamScheduler.factor: 2.0
+ NoamScheduler.warmup: 500
+
+ NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ NoiseFloor.name: null
+ NoiseFloor.prob: 1.0
+
+ Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ Quantization.name: null
+ Quantization.prob: 1.0
+
+ Repeat.n_repeat: 1
+ Repeat.name: null
+ Repeat.prob: 1.0
+
+ RepeatUpTo.max_repeat: 5
+ RepeatUpTo.name: null
+ RepeatUpTo.prob: 1.0
+ RepeatUpTo.weights: null
+
+ RescaleAudio.name: null
+ RescaleAudio.prob: 1
+ RescaleAudio.val: 1.0
+
+ RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ RoomImpulseResponse.duration: 1.0
+ RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ RoomImpulseResponse.n_bands: 6
+ RoomImpulseResponse.name: null
+ RoomImpulseResponse.offset: 0.0
+ RoomImpulseResponse.prob: 1.0
+ RoomImpulseResponse.sources: null
+ RoomImpulseResponse.use_original_phase: false
+ RoomImpulseResponse.weights: null
+
+ ShiftPhase.name: null
+ ShiftPhase.prob: 1
+ ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ Silence.name: null
+ Silence.prob: 0.1
+
+ Smoothing.name: null
+ Smoothing.prob: 1
+ Smoothing.window_length: !!python/tuple
+ - choice
+ - - 8
+   - 16
+   - 32
+   - 64
+   - 128
+   - 256
+   - 512
+ Smoothing.window_type: !!python/tuple
+ - const
+ - average
+
+ SpectralDenoising.denoise_amount: !!python/tuple
+ - uniform
+ - 0.8
230
+ - 1.0
231
+ SpectralDenoising.eq_amount: !!python/tuple
232
+ - const
233
+ - 1.0
234
+ SpectralDenoising.n_bands: 6
235
+ SpectralDenoising.n_freq: 3
236
+ SpectralDenoising.n_time: 5
237
+ SpectralDenoising.name: null
238
+ SpectralDenoising.nz_volume: -40
239
+ SpectralDenoising.prob: 1
240
+
241
+ TimeMask.name: null
242
+ TimeMask.prob: 1
243
+ TimeMask.t_center: !!python/tuple
244
+ - uniform
245
+ - 0.0
246
+ - 1.0
247
+ TimeMask.t_width: !!python/tuple
248
+ - const
249
+ - 0.025
250
+
251
+ TimeNoise.name: null
252
+ TimeNoise.prob: 1
253
+ TimeNoise.t_center: !!python/tuple
254
+ - uniform
255
+ - 0.0
256
+ - 1.0
257
+ TimeNoise.t_width: !!python/tuple
258
+ - const
259
+ - 0.025
260
+
261
+ VampNet.dropout: 0.1
262
+ VampNet.embedding_dim: 1280
263
+ VampNet.flash_attn: false
264
+ VampNet.latent_dim: 8
265
+ VampNet.n_codebooks: 4
266
+ VampNet.n_conditioning_codebooks: 0
267
+ VampNet.n_heads: 20
268
+ VampNet.n_layers: 20
269
+ VampNet.noise_mode: mask
270
+ VampNet.r_cond_dim: 0
271
+ VampNet.vocab_size: 1024
272
+
273
+ VolumeChange.db: !!python/tuple
274
+ - uniform
275
+ - -12.0
276
+ - 0.0
277
+ VolumeChange.name: null
278
+ VolumeChange.prob: 1.0
279
+
280
+ VolumeNorm.db: !!python/tuple
281
+ - const
282
+ - -24
283
+ VolumeNorm.name: null
284
+ VolumeNorm.prob: 1.0
285
+
286
+ amp: false
287
+
288
+ args.debug: true
289
+ args.load: conf/generated/boleros/coarse.yml
290
+ args.save: null
291
+
292
+ batch_size: 6
293
+
294
+ codec_ckpt: ./models/vampnet/codec.pth
295
+
296
+ fine_tune: true
297
+
298
+ fine_tune_checkpoint: ./models/vampnet/coarse.pth
299
+
300
+ grad_clip_val: 5.0
301
+
302
+ num_iters: 500000
303
+
304
+ num_workers: 7
305
+
306
+ resume: false
307
+
308
+ sample_freq: 1000
309
+
310
+ save_iters:
311
+ - 10000
312
+ - 20000
313
+ - 30000
314
+ - 40000
315
+ - 50000
316
+ - 100000
317
+
318
+ save_path: ./runs/boleros/coarse
319
+
320
+ seed: 0
321
+
322
+ tag: latest
323
+
+ train/AudioDataset.aligned: false
+ train/AudioDataset.duration: 10.0
+ train/AudioDataset.loudness_cutoff: -30.0
+ train/AudioDataset.n_examples: 100000000
+ train/AudioDataset.num_channels: 1
+ train/AudioDataset.offset: null
+ train/AudioDataset.shuffle_loaders: false
+ train/AudioDataset.without_replacement: false
+
+ train/AudioLoader.sources:
+ - /media/CHONK/hugo/loras/boleros
+
+ train/BackgroundNoise.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/BackgroundNoise.loudness_cutoff: null
+ train/BackgroundNoise.n_bands: 3
+ train/BackgroundNoise.name: null
+ train/BackgroundNoise.prob: 1.0
+ train/BackgroundNoise.snr: !!python/tuple
+ - uniform
+ - 10.0
+ - 30.0
+ train/BackgroundNoise.sources: null
+ train/BackgroundNoise.weights: null
+
+ train/BaseTransform.keys: []
+ train/BaseTransform.name: null
+ train/BaseTransform.prob: 1.0
+
+ train/ClippingDistortion.name: null
+ train/ClippingDistortion.perc: !!python/tuple
+ - uniform
+ - 0.0
+ - 0.1
+ train/ClippingDistortion.prob: 1.0
+
+ train/CorruptPhase.name: null
+ train/CorruptPhase.prob: 1
+ train/CorruptPhase.scale: !!python/tuple
+ - uniform
+ - 0
+ - 3.141592653589793
+
+ train/CrossTalk.loudness_cutoff: -40
+ train/CrossTalk.name: null
+ train/CrossTalk.prob: 1.0
+ train/CrossTalk.snr: !!python/tuple
+ - uniform
+ - 0.0
+ - 10.0
+ train/CrossTalk.sources: null
+ train/CrossTalk.weights: null
+
+ train/Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/Equalizer.n_bands: 6
+ train/Equalizer.name: null
+ train/Equalizer.prob: 1.0
+
+ train/FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ train/FrequencyMask.name: null
+ train/FrequencyMask.prob: 1
+
+ train/FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ train/FrequencyNoise.name: null
+ train/FrequencyNoise.prob: 1
+
+ train/GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ train/GlobalVolumeNorm.name: null
+ train/GlobalVolumeNorm.prob: 1.0
+
+ train/HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ train/HighPass.name: null
+ train/HighPass.prob: 1
+ train/HighPass.zeros: 51
+
+ train/InvertPhase.name: null
+ train/InvertPhase.prob: 1
+
+ train/LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ train/LowPass.name: null
+ train/LowPass.prob: 1
+ train/LowPass.zeros: 51
+
+ train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ train/MaskLowMagnitudes.name: null
+ train/MaskLowMagnitudes.prob: 1
+
+ train/MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ train/MuLawQuantization.name: null
+ train/MuLawQuantization.prob: 1.0
+
+ train/NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ train/NoiseFloor.name: null
+ train/NoiseFloor.prob: 1.0
+
+ train/Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ train/Quantization.name: null
+ train/Quantization.prob: 1.0
+
+ train/Repeat.n_repeat: 1
+ train/Repeat.name: null
+ train/Repeat.prob: 1.0
+
+ train/RepeatUpTo.max_repeat: 5
+ train/RepeatUpTo.name: null
+ train/RepeatUpTo.prob: 1.0
+ train/RepeatUpTo.weights: null
+
+ train/RescaleAudio.name: null
+ train/RescaleAudio.prob: 1
+ train/RescaleAudio.val: 1.0
+
+ train/RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ train/RoomImpulseResponse.duration: 1.0
+ train/RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/RoomImpulseResponse.n_bands: 6
+ train/RoomImpulseResponse.name: null
+ train/RoomImpulseResponse.offset: 0.0
+ train/RoomImpulseResponse.prob: 1.0
+ train/RoomImpulseResponse.sources: null
+ train/RoomImpulseResponse.use_original_phase: false
+ train/RoomImpulseResponse.weights: null
+
+ train/ShiftPhase.name: null
+ train/ShiftPhase.prob: 1
+ train/ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ train/Silence.name: null
+ train/Silence.prob: 0.1
+
+ train/Smoothing.name: null
+ train/Smoothing.prob: 1
+ train/Smoothing.window_length: !!python/tuple
+ - choice
+ - - 8
+   - 16
+   - 32
+   - 64
+   - 128
+   - 256
+   - 512
+ train/Smoothing.window_type: !!python/tuple
+ - const
+ - average
+
+ train/SpectralDenoising.denoise_amount: !!python/tuple
+ - uniform
+ - 0.8
+ - 1.0
+ train/SpectralDenoising.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/SpectralDenoising.n_bands: 6
+ train/SpectralDenoising.n_freq: 3
+ train/SpectralDenoising.n_time: 5
+ train/SpectralDenoising.name: null
+ train/SpectralDenoising.nz_volume: -40
+ train/SpectralDenoising.prob: 1
+
+ train/TimeMask.name: null
+ train/TimeMask.prob: 1
+ train/TimeMask.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/TimeMask.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ train/TimeNoise.name: null
+ train/TimeNoise.prob: 1
+ train/TimeNoise.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/TimeNoise.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ train/VolumeChange.db: !!python/tuple
+ - uniform
+ - -12.0
+ - 0.0
+ train/VolumeChange.name: null
+ train/VolumeChange.prob: 1.0
+
+ train/VolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ train/VolumeNorm.name: null
+ train/VolumeNorm.prob: 1.0
+
+ val/AudioDataset.aligned: false
+ val/AudioDataset.duration: 10.0
+ val/AudioDataset.loudness_cutoff: -30.0
+ val/AudioDataset.n_examples: 500
+ val/AudioDataset.num_channels: 1
+ val/AudioDataset.offset: null
+ val/AudioDataset.shuffle_loaders: false
+ val/AudioDataset.without_replacement: false
+
+ val/AudioLoader.sources:
+ - /media/CHONK/hugo/loras/boleros
+
+ val/BackgroundNoise.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/BackgroundNoise.loudness_cutoff: null
+ val/BackgroundNoise.n_bands: 3
+ val/BackgroundNoise.name: null
+ val/BackgroundNoise.prob: 1.0
+ val/BackgroundNoise.snr: !!python/tuple
+ - uniform
+ - 10.0
+ - 30.0
+ val/BackgroundNoise.sources: null
+ val/BackgroundNoise.weights: null
+
+ val/BaseTransform.keys: []
+ val/BaseTransform.name: null
+ val/BaseTransform.prob: 1.0
+
+ val/ClippingDistortion.name: null
+ val/ClippingDistortion.perc: !!python/tuple
+ - uniform
+ - 0.0
+ - 0.1
+ val/ClippingDistortion.prob: 1.0
+
+ val/CorruptPhase.name: null
+ val/CorruptPhase.prob: 1
+ val/CorruptPhase.scale: !!python/tuple
+ - uniform
+ - 0
+ - 3.141592653589793
+
+ val/CrossTalk.loudness_cutoff: -40
+ val/CrossTalk.name: null
+ val/CrossTalk.prob: 1.0
+ val/CrossTalk.snr: !!python/tuple
+ - uniform
+ - 0.0
+ - 10.0
+ val/CrossTalk.sources: null
+ val/CrossTalk.weights: null
+
+ val/Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/Equalizer.n_bands: 6
+ val/Equalizer.name: null
+ val/Equalizer.prob: 1.0
+
+ val/FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ val/FrequencyMask.name: null
+ val/FrequencyMask.prob: 1
+
+ val/FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ val/FrequencyNoise.name: null
+ val/FrequencyNoise.prob: 1
+
+ val/GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ val/GlobalVolumeNorm.name: null
+ val/GlobalVolumeNorm.prob: 1.0
+
+ val/HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ val/HighPass.name: null
+ val/HighPass.prob: 1
+ val/HighPass.zeros: 51
+
+ val/InvertPhase.name: null
+ val/InvertPhase.prob: 1
+
+ val/LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ val/LowPass.name: null
+ val/LowPass.prob: 1
+ val/LowPass.zeros: 51
+
+ val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ val/MaskLowMagnitudes.name: null
+ val/MaskLowMagnitudes.prob: 1
+
+ val/MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ val/MuLawQuantization.name: null
+ val/MuLawQuantization.prob: 1.0
+
+ val/NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ val/NoiseFloor.name: null
+ val/NoiseFloor.prob: 1.0
+
+ val/Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ val/Quantization.name: null
+ val/Quantization.prob: 1.0
+
+ val/Repeat.n_repeat: 1
+ val/Repeat.name: null
+ val/Repeat.prob: 1.0
+
+ val/RepeatUpTo.max_repeat: 5
+ val/RepeatUpTo.name: null
+ val/RepeatUpTo.prob: 1.0
+ val/RepeatUpTo.weights: null
+
+ val/RescaleAudio.name: null
+ val/RescaleAudio.prob: 1
+ val/RescaleAudio.val: 1.0
+
+ val/RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ val/RoomImpulseResponse.duration: 1.0
+ val/RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/RoomImpulseResponse.n_bands: 6
+ val/RoomImpulseResponse.name: null
+ val/RoomImpulseResponse.offset: 0.0
+ val/RoomImpulseResponse.prob: 1.0
+ val/RoomImpulseResponse.sources: null
+ val/RoomImpulseResponse.use_original_phase: false
+ val/RoomImpulseResponse.weights: null
+
+ val/ShiftPhase.name: null
+ val/ShiftPhase.prob: 1
+ val/ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ val/Silence.name: null
+ val/Silence.prob: 0.1
+
+ val/Smoothing.name: null
+ val/Smoothing.prob: 1
+ val/Smoothing.window_length: !!python/tuple
+ - choice
+ - - 8
+   - 16
+   - 32
+   - 64
+   - 128
+   - 256
+   - 512
+ val/Smoothing.window_type: !!python/tuple
+ - const
+ - average
+
+ val/SpectralDenoising.denoise_amount: !!python/tuple
+ - uniform
+ - 0.8
+ - 1.0
+ val/SpectralDenoising.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/SpectralDenoising.n_bands: 6
+ val/SpectralDenoising.n_freq: 3
+ val/SpectralDenoising.n_time: 5
+ val/SpectralDenoising.name: null
+ val/SpectralDenoising.nz_volume: -40
+ val/SpectralDenoising.prob: 1
+
+ val/TimeMask.name: null
+ val/TimeMask.prob: 1
+ val/TimeMask.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/TimeMask.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ val/TimeNoise.name: null
+ val/TimeNoise.prob: 1
+ val/TimeNoise.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/TimeNoise.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ val/VolumeChange.db: !!python/tuple
+ - uniform
+ - -12.0
+ - 0.0
+ val/VolumeChange.name: null
+ val/VolumeChange.prob: 1.0
+
+ val/VolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ val/VolumeNorm.name: null
+ val/VolumeNorm.prob: 1.0
+
+ val_freq: 500
+
+ val_idx:
+ - 0
+ - 1
+ - 2
+ - 3
+ - 4
+ - 5
+ - 6
+ - 7
+ - 8
+ - 9
+
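The `NoamScheduler.*` entries in the args.yml above (`d_model: 512`, `factor: 2.0`, `warmup: 500`) can be read against the standard Noam warmup schedule from the original Transformer paper. The sketch below assumes that textbook formula; the function name `noam_lr` is hypothetical, and the training code in this repository may combine these values differently (for example with the optimizer's base learning rate).

```python
def noam_lr(step, d_model=512, factor=2.0, warmup=500):
    """Textbook Noam schedule (an assumption, not read from the training code).

    The rate grows linearly for `warmup` steps, then decays as step ** -0.5.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first call
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


if __name__ == "__main__":
    # Print the rate at a few steps around the warmup boundary.
    for s in (1, 100, 500, 5000, 50000):
        print(s, noam_lr(s))
```

Under this formula the rate peaks at `step == warmup`, so a short warmup of 500 steps reaches its maximum early in a fine-tuning run of this length.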
runs/boleros/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4cab5127c211565b6c408d4affe734f07503935828422dcc958ff7d4c7cf4d5
+ size 1343718241
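Each `weights.pth` entry in this commit is a Git LFS pointer file (three `key value` lines), not the checkpoint itself. A minimal sketch of reading one, assuming the pointer text has already been extracted without the diff's leading `+`; the helper name `parse_lfs_pointer` is hypothetical:

```python
def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f4cab5127c211565b6c408d4affe734f07503935828422dcc958ff7d4c7cf4d5
size 1343718241
"""

if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["hash_algo"], info["size"])
```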
runs/boleros/coarse/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   335.894M params.
+   (_orig_mod): VampNet(
+     335.894M params.
+     (embedding): CodebookEmbedding(
+       0.042M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+     )
+     (transformer): TransformerStack(
+       330.600M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-19): 19 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       5.251M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+       )
+     )
+   )
+ )
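Several figures in this model summary can be cross-checked against the `VampNet.*` values in the run's args.yml and the size recorded in its LFS pointer. The sketch below is plain arithmetic on numbers copied from those files; the helper `dense_params` is hypothetical, and the float32 storage assumption is a guess that the byte count happens to support.

```python
# Values copied from runs/boleros/coarse (args.yml, model.txt, LFS pointer).
EMBEDDING_DIM = 1280
N_CODEBOOKS = 4
LATENT_DIM = 8
VOCAB_SIZE = 1024
REPORTED_TOTAL_M = 335.894      # "335.894M params." at the top of model.txt
LFS_SIZE_BYTES = 1343718241     # "size" line of the weights.pth pointer


def dense_params(n_in, n_out, bias):
    """Parameter count of a Linear layer or a kernel-size-1 Conv1d."""
    return n_in * n_out + (n_out if bias else 0)


# w_ks: bias-free 1280 -> 1280 projection, listed as 1.638M.
w_ks = dense_params(EMBEDDING_DIM, EMBEDDING_DIM, bias=False)

# out_proj: Conv1d(32, 1280) with bias, listed as 0.042M.
# Its 32 input channels equal n_codebooks * latent_dim (the 4x8 MASK shape).
out_proj = dense_params(N_CODEBOOKS * LATENT_DIM, EMBEDDING_DIM, bias=True)

# The classifier's 4096 output channels equal n_codebooks * vocab_size.
classifier_channels = N_CODEBOOKS * VOCAB_SIZE

# Top-level blocks (embedding + transformer + classifier), in millions.
component_sum_m = 0.042 + 330.600 + 5.251

# Assumption: weights stored as 4-byte floats.
fp32_bytes = REPORTED_TOTAL_M * 1e6 * 4

if __name__ == "__main__":
    print(w_ks, out_proj, classifier_channels)
    print(component_sum_m, fp32_bytes / LFS_SIZE_BYTES)
```

The three rounded block totals add up to the reported model total to within rounding, and four bytes per parameter lands within a small fraction of a percent of the pointer's size field.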
runs/choir/c2f/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fd753f116f3778c23380ab3d04de9c2525a7b80adb67290042abf7b55415da5
+ size 1111127537
runs/choir/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c29a1dfe20e7ddcd6dc8a6a41015d3d63447d4363fde3c978684196b0e12b82d
+ size 1343718241
runs/knower/c2f/args.yml ADDED
@@ -0,0 +1,824 @@
+ AdamW.amsgrad: false
+ AdamW.betas: !!python/tuple
+ - 0.9
+ - 0.999
+ AdamW.capturable: false
+ AdamW.differentiable: false
+ AdamW.eps: 1.0e-08
+ AdamW.lr: 0.0001
+ AdamW.maximize: false
+ AdamW.weight_decay: 0.01
+
+ AudioDataset.aligned: false
+ AudioDataset.duration: 3.0
+ AudioDataset.loudness_cutoff: -40.0
+ AudioDataset.n_examples: 1000
+ AudioDataset.num_channels: 1
+ AudioDataset.offset: null
+ AudioDataset.shuffle_loaders: false
+ AudioDataset.without_replacement: false
+
+ AudioLoader.ext:
+ - .wav
+ - .flac
+ - .mp3
+ - .mp4
+ AudioLoader.relative_path: /data/
+ AudioLoader.shuffle: true
+ AudioLoader.shuffle_state: 0
+ AudioLoader.sources: null
+ AudioLoader.weights: null
+
+ BackgroundNoise.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ BackgroundNoise.loudness_cutoff: null
+ BackgroundNoise.n_bands: 3
+ BackgroundNoise.name: null
+ BackgroundNoise.prob: 1.0
+ BackgroundNoise.snr: !!python/tuple
+ - uniform
+ - 10.0
+ - 30.0
+ BackgroundNoise.sources: null
+ BackgroundNoise.weights: null
+
+ BaseTransform.keys: []
+ BaseTransform.name: null
+ BaseTransform.prob: 1.0
+
+ ClippingDistortion.name: null
+ ClippingDistortion.perc: !!python/tuple
+ - uniform
+ - 0.0
+ - 0.1
+ ClippingDistortion.prob: 1.0
+
+ CorruptPhase.name: null
+ CorruptPhase.prob: 1
+ CorruptPhase.scale: !!python/tuple
+ - uniform
+ - 0
+ - 3.141592653589793
+
+ CrossEntropyLoss.ignore_index: -100
+ CrossEntropyLoss.label_smoothing: 0.1
+ CrossEntropyLoss.reduce: null
+ CrossEntropyLoss.reduction: mean
+ CrossEntropyLoss.size_average: null
+
+ CrossTalk.loudness_cutoff: -40
+ CrossTalk.name: null
+ CrossTalk.prob: 1.0
+ CrossTalk.snr: !!python/tuple
+ - uniform
+ - 0.0
+ - 10.0
+ CrossTalk.sources: null
+ CrossTalk.weights: null
+
+ Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ Equalizer.n_bands: 6
+ Equalizer.name: null
+ Equalizer.prob: 1.0
+
+ FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ FrequencyMask.name: null
+ FrequencyMask.prob: 1
+
+ FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ FrequencyNoise.name: null
+ FrequencyNoise.prob: 1
+
+ GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ GlobalVolumeNorm.name: null
+ GlobalVolumeNorm.prob: 1.0
+
+ HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ HighPass.name: null
+ HighPass.prob: 1
+ HighPass.zeros: 51
+
+ InvertPhase.name: null
+ InvertPhase.prob: 1
+
+ LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ LowPass.name: null
+ LowPass.prob: 1
+ LowPass.zeros: 51
+
+ MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ MaskLowMagnitudes.name: null
+ MaskLowMagnitudes.prob: 1
+
+ MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ MuLawQuantization.name: null
+ MuLawQuantization.prob: 1.0
+
+ NoamScheduler.d_model: 512
+ NoamScheduler.factor: 2.0
+ NoamScheduler.warmup: 500
+
+ NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ NoiseFloor.name: null
+ NoiseFloor.prob: 1.0
+
+ Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ Quantization.name: null
+ Quantization.prob: 1.0
+
+ Repeat.n_repeat: 1
+ Repeat.name: null
+ Repeat.prob: 1.0
+
+ RepeatUpTo.max_repeat: 5
+ RepeatUpTo.name: null
+ RepeatUpTo.prob: 1.0
+ RepeatUpTo.weights: null
+
+ RescaleAudio.name: null
+ RescaleAudio.prob: 1
+ RescaleAudio.val: 1.0
+
+ RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ RoomImpulseResponse.duration: 1.0
+ RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ RoomImpulseResponse.n_bands: 6
+ RoomImpulseResponse.name: null
+ RoomImpulseResponse.offset: 0.0
+ RoomImpulseResponse.prob: 1.0
+ RoomImpulseResponse.sources: null
+ RoomImpulseResponse.use_original_phase: false
+ RoomImpulseResponse.weights: null
+
+ ShiftPhase.name: null
+ ShiftPhase.prob: 1
+ ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ Silence.name: null
+ Silence.prob: 0.1
+
+ Smoothing.name: null
+ Smoothing.prob: 1
+ Smoothing.window_length: !!python/tuple
+ - choice
+ - - 8
+   - 16
+   - 32
+   - 64
+   - 128
+   - 256
+   - 512
+ Smoothing.window_type: !!python/tuple
+ - const
+ - average
+
+ SpectralDenoising.denoise_amount: !!python/tuple
+ - uniform
+ - 0.8
+ - 1.0
+ SpectralDenoising.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ SpectralDenoising.n_bands: 6
+ SpectralDenoising.n_freq: 3
+ SpectralDenoising.n_time: 5
+ SpectralDenoising.name: null
+ SpectralDenoising.nz_volume: -40
+ SpectralDenoising.prob: 1
+
+ TimeMask.name: null
+ TimeMask.prob: 1
+ TimeMask.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ TimeMask.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ TimeNoise.name: null
+ TimeNoise.prob: 1
+ TimeNoise.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ TimeNoise.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ VampNet.dropout: 0.1
+ VampNet.embedding_dim: 1280
+ VampNet.flash_attn: false
+ VampNet.latent_dim: 8
+ VampNet.n_codebooks: 14
+ VampNet.n_conditioning_codebooks: 4
+ VampNet.n_heads: 20
+ VampNet.n_layers: 16
+ VampNet.noise_mode: mask
+ VampNet.r_cond_dim: 0
+ VampNet.vocab_size: 1024
+
+ VolumeChange.db: !!python/tuple
+ - uniform
+ - -12.0
+ - 0.0
+ VolumeChange.name: null
+ VolumeChange.prob: 1.0
+
+ VolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ VolumeNorm.name: null
+ VolumeNorm.prob: 1.0
+
+ amp: false
+
+ args.debug: true
+ args.load: conf/generated/knower/c2f.yml
+ args.save: null
+
+ batch_size: 6
+
+ codec_ckpt: ./models/vampnet/codec.pth
+
+ fine_tune: true
+
+ fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+ grad_clip_val: 5.0
+
+ num_iters: 500000
+
+ num_workers: 7
+
+ resume: true
+
+ sample_freq: 1000
+
+ save_iters:
+ - 10000
+ - 20000
+ - 30000
+ - 40000
+ - 50000
+
+ save_path: ./runs/knower/c2f
+
+ seed: 0
+
+ tag: latest
+
+ train/AudioDataset.aligned: false
+ train/AudioDataset.duration: 3.0
+ train/AudioDataset.loudness_cutoff: -40.0
+ train/AudioDataset.n_examples: 100000000
+ train/AudioDataset.num_channels: 1
+ train/AudioDataset.offset: null
+ train/AudioDataset.shuffle_loaders: false
+ train/AudioDataset.without_replacement: false
+
+ train/AudioLoader.sources:
+ - /media/CHONK/hugo/knower
+
+ train/BackgroundNoise.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/BackgroundNoise.loudness_cutoff: null
+ train/BackgroundNoise.n_bands: 3
+ train/BackgroundNoise.name: null
+ train/BackgroundNoise.prob: 1.0
+ train/BackgroundNoise.snr: !!python/tuple
+ - uniform
+ - 10.0
+ - 30.0
+ train/BackgroundNoise.sources: null
+ train/BackgroundNoise.weights: null
+
+ train/BaseTransform.keys: []
+ train/BaseTransform.name: null
+ train/BaseTransform.prob: 1.0
+
+ train/ClippingDistortion.name: null
+ train/ClippingDistortion.perc: !!python/tuple
+ - uniform
+ - 0.0
+ - 0.1
+ train/ClippingDistortion.prob: 1.0
+
+ train/CorruptPhase.name: null
+ train/CorruptPhase.prob: 1
+ train/CorruptPhase.scale: !!python/tuple
+ - uniform
+ - 0
+ - 3.141592653589793
+
+ train/CrossTalk.loudness_cutoff: -40
+ train/CrossTalk.name: null
+ train/CrossTalk.prob: 1.0
+ train/CrossTalk.snr: !!python/tuple
+ - uniform
+ - 0.0
+ - 10.0
+ train/CrossTalk.sources: null
+ train/CrossTalk.weights: null
+
+ train/Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/Equalizer.n_bands: 6
+ train/Equalizer.name: null
+ train/Equalizer.prob: 1.0
+
+ train/FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ train/FrequencyMask.name: null
+ train/FrequencyMask.prob: 1
+
+ train/FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ train/FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ train/FrequencyNoise.name: null
+ train/FrequencyNoise.prob: 1
+
+ train/GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ train/GlobalVolumeNorm.name: null
+ train/GlobalVolumeNorm.prob: 1.0
+
+ train/HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ train/HighPass.name: null
+ train/HighPass.prob: 1
+ train/HighPass.zeros: 51
+
+ train/InvertPhase.name: null
+ train/InvertPhase.prob: 1
+
+ train/LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ train/LowPass.name: null
+ train/LowPass.prob: 1
+ train/LowPass.zeros: 51
+
+ train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ train/MaskLowMagnitudes.name: null
+ train/MaskLowMagnitudes.prob: 1
+
+ train/MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ train/MuLawQuantization.name: null
+ train/MuLawQuantization.prob: 1.0
+
+ train/NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ train/NoiseFloor.name: null
+ train/NoiseFloor.prob: 1.0
+
+ train/Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ train/Quantization.name: null
+ train/Quantization.prob: 1.0
+
+ train/Repeat.n_repeat: 1
+ train/Repeat.name: null
+ train/Repeat.prob: 1.0
+
+ train/RepeatUpTo.max_repeat: 5
+ train/RepeatUpTo.name: null
+ train/RepeatUpTo.prob: 1.0
+ train/RepeatUpTo.weights: null
+
+ train/RescaleAudio.name: null
+ train/RescaleAudio.prob: 1
+ train/RescaleAudio.val: 1.0
+
+ train/RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ train/RoomImpulseResponse.duration: 1.0
+ train/RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ train/RoomImpulseResponse.n_bands: 6
+ train/RoomImpulseResponse.name: null
+ train/RoomImpulseResponse.offset: 0.0
+ train/RoomImpulseResponse.prob: 1.0
+ train/RoomImpulseResponse.sources: null
+ train/RoomImpulseResponse.use_original_phase: false
+ train/RoomImpulseResponse.weights: null
+
+ train/ShiftPhase.name: null
+ train/ShiftPhase.prob: 1
+ train/ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ train/Silence.name: null
+ train/Silence.prob: 0.1
504
+
505
+ train/Smoothing.name: null
506
+ train/Smoothing.prob: 1
507
+ train/Smoothing.window_length: !!python/tuple
508
+ - choice
509
+ - - 8
510
+ - 16
511
+ - 32
512
+ - 64
513
+ - 128
514
+ - 256
515
+ - 512
516
+ train/Smoothing.window_type: !!python/tuple
517
+ - const
518
+ - average
519
+
520
+ train/SpectralDenoising.denoise_amount: !!python/tuple
521
+ - uniform
522
+ - 0.8
523
+ - 1.0
524
+ train/SpectralDenoising.eq_amount: !!python/tuple
525
+ - const
526
+ - 1.0
527
+ train/SpectralDenoising.n_bands: 6
528
+ train/SpectralDenoising.n_freq: 3
529
+ train/SpectralDenoising.n_time: 5
530
+ train/SpectralDenoising.name: null
531
+ train/SpectralDenoising.nz_volume: -40
532
+ train/SpectralDenoising.prob: 1
533
+
534
+ train/TimeMask.name: null
535
+ train/TimeMask.prob: 1
536
+ train/TimeMask.t_center: !!python/tuple
537
+ - uniform
538
+ - 0.0
539
+ - 1.0
540
+ train/TimeMask.t_width: !!python/tuple
541
+ - const
542
+ - 0.025
543
+
544
+ train/TimeNoise.name: null
545
+ train/TimeNoise.prob: 1
546
+ train/TimeNoise.t_center: !!python/tuple
547
+ - uniform
548
+ - 0.0
549
+ - 1.0
550
+ train/TimeNoise.t_width: !!python/tuple
551
+ - const
552
+ - 0.025
553
+
554
+ train/VolumeChange.db: !!python/tuple
555
+ - uniform
556
+ - -12.0
557
+ - 0.0
558
+ train/VolumeChange.name: null
559
+ train/VolumeChange.prob: 1.0
560
+
561
+ train/VolumeNorm.db: !!python/tuple
562
+ - const
563
+ - -24
564
+ train/VolumeNorm.name: null
565
+ train/VolumeNorm.prob: 1.0
566
+
567
+ val/AudioDataset.aligned: false
568
+ val/AudioDataset.duration: 3.0
569
+ val/AudioDataset.loudness_cutoff: -40.0
570
+ val/AudioDataset.n_examples: 500
571
+ val/AudioDataset.num_channels: 1
572
+ val/AudioDataset.offset: null
573
+ val/AudioDataset.shuffle_loaders: false
574
+ val/AudioDataset.without_replacement: false
575
+
576
+ val/AudioLoader.sources:
577
+ - /media/CHONK/hugo/knower
578
+
579
+ val/BackgroundNoise.eq_amount: !!python/tuple
580
+ - const
581
+ - 1.0
582
+ val/BackgroundNoise.loudness_cutoff: null
583
+ val/BackgroundNoise.n_bands: 3
584
+ val/BackgroundNoise.name: null
585
+ val/BackgroundNoise.prob: 1.0
586
+ val/BackgroundNoise.snr: !!python/tuple
587
+ - uniform
588
+ - 10.0
589
+ - 30.0
590
+ val/BackgroundNoise.sources: null
591
+ val/BackgroundNoise.weights: null
592
+
593
+ val/BaseTransform.keys: []
594
+ val/BaseTransform.name: null
595
+ val/BaseTransform.prob: 1.0
596
+
597
+ val/ClippingDistortion.name: null
598
+ val/ClippingDistortion.perc: !!python/tuple
599
+ - uniform
600
+ - 0.0
601
+ - 0.1
602
+ val/ClippingDistortion.prob: 1.0
603
+
604
+ val/CorruptPhase.name: null
605
+ val/CorruptPhase.prob: 1
606
+ val/CorruptPhase.scale: !!python/tuple
607
+ - uniform
608
+ - 0
609
+ - 3.141592653589793
610
+
611
+ val/CrossTalk.loudness_cutoff: -40
612
+ val/CrossTalk.name: null
613
+ val/CrossTalk.prob: 1.0
614
+ val/CrossTalk.snr: !!python/tuple
615
+ - uniform
616
+ - 0.0
617
+ - 10.0
618
+ val/CrossTalk.sources: null
619
+ val/CrossTalk.weights: null
620
+
621
+ val/Equalizer.eq_amount: !!python/tuple
622
+ - const
623
+ - 1.0
624
+ val/Equalizer.n_bands: 6
625
+ val/Equalizer.name: null
626
+ val/Equalizer.prob: 1.0
627
+
628
+ val/FrequencyMask.f_center: !!python/tuple
629
+ - uniform
630
+ - 0.0
631
+ - 1.0
632
+ val/FrequencyMask.f_width: !!python/tuple
633
+ - const
634
+ - 0.1
635
+ val/FrequencyMask.name: null
636
+ val/FrequencyMask.prob: 1
637
+
638
+ val/FrequencyNoise.f_center: !!python/tuple
639
+ - uniform
640
+ - 0.0
641
+ - 1.0
642
+ val/FrequencyNoise.f_width: !!python/tuple
643
+ - const
644
+ - 0.1
645
+ val/FrequencyNoise.name: null
646
+ val/FrequencyNoise.prob: 1
647
+
648
+ val/GlobalVolumeNorm.db: !!python/tuple
649
+ - const
650
+ - -24
651
+ val/GlobalVolumeNorm.name: null
652
+ val/GlobalVolumeNorm.prob: 1.0
653
+
654
+ val/HighPass.cutoff: !!python/tuple
655
+ - choice
656
+ - - 50
657
+ - 100
658
+ - 250
659
+ - 500
660
+ - 1000
661
+ val/HighPass.name: null
662
+ val/HighPass.prob: 1
663
+ val/HighPass.zeros: 51
664
+
665
+ val/InvertPhase.name: null
666
+ val/InvertPhase.prob: 1
667
+
668
+ val/LowPass.cutoff: !!python/tuple
669
+ - choice
670
+ - - 4000
671
+ - 8000
672
+ - 16000
673
+ val/LowPass.name: null
674
+ val/LowPass.prob: 1
675
+ val/LowPass.zeros: 51
676
+
677
+ val/MaskLowMagnitudes.db_cutoff: !!python/tuple
678
+ - uniform
679
+ - -10
680
+ - 10
681
+ val/MaskLowMagnitudes.name: null
682
+ val/MaskLowMagnitudes.prob: 1
683
+
684
+ val/MuLawQuantization.channels: !!python/tuple
685
+ - choice
686
+ - - 8
687
+ - 32
688
+ - 128
689
+ - 256
690
+ - 1024
691
+ val/MuLawQuantization.name: null
692
+ val/MuLawQuantization.prob: 1.0
693
+
694
+ val/NoiseFloor.db: !!python/tuple
695
+ - const
696
+ - -50.0
697
+ val/NoiseFloor.name: null
698
+ val/NoiseFloor.prob: 1.0
699
+
700
+ val/Quantization.channels: !!python/tuple
701
+ - choice
702
+ - - 8
703
+ - 32
704
+ - 128
705
+ - 256
706
+ - 1024
707
+ val/Quantization.name: null
708
+ val/Quantization.prob: 1.0
709
+
710
+ val/Repeat.n_repeat: 1
711
+ val/Repeat.name: null
712
+ val/Repeat.prob: 1.0
713
+
714
+ val/RepeatUpTo.max_repeat: 5
715
+ val/RepeatUpTo.name: null
716
+ val/RepeatUpTo.prob: 1.0
717
+ val/RepeatUpTo.weights: null
718
+
719
+ val/RescaleAudio.name: null
720
+ val/RescaleAudio.prob: 1
721
+ val/RescaleAudio.val: 1.0
722
+
723
+ val/RoomImpulseResponse.drr: !!python/tuple
724
+ - uniform
725
+ - 0.0
726
+ - 30.0
727
+ val/RoomImpulseResponse.duration: 1.0
728
+ val/RoomImpulseResponse.eq_amount: !!python/tuple
729
+ - const
730
+ - 1.0
731
+ val/RoomImpulseResponse.n_bands: 6
732
+ val/RoomImpulseResponse.name: null
733
+ val/RoomImpulseResponse.offset: 0.0
734
+ val/RoomImpulseResponse.prob: 1.0
735
+ val/RoomImpulseResponse.sources: null
736
+ val/RoomImpulseResponse.use_original_phase: false
737
+ val/RoomImpulseResponse.weights: null
738
+
739
+ val/ShiftPhase.name: null
740
+ val/ShiftPhase.prob: 1
741
+ val/ShiftPhase.shift: !!python/tuple
742
+ - uniform
743
+ - -3.141592653589793
744
+ - 3.141592653589793
745
+
746
+ val/Silence.name: null
747
+ val/Silence.prob: 0.1
748
+
749
+ val/Smoothing.name: null
750
+ val/Smoothing.prob: 1
751
+ val/Smoothing.window_length: !!python/tuple
752
+ - choice
753
+ - - 8
754
+ - 16
755
+ - 32
756
+ - 64
757
+ - 128
758
+ - 256
759
+ - 512
760
+ val/Smoothing.window_type: !!python/tuple
761
+ - const
762
+ - average
763
+
764
+ val/SpectralDenoising.denoise_amount: !!python/tuple
765
+ - uniform
766
+ - 0.8
767
+ - 1.0
768
+ val/SpectralDenoising.eq_amount: !!python/tuple
769
+ - const
770
+ - 1.0
771
+ val/SpectralDenoising.n_bands: 6
772
+ val/SpectralDenoising.n_freq: 3
773
+ val/SpectralDenoising.n_time: 5
774
+ val/SpectralDenoising.name: null
775
+ val/SpectralDenoising.nz_volume: -40
776
+ val/SpectralDenoising.prob: 1
777
+
778
+ val/TimeMask.name: null
779
+ val/TimeMask.prob: 1
780
+ val/TimeMask.t_center: !!python/tuple
781
+ - uniform
782
+ - 0.0
783
+ - 1.0
784
+ val/TimeMask.t_width: !!python/tuple
785
+ - const
786
+ - 0.025
787
+
788
+ val/TimeNoise.name: null
789
+ val/TimeNoise.prob: 1
790
+ val/TimeNoise.t_center: !!python/tuple
791
+ - uniform
792
+ - 0.0
793
+ - 1.0
794
+ val/TimeNoise.t_width: !!python/tuple
795
+ - const
796
+ - 0.025
797
+
798
+ val/VolumeChange.db: !!python/tuple
799
+ - uniform
800
+ - -12.0
801
+ - 0.0
802
+ val/VolumeChange.name: null
803
+ val/VolumeChange.prob: 1.0
804
+
805
+ val/VolumeNorm.db: !!python/tuple
806
+ - const
807
+ - -24
808
+ val/VolumeNorm.name: null
809
+ val/VolumeNorm.prob: 1.0
810
+
811
+ val_freq: 500
812
+
813
+ val_idx:
814
+ - 0
815
+ - 1
816
+ - 2
817
+ - 3
818
+ - 4
819
+ - 5
820
+ - 6
821
+ - 7
822
+ - 8
823
+ - 9
824
+
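Many values in the args.yml above are tagged `!!python/tuple` and follow a `(kind, *args)` pattern, with `uniform`, `const`, and `choice` as the kinds that appear. The sketch below shows one way such a tuple could be turned into a concrete value. `sample_param` is a hypothetical helper written for illustration with the standard library only; it is not the training code's own sampler, and the reading of each kind is an assumption based on the names.

```python
import random


def sample_param(spec, rng=None):
    """Draw one value from a (kind, *args) spec such as ("uniform", 0.0, 0.1).

    Hypothetical helper: the meaning of each kind is inferred from its name.
    """
    rng = rng or random.Random()
    # Plain scalars (for example `prob: 1.0`) pass through unchanged.
    if not isinstance(spec, (tuple, list)):
        return spec
    kind, *args = spec
    if kind == "const":
        return args[0]
    if kind == "uniform":
        low, high = args
        return rng.uniform(low, high)
    if kind == "choice":
        return rng.choice(list(args[0]))
    raise ValueError(f"unknown distribution kind: {kind!r}")


rng = random.Random(0)  # fixed seed, mirroring `seed: 0` in the config
clip_perc = sample_param(("uniform", 0.0, 0.1), rng)               # ClippingDistortion.perc
lowpass_cutoff = sample_param(("choice", [4000, 8000, 16000]), rng)  # LowPass.cutoff
volume_db = sample_param(("const", -24), rng)                      # VolumeNorm.db
print(clip_perc, lowpass_cutoff, volume_db)
```

Note that the `!!python/tuple` tag is Python-specific, so a strict safe YAML loader will typically reject this file unless it is told how to construct that tag.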
runs/knower/c2f/best/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fcf94cab2f8b30d063eb1c176b6e23ba41674d1db37183fc75250b09c536eec1
+ size 1111127537
runs/knower/c2f/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34aaa7eeb26bf583637c5a1f4c7b7de23586ee60817bc9e87203442b5621699b
+ size 1111127537
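The two `weights.pth` entries above are Git LFS pointer files, not the weights themselves: three `key value` lines giving the spec version, the content hash, and the size in bytes. A minimal sketch of reading such a pointer follows. `parse_lfs_pointer` is a hypothetical helper name, and the pointer text is copied from the `latest` entry above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:34aaa7eeb26bf583637c5a1f4c7b7de23586ee60817bc9e87203442b5621699b
size 1111127537
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

The `best` and `latest` pointers report the same size but different digests, so they refer to two distinct checkpoint files of identical length.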
runs/knower/c2f/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   277.753M params.
+   (_orig_mod): VampNet(
+     277.753M params.
+     (embedding): CodebookEmbedding(
+       0.145M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+     )
+     (transformer): TransformerStack(
+       264.481M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-15): 15 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       13.128M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+       )
+     )
+   )
+ )
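The parameter counts printed in model.txt can be cross-checked with a little arithmetic. The sketch below assumes that "M" means 10^6, that `out_proj` carries a bias (the summary does not print `bias=False` for it), and that the checkpoint stores 32-bit floats. None of these is stated in the dump, so treat the result as a plausibility check and not a specification.

```python
M = 1_000_000  # assumption: "M" in the summary means 10**6

# Figures copied from the printed model summary.
reported_total = 277.753
reported_embedding = 0.145
reported_transformer = 264.481
reported_classifier = 13.128
reported_layer0 = 16.531
reported_other_layers = 16.530  # each of layers 1-15

# Top-level components should add up to the total, within rounding.
component_sum = reported_embedding + reported_transformer + reported_classifier

# Layer 0 plus fifteen identical layers should match the transformer stack.
layer_sum = reported_layer0 + 15 * reported_other_layers

# A bias-free 1280x1280 Linear layer, compared with the 1.638M shown for w_ks.
w_ks_params = 1280 * 1280

# A kernel-size-1 Conv1d(112, 1280) with bias, compared with the 0.145M shown.
out_proj_params = 112 * 1280 + 1280

# Rough checkpoint size if every parameter is stored as a 4-byte float.
fp32_bytes = reported_total * M * 4
lfs_size_bytes = 1_111_127_537  # size listed in the weights.pth pointer files

print(f"components: {component_sum:.3f}M vs total {reported_total}M")
print(f"layers:     {layer_sum:.3f}M vs transformer {reported_transformer}M")
print(f"w_ks:       {w_ks_params / M:.3f}M")
print(f"out_proj:   {out_proj_params / M:.3f}M")
print(f"size ratio: {lfs_size_bytes / fp32_bytes:.5f}")
```

Under those assumptions the sums agree with the printed totals to within rounding, and the pointer-file size sits very close to four bytes per parameter.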
runs/knower/coarse/args.yml ADDED
@@ -0,0 +1,824 @@
1
+ AdamW.amsgrad: false
2
+ AdamW.betas: !!python/tuple
3
+ - 0.9
4
+ - 0.999
5
+ AdamW.capturable: false
6
+ AdamW.differentiable: false
7
+ AdamW.eps: 1.0e-08
8
+ AdamW.lr: 0.0001
9
+ AdamW.maximize: false
10
+ AdamW.weight_decay: 0.01
11
+
12
+ AudioDataset.aligned: false
13
+ AudioDataset.duration: 10.0
14
+ AudioDataset.loudness_cutoff: -30.0
15
+ AudioDataset.n_examples: 1000
16
+ AudioDataset.num_channels: 1
17
+ AudioDataset.offset: null
18
+ AudioDataset.shuffle_loaders: false
19
+ AudioDataset.without_replacement: false
20
+
21
+ AudioLoader.ext:
22
+ - .wav
23
+ - .flac
24
+ - .mp3
25
+ - .mp4
26
+ AudioLoader.relative_path: /data/
27
+ AudioLoader.shuffle: true
28
+ AudioLoader.shuffle_state: 0
29
+ AudioLoader.sources: null
30
+ AudioLoader.weights: null
31
+
32
+ BackgroundNoise.eq_amount: !!python/tuple
33
+ - const
34
+ - 1.0
35
+ BackgroundNoise.loudness_cutoff: null
36
+ BackgroundNoise.n_bands: 3
37
+ BackgroundNoise.name: null
38
+ BackgroundNoise.prob: 1.0
39
+ BackgroundNoise.snr: !!python/tuple
40
+ - uniform
41
+ - 10.0
42
+ - 30.0
43
+ BackgroundNoise.sources: null
44
+ BackgroundNoise.weights: null
45
+
46
+ BaseTransform.keys: []
47
+ BaseTransform.name: null
48
+ BaseTransform.prob: 1.0
49
+
50
+ ClippingDistortion.name: null
51
+ ClippingDistortion.perc: !!python/tuple
52
+ - uniform
53
+ - 0.0
54
+ - 0.1
55
+ ClippingDistortion.prob: 1.0
56
+
57
+ CorruptPhase.name: null
58
+ CorruptPhase.prob: 1
59
+ CorruptPhase.scale: !!python/tuple
60
+ - uniform
61
+ - 0
62
+ - 3.141592653589793
63
+
64
+ CrossEntropyLoss.ignore_index: -100
65
+ CrossEntropyLoss.label_smoothing: 0.1
66
+ CrossEntropyLoss.reduce: null
67
+ CrossEntropyLoss.reduction: mean
68
+ CrossEntropyLoss.size_average: null
69
+
70
+ CrossTalk.loudness_cutoff: -40
71
+ CrossTalk.name: null
72
+ CrossTalk.prob: 1.0
73
+ CrossTalk.snr: !!python/tuple
74
+ - uniform
75
+ - 0.0
76
+ - 10.0
77
+ CrossTalk.sources: null
78
+ CrossTalk.weights: null
79
+
80
+ Equalizer.eq_amount: !!python/tuple
81
+ - const
82
+ - 1.0
83
+ Equalizer.n_bands: 6
84
+ Equalizer.name: null
85
+ Equalizer.prob: 1.0
86
+
87
+ FrequencyMask.f_center: !!python/tuple
88
+ - uniform
89
+ - 0.0
90
+ - 1.0
91
+ FrequencyMask.f_width: !!python/tuple
92
+ - const
93
+ - 0.1
94
+ FrequencyMask.name: null
95
+ FrequencyMask.prob: 1
96
+
97
+ FrequencyNoise.f_center: !!python/tuple
98
+ - uniform
99
+ - 0.0
100
+ - 1.0
101
+ FrequencyNoise.f_width: !!python/tuple
102
+ - const
103
+ - 0.1
104
+ FrequencyNoise.name: null
105
+ FrequencyNoise.prob: 1
106
+
107
+ GlobalVolumeNorm.db: !!python/tuple
108
+ - const
109
+ - -24
110
+ GlobalVolumeNorm.name: null
111
+ GlobalVolumeNorm.prob: 1.0
112
+
113
+ HighPass.cutoff: !!python/tuple
114
+ - choice
115
+ - - 50
116
+ - 100
117
+ - 250
118
+ - 500
119
+ - 1000
120
+ HighPass.name: null
121
+ HighPass.prob: 1
122
+ HighPass.zeros: 51
123
+
124
+ InvertPhase.name: null
125
+ InvertPhase.prob: 1
126
+
127
+ LowPass.cutoff: !!python/tuple
128
+ - choice
129
+ - - 4000
130
+ - 8000
131
+ - 16000
132
+ LowPass.name: null
133
+ LowPass.prob: 1
134
+ LowPass.zeros: 51
135
+
136
+ MaskLowMagnitudes.db_cutoff: !!python/tuple
137
+ - uniform
138
+ - -10
139
+ - 10
140
+ MaskLowMagnitudes.name: null
141
+ MaskLowMagnitudes.prob: 1
142
+
143
+ MuLawQuantization.channels: !!python/tuple
144
+ - choice
145
+ - - 8
146
+ - 32
147
+ - 128
148
+ - 256
149
+ - 1024
150
+ MuLawQuantization.name: null
151
+ MuLawQuantization.prob: 1.0
152
+
153
+ NoamScheduler.d_model: 512
154
+ NoamScheduler.factor: 2.0
155
+ NoamScheduler.warmup: 500
156
+
157
+ NoiseFloor.db: !!python/tuple
158
+ - const
159
+ - -50.0
160
+ NoiseFloor.name: null
161
+ NoiseFloor.prob: 1.0
162
+
163
+ Quantization.channels: !!python/tuple
164
+ - choice
165
+ - - 8
166
+ - 32
167
+ - 128
168
+ - 256
169
+ - 1024
170
+ Quantization.name: null
171
+ Quantization.prob: 1.0
172
+
173
+ Repeat.n_repeat: 1
174
+ Repeat.name: null
175
+ Repeat.prob: 1.0
176
+
177
+ RepeatUpTo.max_repeat: 5
178
+ RepeatUpTo.name: null
179
+ RepeatUpTo.prob: 1.0
180
+ RepeatUpTo.weights: null
181
+
182
+ RescaleAudio.name: null
183
+ RescaleAudio.prob: 1
184
+ RescaleAudio.val: 1.0
185
+
186
+ RoomImpulseResponse.drr: !!python/tuple
187
+ - uniform
188
+ - 0.0
189
+ - 30.0
190
+ RoomImpulseResponse.duration: 1.0
191
+ RoomImpulseResponse.eq_amount: !!python/tuple
192
+ - const
193
+ - 1.0
194
+ RoomImpulseResponse.n_bands: 6
195
+ RoomImpulseResponse.name: null
196
+ RoomImpulseResponse.offset: 0.0
197
+ RoomImpulseResponse.prob: 1.0
198
+ RoomImpulseResponse.sources: null
199
+ RoomImpulseResponse.use_original_phase: false
200
+ RoomImpulseResponse.weights: null
201
+
202
+ ShiftPhase.name: null
203
+ ShiftPhase.prob: 1
204
+ ShiftPhase.shift: !!python/tuple
205
+ - uniform
206
+ - -3.141592653589793
207
+ - 3.141592653589793
208
+
209
+ Silence.name: null
210
+ Silence.prob: 0.1
211
+
212
+ Smoothing.name: null
213
+ Smoothing.prob: 1
214
+ Smoothing.window_length: !!python/tuple
215
+ - choice
216
+ - - 8
217
+ - 16
218
+ - 32
219
+ - 64
220
+ - 128
221
+ - 256
222
+ - 512
223
+ Smoothing.window_type: !!python/tuple
224
+ - const
225
+ - average
226
+
227
+ SpectralDenoising.denoise_amount: !!python/tuple
228
+ - uniform
229
+ - 0.8
230
+ - 1.0
231
+ SpectralDenoising.eq_amount: !!python/tuple
232
+ - const
233
+ - 1.0
234
+ SpectralDenoising.n_bands: 6
235
+ SpectralDenoising.n_freq: 3
236
+ SpectralDenoising.n_time: 5
237
+ SpectralDenoising.name: null
238
+ SpectralDenoising.nz_volume: -40
239
+ SpectralDenoising.prob: 1
240
+
241
+ TimeMask.name: null
242
+ TimeMask.prob: 1
243
+ TimeMask.t_center: !!python/tuple
244
+ - uniform
245
+ - 0.0
246
+ - 1.0
247
+ TimeMask.t_width: !!python/tuple
248
+ - const
249
+ - 0.025
250
+
251
+ TimeNoise.name: null
252
+ TimeNoise.prob: 1
253
+ TimeNoise.t_center: !!python/tuple
254
+ - uniform
255
+ - 0.0
256
+ - 1.0
257
+ TimeNoise.t_width: !!python/tuple
258
+ - const
259
+ - 0.025
260
+
261
+ VampNet.dropout: 0.1
262
+ VampNet.embedding_dim: 1280
263
+ VampNet.flash_attn: false
264
+ VampNet.latent_dim: 8
265
+ VampNet.n_codebooks: 4
266
+ VampNet.n_conditioning_codebooks: 0
267
+ VampNet.n_heads: 20
268
+ VampNet.n_layers: 20
269
+ VampNet.noise_mode: mask
270
+ VampNet.r_cond_dim: 0
271
+ VampNet.vocab_size: 1024
272
+
273
+ VolumeChange.db: !!python/tuple
274
+ - uniform
275
+ - -12.0
276
+ - 0.0
277
+ VolumeChange.name: null
278
+ VolumeChange.prob: 1.0
279
+
280
+ VolumeNorm.db: !!python/tuple
281
+ - const
282
+ - -24
283
+ VolumeNorm.name: null
284
+ VolumeNorm.prob: 1.0
285
+
286
+ amp: false
287
+
288
+ args.debug: true
289
+ args.load: conf/generated/knower/coarse.yml
290
+ args.save: null
291
+
292
+ batch_size: 6
293
+
294
+ codec_ckpt: ./models/vampnet/codec.pth
295
+
296
+ fine_tune: true
297
+
298
+ fine_tune_checkpoint: ./models/vampnet/coarse.pth
299
+
300
+ grad_clip_val: 5.0
301
+
302
+ num_iters: 500000
303
+
304
+ num_workers: 7
305
+
306
+ resume: true
307
+
308
+ sample_freq: 1000
309
+
310
+ save_iters:
311
+ - 10000
312
+ - 20000
313
+ - 30000
314
+ - 40000
315
+ - 50000
316
+
317
+ save_path: ./runs/knower/coarse
318
+
319
+ seed: 0
320
+
321
+ tag: latest
322
+
323
+ train/AudioDataset.aligned: false
324
+ train/AudioDataset.duration: 10.0
325
+ train/AudioDataset.loudness_cutoff: -30.0
326
+ train/AudioDataset.n_examples: 100000000
327
+ train/AudioDataset.num_channels: 1
328
+ train/AudioDataset.offset: null
329
+ train/AudioDataset.shuffle_loaders: false
330
+ train/AudioDataset.without_replacement: false
331
+
332
+ train/AudioLoader.sources:
333
+ - /media/CHONK/hugo/knower
334
+
335
+ train/BackgroundNoise.eq_amount: !!python/tuple
336
+ - const
337
+ - 1.0
338
+ train/BackgroundNoise.loudness_cutoff: null
339
+ train/BackgroundNoise.n_bands: 3
340
+ train/BackgroundNoise.name: null
341
+ train/BackgroundNoise.prob: 1.0
342
+ train/BackgroundNoise.snr: !!python/tuple
343
+ - uniform
344
+ - 10.0
345
+ - 30.0
346
+ train/BackgroundNoise.sources: null
347
+ train/BackgroundNoise.weights: null
348
+
349
+ train/BaseTransform.keys: []
350
+ train/BaseTransform.name: null
351
+ train/BaseTransform.prob: 1.0
352
+
353
+ train/ClippingDistortion.name: null
354
+ train/ClippingDistortion.perc: !!python/tuple
355
+ - uniform
356
+ - 0.0
357
+ - 0.1
358
+ train/ClippingDistortion.prob: 1.0
359
+
360
+ train/CorruptPhase.name: null
361
+ train/CorruptPhase.prob: 1
362
+ train/CorruptPhase.scale: !!python/tuple
363
+ - uniform
364
+ - 0
365
+ - 3.141592653589793
366
+
367
+ train/CrossTalk.loudness_cutoff: -40
368
+ train/CrossTalk.name: null
369
+ train/CrossTalk.prob: 1.0
370
+ train/CrossTalk.snr: !!python/tuple
371
+ - uniform
372
+ - 0.0
373
+ - 10.0
374
+ train/CrossTalk.sources: null
375
+ train/CrossTalk.weights: null
376
+
377
+ train/Equalizer.eq_amount: !!python/tuple
378
+ - const
379
+ - 1.0
380
+ train/Equalizer.n_bands: 6
381
+ train/Equalizer.name: null
382
+ train/Equalizer.prob: 1.0
383
+
384
+ train/FrequencyMask.f_center: !!python/tuple
385
+ - uniform
386
+ - 0.0
387
+ - 1.0
388
+ train/FrequencyMask.f_width: !!python/tuple
389
+ - const
390
+ - 0.1
391
+ train/FrequencyMask.name: null
392
+ train/FrequencyMask.prob: 1
393
+
394
+ train/FrequencyNoise.f_center: !!python/tuple
395
+ - uniform
396
+ - 0.0
397
+ - 1.0
398
+ train/FrequencyNoise.f_width: !!python/tuple
399
+ - const
400
+ - 0.1
401
+ train/FrequencyNoise.name: null
402
+ train/FrequencyNoise.prob: 1
403
+
404
+ train/GlobalVolumeNorm.db: !!python/tuple
405
+ - const
406
+ - -24
407
+ train/GlobalVolumeNorm.name: null
408
+ train/GlobalVolumeNorm.prob: 1.0
409
+
410
+ train/HighPass.cutoff: !!python/tuple
411
+ - choice
412
+ - - 50
413
+ - 100
414
+ - 250
415
+ - 500
416
+ - 1000
417
+ train/HighPass.name: null
418
+ train/HighPass.prob: 1
419
+ train/HighPass.zeros: 51
420
+
421
+ train/InvertPhase.name: null
422
+ train/InvertPhase.prob: 1
423
+
424
+ train/LowPass.cutoff: !!python/tuple
425
+ - choice
426
+ - - 4000
427
+ - 8000
428
+ - 16000
429
+ train/LowPass.name: null
430
+ train/LowPass.prob: 1
431
+ train/LowPass.zeros: 51
432
+
433
+ train/MaskLowMagnitudes.db_cutoff: !!python/tuple
434
+ - uniform
435
+ - -10
436
+ - 10
437
+ train/MaskLowMagnitudes.name: null
438
+ train/MaskLowMagnitudes.prob: 1
439
+
440
+ train/MuLawQuantization.channels: !!python/tuple
441
+ - choice
442
+ - - 8
443
+ - 32
444
+ - 128
445
+ - 256
446
+ - 1024
447
+ train/MuLawQuantization.name: null
448
+ train/MuLawQuantization.prob: 1.0
449
+
450
+ train/NoiseFloor.db: !!python/tuple
451
+ - const
452
+ - -50.0
453
+ train/NoiseFloor.name: null
454
+ train/NoiseFloor.prob: 1.0
455
+
456
+ train/Quantization.channels: !!python/tuple
457
+ - choice
458
+ - - 8
459
+ - 32
460
+ - 128
461
+ - 256
462
+ - 1024
463
+ train/Quantization.name: null
464
+ train/Quantization.prob: 1.0
465
+
466
+ train/Repeat.n_repeat: 1
467
+ train/Repeat.name: null
468
+ train/Repeat.prob: 1.0
469
+
470
+ train/RepeatUpTo.max_repeat: 5
471
+ train/RepeatUpTo.name: null
472
+ train/RepeatUpTo.prob: 1.0
473
+ train/RepeatUpTo.weights: null
474
+
475
+ train/RescaleAudio.name: null
476
+ train/RescaleAudio.prob: 1
477
+ train/RescaleAudio.val: 1.0
478
+
479
+ train/RoomImpulseResponse.drr: !!python/tuple
480
+ - uniform
481
+ - 0.0
482
+ - 30.0
483
+ train/RoomImpulseResponse.duration: 1.0
484
+ train/RoomImpulseResponse.eq_amount: !!python/tuple
485
+ - const
486
+ - 1.0
487
+ train/RoomImpulseResponse.n_bands: 6
488
+ train/RoomImpulseResponse.name: null
489
+ train/RoomImpulseResponse.offset: 0.0
490
+ train/RoomImpulseResponse.prob: 1.0
491
+ train/RoomImpulseResponse.sources: null
492
+ train/RoomImpulseResponse.use_original_phase: false
493
+ train/RoomImpulseResponse.weights: null
494
+
495
+ train/ShiftPhase.name: null
496
+ train/ShiftPhase.prob: 1
497
+ train/ShiftPhase.shift: !!python/tuple
498
+ - uniform
499
+ - -3.141592653589793
500
+ - 3.141592653589793
501
+
502
+ train/Silence.name: null
503
+ train/Silence.prob: 0.1
504
+
505
+ train/Smoothing.name: null
506
+ train/Smoothing.prob: 1
507
+ train/Smoothing.window_length: !!python/tuple
508
+ - choice
509
+ - - 8
510
+ - 16
511
+ - 32
512
+ - 64
513
+ - 128
514
+ - 256
515
+ - 512
516
+ train/Smoothing.window_type: !!python/tuple
517
+ - const
518
+ - average
519
+
520
+ train/SpectralDenoising.denoise_amount: !!python/tuple
521
+ - uniform
522
+ - 0.8
523
+ - 1.0
524
+ train/SpectralDenoising.eq_amount: !!python/tuple
525
+ - const
526
+ - 1.0
527
+ train/SpectralDenoising.n_bands: 6
528
+ train/SpectralDenoising.n_freq: 3
529
+ train/SpectralDenoising.n_time: 5
530
+ train/SpectralDenoising.name: null
531
+ train/SpectralDenoising.nz_volume: -40
532
+ train/SpectralDenoising.prob: 1
533
+
534
+ train/TimeMask.name: null
535
+ train/TimeMask.prob: 1
536
+ train/TimeMask.t_center: !!python/tuple
537
+ - uniform
538
+ - 0.0
539
+ - 1.0
540
+ train/TimeMask.t_width: !!python/tuple
541
+ - const
542
+ - 0.025
543
+
544
+ train/TimeNoise.name: null
545
+ train/TimeNoise.prob: 1
546
+ train/TimeNoise.t_center: !!python/tuple
547
+ - uniform
548
+ - 0.0
549
+ - 1.0
550
+ train/TimeNoise.t_width: !!python/tuple
551
+ - const
552
+ - 0.025
553
+
554
+ train/VolumeChange.db: !!python/tuple
555
+ - uniform
556
+ - -12.0
557
+ - 0.0
558
+ train/VolumeChange.name: null
559
+ train/VolumeChange.prob: 1.0
560
+
561
+ train/VolumeNorm.db: !!python/tuple
562
+ - const
563
+ - -24
564
+ train/VolumeNorm.name: null
565
+ train/VolumeNorm.prob: 1.0
566
+
567
+ val/AudioDataset.aligned: false
568
+ val/AudioDataset.duration: 10.0
569
+ val/AudioDataset.loudness_cutoff: -30.0
570
+ val/AudioDataset.n_examples: 500
571
+ val/AudioDataset.num_channels: 1
572
+ val/AudioDataset.offset: null
573
+ val/AudioDataset.shuffle_loaders: false
574
+ val/AudioDataset.without_replacement: false
575
+
576
+ val/AudioLoader.sources:
577
+ - /media/CHONK/hugo/knower
578
+
579
+ val/BackgroundNoise.eq_amount: !!python/tuple
580
+ - const
581
+ - 1.0
582
+ val/BackgroundNoise.loudness_cutoff: null
583
+ val/BackgroundNoise.n_bands: 3
584
+ val/BackgroundNoise.name: null
585
+ val/BackgroundNoise.prob: 1.0
586
+ val/BackgroundNoise.snr: !!python/tuple
587
+ - uniform
588
+ - 10.0
589
+ - 30.0
590
+ val/BackgroundNoise.sources: null
591
+ val/BackgroundNoise.weights: null
592
+
593
+ val/BaseTransform.keys: []
594
+ val/BaseTransform.name: null
595
+ val/BaseTransform.prob: 1.0
596
+
597
+ val/ClippingDistortion.name: null
598
+ val/ClippingDistortion.perc: !!python/tuple
599
+ - uniform
600
+ - 0.0
601
+ - 0.1
602
+ val/ClippingDistortion.prob: 1.0
603
+
604
+ val/CorruptPhase.name: null
605
+ val/CorruptPhase.prob: 1
606
+ val/CorruptPhase.scale: !!python/tuple
607
+ - uniform
608
+ - 0
609
+ - 3.141592653589793
610
+
611
+ val/CrossTalk.loudness_cutoff: -40
612
+ val/CrossTalk.name: null
613
+ val/CrossTalk.prob: 1.0
614
+ val/CrossTalk.snr: !!python/tuple
615
+ - uniform
616
+ - 0.0
617
+ - 10.0
618
+ val/CrossTalk.sources: null
+ val/CrossTalk.weights: null
+
+ val/Equalizer.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/Equalizer.n_bands: 6
+ val/Equalizer.name: null
+ val/Equalizer.prob: 1.0
+
+ val/FrequencyMask.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/FrequencyMask.f_width: !!python/tuple
+ - const
+ - 0.1
+ val/FrequencyMask.name: null
+ val/FrequencyMask.prob: 1
+
+ val/FrequencyNoise.f_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/FrequencyNoise.f_width: !!python/tuple
+ - const
+ - 0.1
+ val/FrequencyNoise.name: null
+ val/FrequencyNoise.prob: 1
+
+ val/GlobalVolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ val/GlobalVolumeNorm.name: null
+ val/GlobalVolumeNorm.prob: 1.0
+
+ val/HighPass.cutoff: !!python/tuple
+ - choice
+ - - 50
+   - 100
+   - 250
+   - 500
+   - 1000
+ val/HighPass.name: null
+ val/HighPass.prob: 1
+ val/HighPass.zeros: 51
+
+ val/InvertPhase.name: null
+ val/InvertPhase.prob: 1
+
+ val/LowPass.cutoff: !!python/tuple
+ - choice
+ - - 4000
+   - 8000
+   - 16000
+ val/LowPass.name: null
+ val/LowPass.prob: 1
+ val/LowPass.zeros: 51
+
+ val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+ - uniform
+ - -10
+ - 10
+ val/MaskLowMagnitudes.name: null
+ val/MaskLowMagnitudes.prob: 1
+
+ val/MuLawQuantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ val/MuLawQuantization.name: null
+ val/MuLawQuantization.prob: 1.0
+
+ val/NoiseFloor.db: !!python/tuple
+ - const
+ - -50.0
+ val/NoiseFloor.name: null
+ val/NoiseFloor.prob: 1.0
+
+ val/Quantization.channels: !!python/tuple
+ - choice
+ - - 8
+   - 32
+   - 128
+   - 256
+   - 1024
+ val/Quantization.name: null
+ val/Quantization.prob: 1.0
+
+ val/Repeat.n_repeat: 1
+ val/Repeat.name: null
+ val/Repeat.prob: 1.0
+
+ val/RepeatUpTo.max_repeat: 5
+ val/RepeatUpTo.name: null
+ val/RepeatUpTo.prob: 1.0
+ val/RepeatUpTo.weights: null
+
+ val/RescaleAudio.name: null
+ val/RescaleAudio.prob: 1
+ val/RescaleAudio.val: 1.0
+
+ val/RoomImpulseResponse.drr: !!python/tuple
+ - uniform
+ - 0.0
+ - 30.0
+ val/RoomImpulseResponse.duration: 1.0
+ val/RoomImpulseResponse.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/RoomImpulseResponse.n_bands: 6
+ val/RoomImpulseResponse.name: null
+ val/RoomImpulseResponse.offset: 0.0
+ val/RoomImpulseResponse.prob: 1.0
+ val/RoomImpulseResponse.sources: null
+ val/RoomImpulseResponse.use_original_phase: false
+ val/RoomImpulseResponse.weights: null
+
+ val/ShiftPhase.name: null
+ val/ShiftPhase.prob: 1
+ val/ShiftPhase.shift: !!python/tuple
+ - uniform
+ - -3.141592653589793
+ - 3.141592653589793
+
+ val/Silence.name: null
+ val/Silence.prob: 0.1
+
+ val/Smoothing.name: null
+ val/Smoothing.prob: 1
+ val/Smoothing.window_length: !!python/tuple
+ - choice
+ - - 8
+   - 16
+   - 32
+   - 64
+   - 128
+   - 256
+   - 512
+ val/Smoothing.window_type: !!python/tuple
+ - const
+ - average
+
+ val/SpectralDenoising.denoise_amount: !!python/tuple
+ - uniform
+ - 0.8
+ - 1.0
+ val/SpectralDenoising.eq_amount: !!python/tuple
+ - const
+ - 1.0
+ val/SpectralDenoising.n_bands: 6
+ val/SpectralDenoising.n_freq: 3
+ val/SpectralDenoising.n_time: 5
+ val/SpectralDenoising.name: null
+ val/SpectralDenoising.nz_volume: -40
+ val/SpectralDenoising.prob: 1
+
+ val/TimeMask.name: null
+ val/TimeMask.prob: 1
+ val/TimeMask.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/TimeMask.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ val/TimeNoise.name: null
+ val/TimeNoise.prob: 1
+ val/TimeNoise.t_center: !!python/tuple
+ - uniform
+ - 0.0
+ - 1.0
+ val/TimeNoise.t_width: !!python/tuple
+ - const
+ - 0.025
+
+ val/VolumeChange.db: !!python/tuple
+ - uniform
+ - -12.0
+ - 0.0
+ val/VolumeChange.name: null
+ val/VolumeChange.prob: 1.0
+
+ val/VolumeNorm.db: !!python/tuple
+ - const
+ - -24
+ val/VolumeNorm.name: null
+ val/VolumeNorm.prob: 1.0
+
+ val_freq: 500
+
+ val_idx:
+ - 0
+ - 1
+ - 2
+ - 3
+ - 4
+ - 5
+ - 6
+ - 7
+ - 8
+ - 9
+
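The `val/...` entries above encode each transform parameter as a tagged tuple: `(const, v)`, `(uniform, lo, hi)` or `(choice, [options])`. A minimal sketch of how such a spec could be resolved to one concrete value, assuming the three tags mean what their names suggest. `sample_from_spec` is a hypothetical helper written for illustration, not the API of the library that consumes this config.

```python
import random
from typing import Any, Sequence


def sample_from_spec(spec: Sequence[Any], rng: random.Random) -> Any:
    """Resolve a tagged tuple such as ("uniform", 0.0, 1.0) to one value.

    Hypothetical helper: the tag semantics are assumed from the tag names.
    """
    kind, *args = spec
    if kind == "const":
        # ("const", v) always yields v
        return args[0]
    if kind == "uniform":
        # ("uniform", lo, hi) draws a float between lo and hi
        low, high = args
        return rng.uniform(low, high)
    if kind == "choice":
        # ("choice", [a, b, ...]) picks one of the listed options
        return rng.choice(list(args[0]))
    raise ValueError(f"unknown spec kind: {kind!r}")


rng = random.Random(0)
cutoff = sample_from_spec(("choice", [50, 100, 250, 500, 1000]), rng)  # val/HighPass.cutoff
f_center = sample_from_spec(("uniform", 0.0, 1.0), rng)                # val/FrequencyMask.f_center
f_width = sample_from_spec(("const", 0.1), rng)                        # val/FrequencyMask.f_width
```

Read this way, the nested list under `choice` is a single tuple element, which is why its items sit one level deeper than the tag in the YAML.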
runs/knower/coarse/best/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cdf46139e0a9b6ff93f954f037a05f8dfcd574180ed1732d61abbe3c75c696b4
+ size 1343718241
runs/knower/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e11462551537ffe62fd3c579473ffe5da73d0149d9a956d8e3448ada9a8b85c0
+ size 1343718241
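Each `weights.pth` entry is a three-line Git LFS pointer (`version`, `oid`, `size`), not the checkpoint itself. A small sketch that parses the pointer above and compares its byte size with the 335.894M parameter count reported in this run's `model.txt`. `parse_lfs_pointer` is a hypothetical helper, and reading the ratio as "float32 storage" is an assumption based only on the arithmetic.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = parse_lfs_pointer(
    """
version https://git-lfs.github.com/spec/v1
oid sha256:e11462551537ffe62fd3c579473ffe5da73d0149d9a956d8e3448ada9a8b85c0
size 1343718241
"""
)

n_params = 335.894e6  # total reported in runs/knower/coarse/model.txt
bytes_per_param = pointer["size"] / n_params
```

The ratio comes out very close to 4 bytes per parameter, which would be consistent with 32-bit weights plus a small amount of serialization overhead.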
runs/knower/coarse/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   335.894M params.
+   (_orig_mod): VampNet(
+     335.894M params.
+     (embedding): CodebookEmbedding(
+       0.042M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+     )
+     (transformer): TransformerStack(
+       330.600M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-19): 19 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       5.251M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+       )
+     )
+   )
+ )
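The per-module counts in the coarse model printout above can be cross-checked by hand. A bias-free `Linear(1280, 1280)` has 1,638,400 weights, which matches `w_ks` (1.638M); `w_qs`, `w_vs` and `fc` report 20,480 more. The sketch below reproduces every printed subtotal under two assumptions that the printout itself does not state: the extra parameters come from a rank-8 low-rank adapter pair on those layers, and the classifier `Conv1d` carries one additional gain value per output channel (as weight normalization would add). Both are inferred from the arithmetic only.

```python
def linear(n_in: int, n_out: int, rank: int = 0) -> int:
    """Bias-free linear layer, plus an optional rank-`rank` adapter pair (assumed)."""
    return n_in * n_out + rank * (n_in + n_out)


def millions(n: int) -> str:
    """Format a count the way the printout does, e.g. '6.615M'."""
    return f"{n / 1e6:.3f}M"


D, HEADS, LAYERS = 1280, 20, 20
RANK = 8  # assumption: inferred from the 20,480-parameter gap, not printed anywhere

attn = 3 * linear(D, D, RANK) + linear(D, D)            # w_qs, w_vs, fc adapted; w_ks plain
ffn = linear(D, 4 * D, RANK) + linear(2 * D, D, RANK)   # w_1 and w_2
layer = attn + ffn + 2 * D                              # plus two RMSNorm weight vectors
rel_bias = 32 * HEADS                                   # Embedding(32, 20), layer 0 only
stack = LAYERS * layer + rel_bias + D                   # plus the final RMSNorm
embedding = 4 * 8 + (32 * D + D)                        # MASK (4x8) + Conv1d(32, 1280) with bias
classifier = D * 4096 + 4096 + 4096                     # weights + bias + assumed per-channel gain
total = embedding + stack + classifier

summary = {
    "self_attn (layers 1-19)": millions(attn),
    "feed_forward": millions(ffn),
    "TransformerLayer (1-19)": millions(layer),
    "TransformerStack": millions(stack),
    "classifier": millions(classifier),
    "VampNet": millions(total),
}
```

Under these assumptions every subtotal agrees with the printout to three decimals, including the 335.894M total.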
runs/n64/c2f/args.yml ADDED
@@ -0,0 +1,129 @@
+ AdamW.amsgrad: false
+ AdamW.betas: !!python/tuple
+ - 0.9
+ - 0.999
+ AdamW.capturable: false
+ AdamW.differentiable: false
+ AdamW.eps: 1.0e-08
+ AdamW.lr: 0.0001
+ AdamW.maximize: false
+ AdamW.weight_decay: 0.01
+
+ AudioDataset.aligned: false
+ AudioDataset.duration: 3.0
+ AudioDataset.loudness_cutoff: -40.0
+ AudioDataset.n_examples: 1000
+ AudioDataset.num_channels: 1
+ AudioDataset.offset: null
+ AudioDataset.shuffle_loaders: false
+ AudioDataset.without_replacement: false
+
+ AudioLoader.ext:
+ - .wav
+ - .flac
+ - .mp3
+ - .mp4
+ AudioLoader.relative_path: ''
+ AudioLoader.shuffle: true
+ AudioLoader.shuffle_state: 0
+ AudioLoader.sources: null
+ AudioLoader.weights: null
+
+ CrossEntropyLoss.ignore_index: -100
+ CrossEntropyLoss.label_smoothing: 0.1
+ CrossEntropyLoss.reduce: null
+ CrossEntropyLoss.reduction: mean
+ CrossEntropyLoss.size_average: null
+
+ NoamScheduler.d_model: 512
+ NoamScheduler.factor: 2.0
+ NoamScheduler.warmup: 500
+
+ VampNet.dropout: 0.1
+ VampNet.embedding_dim: 1280
+ VampNet.flash_attn: false
+ VampNet.latent_dim: 8
+ VampNet.n_codebooks: 14
+ VampNet.n_conditioning_codebooks: 4
+ VampNet.n_heads: 20
+ VampNet.n_layers: 16
+ VampNet.noise_mode: mask
+ VampNet.r_cond_dim: 0
+ VampNet.vocab_size: 1024
+
+ amp: false
+
+ args.debug: true
+ args.load: conf/generated/n64/c2f.yml
+ args.save: null
+
+ batch_size: 6
+
+ codec_ckpt: ./models/vampnet/codec.pth
+
+ fine_tune: true
+
+ fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+ grad_clip_val: 5.0
+
+ num_iters: 500000
+
+ num_workers: 7
+
+ resume: false
+
+ sample_freq: 2000
+
+ save_iters:
+ - 2000
+ - 4000
+ - 10000
+ - 20000
+ - 40000
+ - 100000
+
+ save_path: ./runs/n64/c2f
+
+ seed: 0
+
+ tag: latest
+
+ train/AudioDataset.aligned: false
+ train/AudioDataset.duration: 3.0
+ train/AudioDataset.loudness_cutoff: -40.0
+ train/AudioDataset.n_examples: 100000000
+ train/AudioDataset.num_channels: 1
+ train/AudioDataset.offset: null
+ train/AudioDataset.shuffle_loaders: false
+ train/AudioDataset.without_replacement: false
+
+ train/AudioLoader.sources:
+ - data/salad-bowl/n64-jungle/n64-jungle-mix.wav
+
+ val/AudioDataset.aligned: false
+ val/AudioDataset.duration: 3.0
+ val/AudioDataset.loudness_cutoff: -40.0
+ val/AudioDataset.n_examples: 500
+ val/AudioDataset.num_channels: 1
+ val/AudioDataset.offset: null
+ val/AudioDataset.shuffle_loaders: false
+ val/AudioDataset.without_replacement: false
+
+ val/AudioLoader.sources:
+ - data/salad-bowl/n64-jungle/n64-jungle-mix.wav
+
+ val_freq: 1000
+
+ val_idx:
+ - 0
+ - 1
+ - 2
+ - 3
+ - 4
+ - 5
+ - 6
+ - 7
+ - 8
+ - 9
+
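The config above sets `NoamScheduler.d_model: 512`, `NoamScheduler.factor: 2.0` and `NoamScheduler.warmup: 500`. A minimal sketch of the learning-rate curve those values would produce, assuming the scheduler follows the standard Noam formula (linear warmup, then inverse-square-root decay). The function `noam_lr` is illustrative and is not taken from the training code in this repository.

```python
def noam_lr(step: int, d_model: int = 512, factor: float = 2.0, warmup: int = 500) -> float:
    """Standard Noam schedule: factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # guard against step 0, where step^-0.5 is undefined
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# Sample the curve at a few points around the warmup boundary.
curve = {step: noam_lr(step) for step in (1, 250, 500, 2000, 100000)}
peak_step = max(curve, key=curve.get)
```

Under this formula the rate rises linearly for the first 500 steps, peaks at step 500, and halves each time the step count quadruples after that.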
runs/n64/c2f/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6af65912cdf28c67af5a6bb146270f2f6e3a66f8ef831d6387b282796099eb9e
+ size 1111127537
runs/n64/c2f/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   277.753M params.
+   (_orig_mod): VampNet(
+     277.753M params.
+     (embedding): CodebookEmbedding(
+       0.145M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+     )
+     (transformer): TransformerStack(
+       264.481M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-15): 15 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       13.128M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+       )
+     )
+   )
+ )
runs/n64/coarse/args.yml ADDED
@@ -0,0 +1,129 @@
+ AdamW.amsgrad: false
+ AdamW.betas: !!python/tuple
+ - 0.9
+ - 0.999
+ AdamW.capturable: false
+ AdamW.differentiable: false
+ AdamW.eps: 1.0e-08
+ AdamW.lr: 0.0001
+ AdamW.maximize: false
+ AdamW.weight_decay: 0.01
+
+ AudioDataset.aligned: false
+ AudioDataset.duration: 10.0
+ AudioDataset.loudness_cutoff: -30.0
+ AudioDataset.n_examples: 1000
+ AudioDataset.num_channels: 1
+ AudioDataset.offset: null
+ AudioDataset.shuffle_loaders: false
+ AudioDataset.without_replacement: false
+
+ AudioLoader.ext:
+ - .wav
+ - .flac
+ - .mp3
+ - .mp4
+ AudioLoader.relative_path: ''
+ AudioLoader.shuffle: true
+ AudioLoader.shuffle_state: 0
+ AudioLoader.sources: null
+ AudioLoader.weights: null
+
+ CrossEntropyLoss.ignore_index: -100
+ CrossEntropyLoss.label_smoothing: 0.1
+ CrossEntropyLoss.reduce: null
+ CrossEntropyLoss.reduction: mean
+ CrossEntropyLoss.size_average: null
+
+ NoamScheduler.d_model: 512
+ NoamScheduler.factor: 2.0
+ NoamScheduler.warmup: 500
+
+ VampNet.dropout: 0.1
+ VampNet.embedding_dim: 1280
+ VampNet.flash_attn: false
+ VampNet.latent_dim: 8
+ VampNet.n_codebooks: 4
+ VampNet.n_conditioning_codebooks: 0
+ VampNet.n_heads: 20
+ VampNet.n_layers: 20
+ VampNet.noise_mode: mask
+ VampNet.r_cond_dim: 0
+ VampNet.vocab_size: 1024
+
+ amp: false
+
+ args.debug: true
+ args.load: conf/generated/n64/coarse.yml
+ args.save: null
+
+ batch_size: 6
+
+ codec_ckpt: ./models/vampnet/codec.pth
+
+ fine_tune: true
+
+ fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+ grad_clip_val: 5.0
+
+ num_iters: 500000
+
+ num_workers: 7
+
+ resume: false
+
+ sample_freq: 2000
+
+ save_iters:
+ - 2000
+ - 4000
+ - 10000
+ - 20000
+ - 40000
+ - 100000
+
+ save_path: ./runs/n64/coarse
+
+ seed: 0
+
+ tag: latest
+
+ train/AudioDataset.aligned: false
+ train/AudioDataset.duration: 10.0
+ train/AudioDataset.loudness_cutoff: -30.0
+ train/AudioDataset.n_examples: 100000000
+ train/AudioDataset.num_channels: 1
+ train/AudioDataset.offset: null
+ train/AudioDataset.shuffle_loaders: false
+ train/AudioDataset.without_replacement: false
+
+ train/AudioLoader.sources:
+ - data/salad-bowl/n64-jungle/n64-jungle-mix.wav
+
+ val/AudioDataset.aligned: false
+ val/AudioDataset.duration: 10.0
+ val/AudioDataset.loudness_cutoff: -30.0
+ val/AudioDataset.n_examples: 500
+ val/AudioDataset.num_channels: 1
+ val/AudioDataset.offset: null
+ val/AudioDataset.shuffle_loaders: false
+ val/AudioDataset.without_replacement: false
+
+ val/AudioLoader.sources:
+ - data/salad-bowl/n64-jungle/n64-jungle-mix.wav
+
+ val_freq: 1000
+
+ val_idx:
+ - 0
+ - 1
+ - 2
+ - 3
+ - 4
+ - 5
+ - 6
+ - 7
+ - 8
+ - 9
+
runs/n64/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d2d95c5ac4b80d62cffaf6e054f47b16fdef156ef567db6a6499faf801e67ab
+ size 1343718241
runs/n64/coarse/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   335.894M params.
+   (_orig_mod): VampNet(
+     335.894M params.
+     (embedding): CodebookEmbedding(
+       0.042M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+     )
+     (transformer): TransformerStack(
+       330.600M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-19): 19 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       5.251M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+       )
+     )
+   )
+ )
runs/n64/n64/c2f/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6af65912cdf28c67af5a6bb146270f2f6e3a66f8ef831d6387b282796099eb9e
+ size 1111127537
runs/n64/n64/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d2d95c5ac4b80d62cffaf6e054f47b16fdef156ef567db6a6499faf801e67ab
+ size 1343718241
runs/opera/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7cc5874ba4b168b002ea4219b75552cdacef27a7d1077c025bf7b197e464b1ba
+ size 1343718241
runs/orchestral/c2f/args.yml ADDED
@@ -0,0 +1,129 @@
+ AdamW.amsgrad: false
+ AdamW.betas: !!python/tuple
+ - 0.9
+ - 0.999
+ AdamW.capturable: false
+ AdamW.differentiable: false
+ AdamW.eps: 1.0e-08
+ AdamW.lr: 0.0001
+ AdamW.maximize: false
+ AdamW.weight_decay: 0.01
+
+ AudioDataset.aligned: false
+ AudioDataset.duration: 3.0
+ AudioDataset.loudness_cutoff: -40.0
+ AudioDataset.n_examples: 1000
+ AudioDataset.num_channels: 1
+ AudioDataset.offset: null
+ AudioDataset.shuffle_loaders: false
+ AudioDataset.without_replacement: false
+
+ AudioLoader.ext:
+ - .wav
+ - .flac
+ - .mp3
+ - .mp4
+ AudioLoader.relative_path: ''
+ AudioLoader.shuffle: true
+ AudioLoader.shuffle_state: 0
+ AudioLoader.sources: null
+ AudioLoader.weights: null
+
+ CrossEntropyLoss.ignore_index: -100
+ CrossEntropyLoss.label_smoothing: 0.1
+ CrossEntropyLoss.reduce: null
+ CrossEntropyLoss.reduction: mean
+ CrossEntropyLoss.size_average: null
+
+ NoamScheduler.d_model: 512
+ NoamScheduler.factor: 2.0
+ NoamScheduler.warmup: 500
+
+ VampNet.dropout: 0.1
+ VampNet.embedding_dim: 1280
+ VampNet.flash_attn: false
+ VampNet.latent_dim: 8
+ VampNet.n_codebooks: 14
+ VampNet.n_conditioning_codebooks: 4
+ VampNet.n_heads: 20
+ VampNet.n_layers: 16
+ VampNet.noise_mode: mask
+ VampNet.r_cond_dim: 0
+ VampNet.vocab_size: 1024
+
+ amp: false
+
+ args.debug: true
+ args.load: conf/generated/orchestral/c2f.yml
+ args.save: null
+
+ batch_size: 6
+
+ codec_ckpt: ./models/vampnet/codec.pth
+
+ fine_tune: true
+
+ fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+ grad_clip_val: 5.0
+
+ num_iters: 500000
+
+ num_workers: 7
+
+ resume: false
+
+ sample_freq: 2000
+
+ save_iters:
+ - 2000
+ - 4000
+ - 10000
+ - 20000
+ - 40000
+ - 100000
+
+ save_path: ./runs/orchestral/c2f
+
+ seed: 0
+
+ tag: latest
+
+ train/AudioDataset.aligned: false
+ train/AudioDataset.duration: 3.0
+ train/AudioDataset.loudness_cutoff: -40.0
+ train/AudioDataset.n_examples: 100000000
+ train/AudioDataset.num_channels: 1
+ train/AudioDataset.offset: null
+ train/AudioDataset.shuffle_loaders: false
+ train/AudioDataset.without_replacement: false
+
+ train/AudioLoader.sources:
+ - /media/CHONK/hugo/loras/salad-bowl/chicago-symphony-orchestra/
+
+ val/AudioDataset.aligned: false
+ val/AudioDataset.duration: 3.0
+ val/AudioDataset.loudness_cutoff: -40.0
+ val/AudioDataset.n_examples: 500
+ val/AudioDataset.num_channels: 1
+ val/AudioDataset.offset: null
+ val/AudioDataset.shuffle_loaders: false
+ val/AudioDataset.without_replacement: false
+
+ val/AudioLoader.sources:
+ - /media/CHONK/hugo/loras/salad-bowl/chicago-symphony-orchestra/
+
+ val_freq: 1000
+
+ val_idx:
+ - 0
+ - 1
+ - 2
+ - 3
+ - 4
+ - 5
+ - 6
+ - 7
+ - 8
+ - 9
+
runs/orchestral/c2f/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58a0e9cb777bc5a91835a48e77510d18a049295eab3ff7f23537581c6b3d390f
+ size 1111127537
runs/orchestral/c2f/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   277.753M params.
+   (_orig_mod): VampNet(
+     277.753M params.
+     (embedding): CodebookEmbedding(
+       0.145M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+     )
+     (transformer): TransformerStack(
+       264.481M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-15): 15 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       13.128M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+       )
+     )
+   )
+ )
runs/orchestral/coarse/args.yml ADDED
@@ -0,0 +1,129 @@
+ AdamW.amsgrad: false
+ AdamW.betas: !!python/tuple
+ - 0.9
+ - 0.999
+ AdamW.capturable: false
+ AdamW.differentiable: false
+ AdamW.eps: 1.0e-08
+ AdamW.lr: 0.0001
+ AdamW.maximize: false
+ AdamW.weight_decay: 0.01
+
+ AudioDataset.aligned: false
+ AudioDataset.duration: 10.0
+ AudioDataset.loudness_cutoff: -30.0
+ AudioDataset.n_examples: 1000
+ AudioDataset.num_channels: 1
+ AudioDataset.offset: null
+ AudioDataset.shuffle_loaders: false
+ AudioDataset.without_replacement: false
+
+ AudioLoader.ext:
+ - .wav
+ - .flac
+ - .mp3
+ - .mp4
+ AudioLoader.relative_path: ''
+ AudioLoader.shuffle: true
+ AudioLoader.shuffle_state: 0
+ AudioLoader.sources: null
+ AudioLoader.weights: null
+
+ CrossEntropyLoss.ignore_index: -100
+ CrossEntropyLoss.label_smoothing: 0.1
+ CrossEntropyLoss.reduce: null
+ CrossEntropyLoss.reduction: mean
+ CrossEntropyLoss.size_average: null
+
+ NoamScheduler.d_model: 512
+ NoamScheduler.factor: 2.0
+ NoamScheduler.warmup: 500
+
+ VampNet.dropout: 0.1
+ VampNet.embedding_dim: 1280
+ VampNet.flash_attn: false
+ VampNet.latent_dim: 8
+ VampNet.n_codebooks: 4
+ VampNet.n_conditioning_codebooks: 0
+ VampNet.n_heads: 20
+ VampNet.n_layers: 20
+ VampNet.noise_mode: mask
+ VampNet.r_cond_dim: 0
+ VampNet.vocab_size: 1024
+
+ amp: false
+
+ args.debug: true
+ args.load: conf/generated/orchestral/coarse.yml
+ args.save: null
+
+ batch_size: 6
+
+ codec_ckpt: ./models/vampnet/codec.pth
+
+ fine_tune: true
+
+ fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+ grad_clip_val: 5.0
+
+ num_iters: 500000
+
+ num_workers: 7
+
+ resume: false
+
+ sample_freq: 2000
+
+ save_iters:
+ - 2000
+ - 4000
+ - 10000
+ - 20000
+ - 40000
+ - 100000
+
+ save_path: ./runs/orchestral/coarse
+
+ seed: 0
+
+ tag: latest
+
+ train/AudioDataset.aligned: false
+ train/AudioDataset.duration: 10.0
+ train/AudioDataset.loudness_cutoff: -30.0
+ train/AudioDataset.n_examples: 100000000
+ train/AudioDataset.num_channels: 1
+ train/AudioDataset.offset: null
+ train/AudioDataset.shuffle_loaders: false
+ train/AudioDataset.without_replacement: false
+
+ train/AudioLoader.sources:
+ - /media/CHONK/hugo/loras/salad-bowl/chicago-symphony-orchestra/
+
+ val/AudioDataset.aligned: false
+ val/AudioDataset.duration: 10.0
+ val/AudioDataset.loudness_cutoff: -30.0
+ val/AudioDataset.n_examples: 500
+ val/AudioDataset.num_channels: 1
+ val/AudioDataset.offset: null
+ val/AudioDataset.shuffle_loaders: false
+ val/AudioDataset.without_replacement: false
+
+ val/AudioLoader.sources:
+ - /media/CHONK/hugo/loras/salad-bowl/chicago-symphony-orchestra/
+
+ val_freq: 1000
+
+ val_idx:
+ - 0
+ - 1
+ - 2
+ - 3
+ - 4
+ - 5
+ - 6
+ - 7
+ - 8
+ - 9
+
runs/orchestral/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19699c048342df79196a2f558e66038561068b0d4790080990906194652b58bf
+ size 1343718241
runs/orchestral/coarse/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   335.894M params.
+   (_orig_mod): VampNet(
+     335.894M params.
+     (embedding): CodebookEmbedding(
+       0.042M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+     )
+     (transformer): TransformerStack(
+       330.600M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-19): 19 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       5.251M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+ )
74
+ )
75
+ )
76
+ )
runs/soundrangers-v2-v1/c2f/args.yml ADDED
@@ -0,0 +1,851 @@
1
+ AdamW.amsgrad: false
2
+ AdamW.betas: !!python/tuple
3
+ - 0.9
4
+ - 0.999
5
+ AdamW.capturable: false
6
+ AdamW.differentiable: false
7
+ AdamW.eps: 1.0e-08
8
+ AdamW.lr: 0.0001
9
+ AdamW.maximize: false
10
+ AdamW.weight_decay: 0.01
11
+
12
+ AudioDataset.aligned: false
13
+ AudioDataset.duration: 3.0
14
+ AudioDataset.loudness_cutoff: -40.0
15
+ AudioDataset.n_examples: 1000
16
+ AudioDataset.num_channels: 1
17
+ AudioDataset.offset: null
18
+ AudioDataset.shuffle_loaders: false
19
+ AudioDataset.without_replacement: false
20
+
21
+ AudioLoader.ext:
22
+ - .wav
23
+ - .flac
24
+ - .mp3
25
+ - .mp4
26
+ AudioLoader.relative_path: ''
27
+ AudioLoader.shuffle: true
28
+ AudioLoader.shuffle_state: 0
29
+ AudioLoader.sources: null
30
+ AudioLoader.weights: null
31
+
32
+ BackgroundNoise.eq_amount: !!python/tuple
33
+ - const
34
+ - 1.0
35
+ BackgroundNoise.loudness_cutoff: null
36
+ BackgroundNoise.n_bands: 3
37
+ BackgroundNoise.name: null
38
+ BackgroundNoise.prob: 1.0
39
+ BackgroundNoise.snr: !!python/tuple
40
+ - uniform
41
+ - 10.0
42
+ - 30.0
43
+ BackgroundNoise.sources: null
44
+ BackgroundNoise.weights: null
45
+
46
+ BaseTransform.keys: []
47
+ BaseTransform.name: null
48
+ BaseTransform.prob: 1.0
49
+
50
+ ClippingDistortion.name: null
51
+ ClippingDistortion.perc: !!python/tuple
52
+ - uniform
53
+ - 0.0
54
+ - 0.1
55
+ ClippingDistortion.prob: 1.0
56
+
57
+ CorruptPhase.name: null
58
+ CorruptPhase.prob: 1
59
+ CorruptPhase.scale: !!python/tuple
60
+ - uniform
61
+ - 0
62
+ - 3.141592653589793
63
+
64
+ CrossEntropyLoss.ignore_index: -100
65
+ CrossEntropyLoss.label_smoothing: 0.1
66
+ CrossEntropyLoss.reduce: null
67
+ CrossEntropyLoss.reduction: mean
68
+ CrossEntropyLoss.size_average: null
69
+
70
+ CrossTalk.loudness_cutoff: -40
71
+ CrossTalk.name: null
72
+ CrossTalk.prob: 1.0
73
+ CrossTalk.snr: !!python/tuple
74
+ - uniform
75
+ - 0.0
76
+ - 10.0
77
+ CrossTalk.sources: null
78
+ CrossTalk.weights: null
79
+
80
+ Equalizer.eq_amount: !!python/tuple
81
+ - const
82
+ - 1.0
83
+ Equalizer.n_bands: 6
84
+ Equalizer.name: null
85
+ Equalizer.prob: 1.0
86
+
87
+ FrequencyMask.f_center: !!python/tuple
88
+ - uniform
89
+ - 0.0
90
+ - 1.0
91
+ FrequencyMask.f_width: !!python/tuple
92
+ - const
93
+ - 0.1
94
+ FrequencyMask.name: null
95
+ FrequencyMask.prob: 1
96
+
97
+ FrequencyNoise.f_center: !!python/tuple
98
+ - uniform
99
+ - 0.0
100
+ - 1.0
101
+ FrequencyNoise.f_width: !!python/tuple
102
+ - const
103
+ - 0.1
104
+ FrequencyNoise.name: null
105
+ FrequencyNoise.prob: 1
106
+
107
+ GlobalVolumeNorm.db: !!python/tuple
108
+ - const
109
+ - -24
110
+ GlobalVolumeNorm.name: null
111
+ GlobalVolumeNorm.prob: 1.0
112
+
113
+ HighPass.cutoff: !!python/tuple
114
+ - choice
115
+ - - 50
116
+ - 100
117
+ - 250
118
+ - 500
119
+ - 1000
120
+ HighPass.name: null
121
+ HighPass.prob: 1
122
+ HighPass.zeros: 51
123
+
124
+ InvertPhase.name: null
125
+ InvertPhase.prob: 1
126
+
127
+ LowPass.cutoff: !!python/tuple
128
+ - choice
129
+ - - 4000
130
+ - 8000
131
+ - 16000
132
+ LowPass.name: null
133
+ LowPass.prob: 1
134
+ LowPass.zeros: 51
135
+
136
+ MaskLowMagnitudes.db_cutoff: !!python/tuple
137
+ - uniform
138
+ - -10
139
+ - 10
140
+ MaskLowMagnitudes.name: null
141
+ MaskLowMagnitudes.prob: 1
142
+
143
+ MuLawQuantization.channels: !!python/tuple
144
+ - choice
145
+ - - 8
146
+ - 32
147
+ - 128
148
+ - 256
149
+ - 1024
150
+ MuLawQuantization.name: null
151
+ MuLawQuantization.prob: 1.0
152
+
153
+ NoamScheduler.d_model: 512
154
+ NoamScheduler.factor: 2.0
155
+ NoamScheduler.warmup: 500
156
+
157
+ NoiseFloor.db: !!python/tuple
158
+ - const
159
+ - -50.0
160
+ NoiseFloor.name: null
161
+ NoiseFloor.prob: 1.0
162
+
163
+ Quantization.channels: !!python/tuple
164
+ - choice
165
+ - - 8
166
+ - 32
167
+ - 128
168
+ - 256
169
+ - 1024
170
+ Quantization.name: null
171
+ Quantization.prob: 1.0
172
+
173
+ Repeat.n_repeat: 1
174
+ Repeat.name: null
175
+ Repeat.prob: 1.0
176
+
177
+ RepeatUpTo.max_repeat: 5
178
+ RepeatUpTo.name: null
179
+ RepeatUpTo.prob: 1.0
180
+ RepeatUpTo.weights: null
181
+
182
+ RescaleAudio.name: null
183
+ RescaleAudio.prob: 1
184
+ RescaleAudio.val: 1.0
185
+
186
+ RoomImpulseResponse.drr: !!python/tuple
187
+ - uniform
188
+ - 0.0
189
+ - 30.0
190
+ RoomImpulseResponse.duration: 1.0
191
+ RoomImpulseResponse.eq_amount: !!python/tuple
192
+ - const
193
+ - 1.0
194
+ RoomImpulseResponse.n_bands: 6
195
+ RoomImpulseResponse.name: null
196
+ RoomImpulseResponse.offset: 0.0
197
+ RoomImpulseResponse.prob: 1.0
198
+ RoomImpulseResponse.sources: null
199
+ RoomImpulseResponse.use_original_phase: false
200
+ RoomImpulseResponse.weights: null
201
+
202
+ ShiftPhase.name: null
203
+ ShiftPhase.prob: 1
204
+ ShiftPhase.shift: !!python/tuple
205
+ - uniform
206
+ - -3.141592653589793
207
+ - 3.141592653589793
208
+
209
+ Silence.name: null
210
+ Silence.prob: 0.1
211
+
212
+ Smoothing.name: null
213
+ Smoothing.prob: 1
214
+ Smoothing.window_length: !!python/tuple
215
+ - choice
216
+ - - 8
217
+ - 16
218
+ - 32
219
+ - 64
220
+ - 128
221
+ - 256
222
+ - 512
223
+ Smoothing.window_type: !!python/tuple
224
+ - const
225
+ - average
226
+
227
+ SpectralDenoising.denoise_amount: !!python/tuple
228
+ - uniform
229
+ - 0.8
230
+ - 1.0
231
+ SpectralDenoising.eq_amount: !!python/tuple
232
+ - const
233
+ - 1.0
234
+ SpectralDenoising.n_bands: 6
235
+ SpectralDenoising.n_freq: 3
236
+ SpectralDenoising.n_time: 5
237
+ SpectralDenoising.name: null
238
+ SpectralDenoising.nz_volume: -40
239
+ SpectralDenoising.prob: 1
240
+
241
+ TimeMask.name: null
242
+ TimeMask.prob: 1
243
+ TimeMask.t_center: !!python/tuple
244
+ - uniform
245
+ - 0.0
246
+ - 1.0
247
+ TimeMask.t_width: !!python/tuple
248
+ - const
249
+ - 0.025
250
+
251
+ TimeNoise.name: null
252
+ TimeNoise.prob: 1
253
+ TimeNoise.t_center: !!python/tuple
254
+ - uniform
255
+ - 0.0
256
+ - 1.0
257
+ TimeNoise.t_width: !!python/tuple
258
+ - const
259
+ - 0.025
260
+
261
+ VampNet.dropout: 0.1
262
+ VampNet.embedding_dim: 1280
263
+ VampNet.flash_attn: false
264
+ VampNet.latent_dim: 8
265
+ VampNet.n_codebooks: 14
266
+ VampNet.n_conditioning_codebooks: 4
267
+ VampNet.n_heads: 20
268
+ VampNet.n_layers: 16
269
+ VampNet.noise_mode: mask
270
+ VampNet.r_cond_dim: 0
271
+ VampNet.vocab_size: 1024
272
+
273
+ VolumeChange.db: !!python/tuple
274
+ - uniform
275
+ - -12.0
276
+ - 0.0
277
+ VolumeChange.name: null
278
+ VolumeChange.prob: 1.0
279
+
280
+ VolumeNorm.db: !!python/tuple
281
+ - const
282
+ - -24
283
+ VolumeNorm.name: null
284
+ VolumeNorm.prob: 1.0
285
+
286
+ amp: false
287
+
288
+ args.debug: true
289
+ args.load: conf/generated/soundrangers2/c2f.yml
290
+ args.save: null
291
+
292
+ batch_size: 6
293
+
294
+ codec_ckpt: ./models/vampnet/codec.pth
295
+
296
+ fine_tune: true
297
+
298
+ fine_tune_checkpoint: ./models/vampnet/c2f.pth
299
+
300
+ grad_clip_val: 5.0
301
+
302
+ num_iters: 500000
303
+
304
+ num_workers: 7
305
+
306
+ resume: true
307
+
308
+ sample_freq: 2000
309
+
310
+ save_iters:
311
+ - 2000
312
+ - 4000
313
+ - 10000
314
+ - 20000
315
+ - 40000
316
+ - 100000
317
+
318
+ save_path: ./runs/soundrangers-v2/c2f
319
+
320
+ seed: 0
321
+
322
+ tag: latest
323
+
324
+ train/AudioDataset.aligned: false
325
+ train/AudioDataset.duration: 3.0
326
+ train/AudioDataset.loudness_cutoff: -40.0
327
+ train/AudioDataset.n_examples: 100000000
328
+ train/AudioDataset.num_channels: 1
329
+ train/AudioDataset.offset: null
330
+ train/AudioDataset.shuffle_loaders: false
331
+ train/AudioDataset.without_replacement: false
332
+
333
+ train/AudioLoader.sources:
334
+ - /media/CHONK2/prosound_redacted/Soundrangers Complete
335
+ - /media/CHONK2/prosound_redacted/Soundrangers Update 2018
336
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
337
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
338
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
339
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
340
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
341
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
342
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
343
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
344
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
345
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Bells
346
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
347
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
348
+
349
+ train/BackgroundNoise.eq_amount: !!python/tuple
350
+ - const
351
+ - 1.0
352
+ train/BackgroundNoise.loudness_cutoff: null
353
+ train/BackgroundNoise.n_bands: 3
354
+ train/BackgroundNoise.name: null
355
+ train/BackgroundNoise.prob: 1.0
356
+ train/BackgroundNoise.snr: !!python/tuple
357
+ - uniform
358
+ - 10.0
359
+ - 30.0
360
+ train/BackgroundNoise.sources: null
361
+ train/BackgroundNoise.weights: null
362
+
363
+ train/BaseTransform.keys: []
364
+ train/BaseTransform.name: null
365
+ train/BaseTransform.prob: 1.0
366
+
367
+ train/ClippingDistortion.name: null
368
+ train/ClippingDistortion.perc: !!python/tuple
369
+ - uniform
370
+ - 0.0
371
+ - 0.1
372
+ train/ClippingDistortion.prob: 1.0
373
+
374
+ train/CorruptPhase.name: null
375
+ train/CorruptPhase.prob: 1
376
+ train/CorruptPhase.scale: !!python/tuple
377
+ - uniform
378
+ - 0
379
+ - 3.141592653589793
380
+
381
+ train/CrossTalk.loudness_cutoff: -40
382
+ train/CrossTalk.name: null
383
+ train/CrossTalk.prob: 1.0
384
+ train/CrossTalk.snr: !!python/tuple
385
+ - uniform
386
+ - 0.0
387
+ - 10.0
388
+ train/CrossTalk.sources: null
389
+ train/CrossTalk.weights: null
390
+
391
+ train/Equalizer.eq_amount: !!python/tuple
392
+ - const
393
+ - 1.0
394
+ train/Equalizer.n_bands: 6
395
+ train/Equalizer.name: null
396
+ train/Equalizer.prob: 1.0
397
+
398
+ train/FrequencyMask.f_center: !!python/tuple
399
+ - uniform
400
+ - 0.0
401
+ - 1.0
402
+ train/FrequencyMask.f_width: !!python/tuple
403
+ - const
404
+ - 0.1
405
+ train/FrequencyMask.name: null
406
+ train/FrequencyMask.prob: 1
407
+
408
+ train/FrequencyNoise.f_center: !!python/tuple
409
+ - uniform
410
+ - 0.0
411
+ - 1.0
412
+ train/FrequencyNoise.f_width: !!python/tuple
413
+ - const
414
+ - 0.1
415
+ train/FrequencyNoise.name: null
416
+ train/FrequencyNoise.prob: 1
417
+
418
+ train/GlobalVolumeNorm.db: !!python/tuple
419
+ - const
420
+ - -24
421
+ train/GlobalVolumeNorm.name: null
422
+ train/GlobalVolumeNorm.prob: 1.0
423
+
424
+ train/HighPass.cutoff: !!python/tuple
425
+ - choice
426
+ - - 50
427
+ - 100
428
+ - 250
429
+ - 500
430
+ - 1000
431
+ train/HighPass.name: null
432
+ train/HighPass.prob: 1
433
+ train/HighPass.zeros: 51
434
+
435
+ train/InvertPhase.name: null
436
+ train/InvertPhase.prob: 1
437
+
438
+ train/LowPass.cutoff: !!python/tuple
439
+ - choice
440
+ - - 4000
441
+ - 8000
442
+ - 16000
443
+ train/LowPass.name: null
444
+ train/LowPass.prob: 1
445
+ train/LowPass.zeros: 51
446
+
447
+ train/MaskLowMagnitudes.db_cutoff: !!python/tuple
448
+ - uniform
449
+ - -10
450
+ - 10
451
+ train/MaskLowMagnitudes.name: null
452
+ train/MaskLowMagnitudes.prob: 1
453
+
454
+ train/MuLawQuantization.channels: !!python/tuple
455
+ - choice
456
+ - - 8
457
+ - 32
458
+ - 128
459
+ - 256
460
+ - 1024
461
+ train/MuLawQuantization.name: null
462
+ train/MuLawQuantization.prob: 1.0
463
+
464
+ train/NoiseFloor.db: !!python/tuple
465
+ - const
466
+ - -50.0
467
+ train/NoiseFloor.name: null
468
+ train/NoiseFloor.prob: 1.0
469
+
470
+ train/Quantization.channels: !!python/tuple
471
+ - choice
472
+ - - 8
473
+ - 32
474
+ - 128
475
+ - 256
476
+ - 1024
477
+ train/Quantization.name: null
478
+ train/Quantization.prob: 1.0
479
+
480
+ train/Repeat.n_repeat: 1
481
+ train/Repeat.name: null
482
+ train/Repeat.prob: 1.0
483
+
484
+ train/RepeatUpTo.max_repeat: 5
485
+ train/RepeatUpTo.name: null
486
+ train/RepeatUpTo.prob: 1.0
487
+ train/RepeatUpTo.weights: null
488
+
489
+ train/RescaleAudio.name: null
490
+ train/RescaleAudio.prob: 1
491
+ train/RescaleAudio.val: 1.0
492
+
493
+ train/RoomImpulseResponse.drr: !!python/tuple
494
+ - uniform
495
+ - 0.0
496
+ - 30.0
497
+ train/RoomImpulseResponse.duration: 1.0
498
+ train/RoomImpulseResponse.eq_amount: !!python/tuple
499
+ - const
500
+ - 1.0
501
+ train/RoomImpulseResponse.n_bands: 6
502
+ train/RoomImpulseResponse.name: null
503
+ train/RoomImpulseResponse.offset: 0.0
504
+ train/RoomImpulseResponse.prob: 1.0
505
+ train/RoomImpulseResponse.sources: null
506
+ train/RoomImpulseResponse.use_original_phase: false
507
+ train/RoomImpulseResponse.weights: null
508
+
509
+ train/ShiftPhase.name: null
510
+ train/ShiftPhase.prob: 1
511
+ train/ShiftPhase.shift: !!python/tuple
512
+ - uniform
513
+ - -3.141592653589793
514
+ - 3.141592653589793
515
+
516
+ train/Silence.name: null
517
+ train/Silence.prob: 0.1
518
+
519
+ train/Smoothing.name: null
520
+ train/Smoothing.prob: 1
521
+ train/Smoothing.window_length: !!python/tuple
522
+ - choice
523
+ - - 8
524
+ - 16
525
+ - 32
526
+ - 64
527
+ - 128
528
+ - 256
529
+ - 512
530
+ train/Smoothing.window_type: !!python/tuple
531
+ - const
532
+ - average
533
+
534
+ train/SpectralDenoising.denoise_amount: !!python/tuple
535
+ - uniform
536
+ - 0.8
537
+ - 1.0
538
+ train/SpectralDenoising.eq_amount: !!python/tuple
539
+ - const
540
+ - 1.0
541
+ train/SpectralDenoising.n_bands: 6
542
+ train/SpectralDenoising.n_freq: 3
543
+ train/SpectralDenoising.n_time: 5
544
+ train/SpectralDenoising.name: null
545
+ train/SpectralDenoising.nz_volume: -40
546
+ train/SpectralDenoising.prob: 1
547
+
548
+ train/TimeMask.name: null
549
+ train/TimeMask.prob: 1
550
+ train/TimeMask.t_center: !!python/tuple
551
+ - uniform
552
+ - 0.0
553
+ - 1.0
554
+ train/TimeMask.t_width: !!python/tuple
555
+ - const
556
+ - 0.025
557
+
558
+ train/TimeNoise.name: null
559
+ train/TimeNoise.prob: 1
560
+ train/TimeNoise.t_center: !!python/tuple
561
+ - uniform
562
+ - 0.0
563
+ - 1.0
564
+ train/TimeNoise.t_width: !!python/tuple
565
+ - const
566
+ - 0.025
567
+
568
+ train/VolumeChange.db: !!python/tuple
569
+ - uniform
570
+ - -12.0
571
+ - 0.0
572
+ train/VolumeChange.name: null
573
+ train/VolumeChange.prob: 1.0
574
+
575
+ train/VolumeNorm.db: !!python/tuple
576
+ - const
577
+ - -24
578
+ train/VolumeNorm.name: null
579
+ train/VolumeNorm.prob: 1.0
580
+
581
+ val/AudioDataset.aligned: false
582
+ val/AudioDataset.duration: 3.0
583
+ val/AudioDataset.loudness_cutoff: -40.0
584
+ val/AudioDataset.n_examples: 500
585
+ val/AudioDataset.num_channels: 1
586
+ val/AudioDataset.offset: null
587
+ val/AudioDataset.shuffle_loaders: false
588
+ val/AudioDataset.without_replacement: false
589
+
590
+ val/AudioLoader.sources:
591
+ - /media/CHONK2/prosound_redacted/Soundrangers Complete
592
+ - /media/CHONK2/prosound_redacted/Soundrangers Update 2018
593
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
594
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
595
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
596
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
597
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
598
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
599
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
600
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
601
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
602
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Bells
603
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
604
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
605
+
606
+ val/BackgroundNoise.eq_amount: !!python/tuple
607
+ - const
608
+ - 1.0
609
+ val/BackgroundNoise.loudness_cutoff: null
610
+ val/BackgroundNoise.n_bands: 3
611
+ val/BackgroundNoise.name: null
612
+ val/BackgroundNoise.prob: 1.0
613
+ val/BackgroundNoise.snr: !!python/tuple
614
+ - uniform
615
+ - 10.0
616
+ - 30.0
617
+ val/BackgroundNoise.sources: null
618
+ val/BackgroundNoise.weights: null
619
+
620
+ val/BaseTransform.keys: []
621
+ val/BaseTransform.name: null
622
+ val/BaseTransform.prob: 1.0
623
+
624
+ val/ClippingDistortion.name: null
625
+ val/ClippingDistortion.perc: !!python/tuple
626
+ - uniform
627
+ - 0.0
628
+ - 0.1
629
+ val/ClippingDistortion.prob: 1.0
630
+
631
+ val/CorruptPhase.name: null
632
+ val/CorruptPhase.prob: 1
633
+ val/CorruptPhase.scale: !!python/tuple
634
+ - uniform
635
+ - 0
636
+ - 3.141592653589793
637
+
638
+ val/CrossTalk.loudness_cutoff: -40
639
+ val/CrossTalk.name: null
640
+ val/CrossTalk.prob: 1.0
641
+ val/CrossTalk.snr: !!python/tuple
642
+ - uniform
643
+ - 0.0
644
+ - 10.0
645
+ val/CrossTalk.sources: null
646
+ val/CrossTalk.weights: null
647
+
648
+ val/Equalizer.eq_amount: !!python/tuple
649
+ - const
650
+ - 1.0
651
+ val/Equalizer.n_bands: 6
652
+ val/Equalizer.name: null
653
+ val/Equalizer.prob: 1.0
654
+
655
+ val/FrequencyMask.f_center: !!python/tuple
656
+ - uniform
657
+ - 0.0
658
+ - 1.0
659
+ val/FrequencyMask.f_width: !!python/tuple
660
+ - const
661
+ - 0.1
662
+ val/FrequencyMask.name: null
663
+ val/FrequencyMask.prob: 1
664
+
665
+ val/FrequencyNoise.f_center: !!python/tuple
666
+ - uniform
667
+ - 0.0
668
+ - 1.0
669
+ val/FrequencyNoise.f_width: !!python/tuple
670
+ - const
671
+ - 0.1
672
+ val/FrequencyNoise.name: null
673
+ val/FrequencyNoise.prob: 1
674
+
675
+ val/GlobalVolumeNorm.db: !!python/tuple
676
+ - const
677
+ - -24
678
+ val/GlobalVolumeNorm.name: null
679
+ val/GlobalVolumeNorm.prob: 1.0
680
+
681
+ val/HighPass.cutoff: !!python/tuple
682
+ - choice
683
+ - - 50
684
+ - 100
685
+ - 250
686
+ - 500
687
+ - 1000
688
+ val/HighPass.name: null
689
+ val/HighPass.prob: 1
690
+ val/HighPass.zeros: 51
691
+
692
+ val/InvertPhase.name: null
693
+ val/InvertPhase.prob: 1
694
+
695
+ val/LowPass.cutoff: !!python/tuple
696
+ - choice
697
+ - - 4000
698
+ - 8000
699
+ - 16000
700
+ val/LowPass.name: null
701
+ val/LowPass.prob: 1
702
+ val/LowPass.zeros: 51
703
+
704
+ val/MaskLowMagnitudes.db_cutoff: !!python/tuple
705
+ - uniform
706
+ - -10
707
+ - 10
708
+ val/MaskLowMagnitudes.name: null
709
+ val/MaskLowMagnitudes.prob: 1
710
+
711
+ val/MuLawQuantization.channels: !!python/tuple
712
+ - choice
713
+ - - 8
714
+ - 32
715
+ - 128
716
+ - 256
717
+ - 1024
718
+ val/MuLawQuantization.name: null
719
+ val/MuLawQuantization.prob: 1.0
720
+
721
+ val/NoiseFloor.db: !!python/tuple
722
+ - const
723
+ - -50.0
724
+ val/NoiseFloor.name: null
725
+ val/NoiseFloor.prob: 1.0
726
+
727
+ val/Quantization.channels: !!python/tuple
728
+ - choice
729
+ - - 8
730
+ - 32
731
+ - 128
732
+ - 256
733
+ - 1024
734
+ val/Quantization.name: null
735
+ val/Quantization.prob: 1.0
736
+
737
+ val/Repeat.n_repeat: 1
738
+ val/Repeat.name: null
739
+ val/Repeat.prob: 1.0
740
+
741
+ val/RepeatUpTo.max_repeat: 5
742
+ val/RepeatUpTo.name: null
743
+ val/RepeatUpTo.prob: 1.0
744
+ val/RepeatUpTo.weights: null
745
+
746
+ val/RescaleAudio.name: null
747
+ val/RescaleAudio.prob: 1
748
+ val/RescaleAudio.val: 1.0
749
+
750
+ val/RoomImpulseResponse.drr: !!python/tuple
751
+ - uniform
752
+ - 0.0
753
+ - 30.0
754
+ val/RoomImpulseResponse.duration: 1.0
755
+ val/RoomImpulseResponse.eq_amount: !!python/tuple
756
+ - const
757
+ - 1.0
758
+ val/RoomImpulseResponse.n_bands: 6
759
+ val/RoomImpulseResponse.name: null
760
+ val/RoomImpulseResponse.offset: 0.0
761
+ val/RoomImpulseResponse.prob: 1.0
762
+ val/RoomImpulseResponse.sources: null
763
+ val/RoomImpulseResponse.use_original_phase: false
764
+ val/RoomImpulseResponse.weights: null
765
+
766
+ val/ShiftPhase.name: null
767
+ val/ShiftPhase.prob: 1
768
+ val/ShiftPhase.shift: !!python/tuple
769
+ - uniform
770
+ - -3.141592653589793
771
+ - 3.141592653589793
772
+
773
+ val/Silence.name: null
774
+ val/Silence.prob: 0.1
775
+
776
+ val/Smoothing.name: null
777
+ val/Smoothing.prob: 1
778
+ val/Smoothing.window_length: !!python/tuple
779
+ - choice
780
+ - - 8
781
+ - 16
782
+ - 32
783
+ - 64
784
+ - 128
785
+ - 256
786
+ - 512
787
+ val/Smoothing.window_type: !!python/tuple
788
+ - const
789
+ - average
790
+
791
+ val/SpectralDenoising.denoise_amount: !!python/tuple
792
+ - uniform
793
+ - 0.8
794
+ - 1.0
795
+ val/SpectralDenoising.eq_amount: !!python/tuple
796
+ - const
797
+ - 1.0
798
+ val/SpectralDenoising.n_bands: 6
799
+ val/SpectralDenoising.n_freq: 3
800
+ val/SpectralDenoising.n_time: 5
801
+ val/SpectralDenoising.name: null
802
+ val/SpectralDenoising.nz_volume: -40
803
+ val/SpectralDenoising.prob: 1
804
+
805
+ val/TimeMask.name: null
806
+ val/TimeMask.prob: 1
807
+ val/TimeMask.t_center: !!python/tuple
808
+ - uniform
809
+ - 0.0
810
+ - 1.0
811
+ val/TimeMask.t_width: !!python/tuple
812
+ - const
813
+ - 0.025
814
+
815
+ val/TimeNoise.name: null
816
+ val/TimeNoise.prob: 1
817
+ val/TimeNoise.t_center: !!python/tuple
818
+ - uniform
819
+ - 0.0
820
+ - 1.0
821
+ val/TimeNoise.t_width: !!python/tuple
822
+ - const
823
+ - 0.025
824
+
825
+ val/VolumeChange.db: !!python/tuple
826
+ - uniform
827
+ - -12.0
828
+ - 0.0
829
+ val/VolumeChange.name: null
830
+ val/VolumeChange.prob: 1.0
831
+
832
+ val/VolumeNorm.db: !!python/tuple
833
+ - const
834
+ - -24
835
+ val/VolumeNorm.name: null
836
+ val/VolumeNorm.prob: 1.0
837
+
838
+ val_freq: 1000
839
+
840
+ val_idx:
841
+ - 0
842
+ - 1
843
+ - 2
844
+ - 3
845
+ - 4
846
+ - 5
847
+ - 6
848
+ - 7
849
+ - 8
850
+ - 9
851
+
runs/soundrangers-v2-v1/c2f/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:82d83c323601ef3ae23d574cde1f93539bb3f057451d3e0a495b562fcc96deaa
3
+ size 1111127537
runs/soundrangers-v2-v1/c2f/model.txt ADDED
@@ -0,0 +1,73 @@
1
+ VampNet(
2
+ 277.753M params.
3
+ (embedding): CodebookEmbedding(
4
+ 0.145M params.
5
+ (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
6
+ (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
7
+ )
8
+ (transformer): TransformerStack(
9
+ 264.481M params.
10
+ (layers): ModuleList(
11
+ (0): TransformerLayer(
12
+ 16.531M params.
13
+ (norm_1): RMSNorm( 0.001M params.)
14
+ (film_1): FiLM( 0.000M params.)
15
+ (self_attn): MultiHeadRelativeAttention(
16
+ 6.616M params.
17
+ (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
18
+ (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
19
+ (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
20
+ (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
21
+ (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
22
+ (relative_attention_bias): Embedding(32, 20 0.001M params.)
23
+ )
24
+ (norm_3): RMSNorm( 0.001M params.)
25
+ (film_3): FiLM( 0.000M params.)
26
+ (feed_forward): FeedForward(
27
+ 9.912M params.
28
+ (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
29
+ (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
30
+ (drop): Dropout(p=0.1, inplace=False 0.000M params.)
31
+ (act): GatedGELU(
32
+ 0.000M params.
33
+ (gelu): NewGELU( 0.000M params.)
34
+ )
35
+ )
36
+ (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
37
+ )
38
+ (1-15): 15 x TransformerLayer(
39
+ 16.530M params.
40
+ (norm_1): RMSNorm( 0.001M params.)
41
+ (film_1): FiLM( 0.000M params.)
42
+ (self_attn): MultiHeadRelativeAttention(
43
+ 6.615M params.
44
+ (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
45
+ (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
46
+ (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
47
+ (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
48
+ (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
49
+ )
50
+ (norm_3): RMSNorm( 0.001M params.)
51
+ (film_3): FiLM( 0.000M params.)
52
+ (feed_forward): FeedForward(
53
+ 9.912M params.
54
+ (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
55
+ (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
56
+ (drop): Dropout(p=0.1, inplace=False 0.000M params.)
57
+ (act): GatedGELU(
58
+ 0.000M params.
59
+ (gelu): NewGELU( 0.000M params.)
60
+ )
61
+ )
62
+ (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
63
+ )
64
+ )
65
+ (norm): RMSNorm( 0.001M params.)
66
+ )
67
+ (classifier): SequentialWithFiLM(
68
+ 13.128M params.
69
+ (layers): ModuleList(
70
+ (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
71
+ )
72
+ )
73
+ )
runs/soundrangers-v2-v1/coarse/args.yml ADDED
@@ -0,0 +1,851 @@
1
+ AdamW.amsgrad: false
2
+ AdamW.betas: !!python/tuple
3
+ - 0.9
4
+ - 0.999
5
+ AdamW.capturable: false
6
+ AdamW.differentiable: false
7
+ AdamW.eps: 1.0e-08
8
+ AdamW.lr: 0.0001
9
+ AdamW.maximize: false
10
+ AdamW.weight_decay: 0.01
11
+
12
+ AudioDataset.aligned: false
13
+ AudioDataset.duration: 10.0
14
+ AudioDataset.loudness_cutoff: -30.0
15
+ AudioDataset.n_examples: 1000
16
+ AudioDataset.num_channels: 1
17
+ AudioDataset.offset: null
18
+ AudioDataset.shuffle_loaders: false
19
+ AudioDataset.without_replacement: false
20
+
21
+ AudioLoader.ext:
22
+ - .wav
23
+ - .flac
24
+ - .mp3
25
+ - .mp4
26
+ AudioLoader.relative_path: ''
27
+ AudioLoader.shuffle: true
28
+ AudioLoader.shuffle_state: 0
29
+ AudioLoader.sources: null
30
+ AudioLoader.weights: null
31
+
32
+ BackgroundNoise.eq_amount: !!python/tuple
33
+ - const
34
+ - 1.0
35
+ BackgroundNoise.loudness_cutoff: null
36
+ BackgroundNoise.n_bands: 3
37
+ BackgroundNoise.name: null
38
+ BackgroundNoise.prob: 1.0
39
+ BackgroundNoise.snr: !!python/tuple
40
+ - uniform
41
+ - 10.0
42
+ - 30.0
43
+ BackgroundNoise.sources: null
44
+ BackgroundNoise.weights: null
45
+
46
+ BaseTransform.keys: []
47
+ BaseTransform.name: null
48
+ BaseTransform.prob: 1.0
49
+
50
+ ClippingDistortion.name: null
51
+ ClippingDistortion.perc: !!python/tuple
52
+ - uniform
53
+ - 0.0
54
+ - 0.1
55
+ ClippingDistortion.prob: 1.0
56
+
57
+ CorruptPhase.name: null
58
+ CorruptPhase.prob: 1
59
+ CorruptPhase.scale: !!python/tuple
60
+ - uniform
61
+ - 0
62
+ - 3.141592653589793
63
+
64
+ CrossEntropyLoss.ignore_index: -100
65
+ CrossEntropyLoss.label_smoothing: 0.1
66
+ CrossEntropyLoss.reduce: null
67
+ CrossEntropyLoss.reduction: mean
68
+ CrossEntropyLoss.size_average: null
69
+
70
+ CrossTalk.loudness_cutoff: -40
71
+ CrossTalk.name: null
72
+ CrossTalk.prob: 1.0
73
+ CrossTalk.snr: !!python/tuple
74
+ - uniform
75
+ - 0.0
76
+ - 10.0
77
+ CrossTalk.sources: null
78
+ CrossTalk.weights: null
79
+
80
+ Equalizer.eq_amount: !!python/tuple
81
+ - const
82
+ - 1.0
83
+ Equalizer.n_bands: 6
84
+ Equalizer.name: null
85
+ Equalizer.prob: 1.0
86
+
87
+ FrequencyMask.f_center: !!python/tuple
88
+ - uniform
89
+ - 0.0
90
+ - 1.0
91
+ FrequencyMask.f_width: !!python/tuple
92
+ - const
93
+ - 0.1
94
+ FrequencyMask.name: null
95
+ FrequencyMask.prob: 1
96
+
97
+ FrequencyNoise.f_center: !!python/tuple
98
+ - uniform
99
+ - 0.0
100
+ - 1.0
101
+ FrequencyNoise.f_width: !!python/tuple
102
+ - const
103
+ - 0.1
104
+ FrequencyNoise.name: null
105
+ FrequencyNoise.prob: 1
106
+
107
+ GlobalVolumeNorm.db: !!python/tuple
108
+ - const
109
+ - -24
110
+ GlobalVolumeNorm.name: null
111
+ GlobalVolumeNorm.prob: 1.0
112
+
113
+ HighPass.cutoff: !!python/tuple
114
+ - choice
115
+ - - 50
116
+ - 100
117
+ - 250
118
+ - 500
119
+ - 1000
120
+ HighPass.name: null
121
+ HighPass.prob: 1
122
+ HighPass.zeros: 51
123
+
124
+ InvertPhase.name: null
125
+ InvertPhase.prob: 1
126
+
127
+ LowPass.cutoff: !!python/tuple
128
+ - choice
129
+ - - 4000
130
+ - 8000
131
+ - 16000
132
+ LowPass.name: null
133
+ LowPass.prob: 1
134
+ LowPass.zeros: 51
135
+
136
+ MaskLowMagnitudes.db_cutoff: !!python/tuple
137
+ - uniform
138
+ - -10
139
+ - 10
140
+ MaskLowMagnitudes.name: null
141
+ MaskLowMagnitudes.prob: 1
142
+
143
+ MuLawQuantization.channels: !!python/tuple
144
+ - choice
145
+ - - 8
146
+ - 32
147
+ - 128
148
+ - 256
149
+ - 1024
150
+ MuLawQuantization.name: null
151
+ MuLawQuantization.prob: 1.0
152
+
153
+ NoamScheduler.d_model: 512
154
+ NoamScheduler.factor: 2.0
155
+ NoamScheduler.warmup: 500
156
+
157
+ NoiseFloor.db: !!python/tuple
158
+ - const
159
+ - -50.0
160
+ NoiseFloor.name: null
161
+ NoiseFloor.prob: 1.0
162
+
163
+ Quantization.channels: !!python/tuple
164
+ - choice
165
+ - - 8
166
+ - 32
167
+ - 128
168
+ - 256
169
+ - 1024
170
+ Quantization.name: null
171
+ Quantization.prob: 1.0
172
+
173
+ Repeat.n_repeat: 1
174
+ Repeat.name: null
175
+ Repeat.prob: 1.0
176
+
177
+ RepeatUpTo.max_repeat: 5
178
+ RepeatUpTo.name: null
179
+ RepeatUpTo.prob: 1.0
180
+ RepeatUpTo.weights: null
181
+
182
+ RescaleAudio.name: null
183
+ RescaleAudio.prob: 1
184
+ RescaleAudio.val: 1.0
185
+
186
+ RoomImpulseResponse.drr: !!python/tuple
187
+ - uniform
188
+ - 0.0
189
+ - 30.0
190
+ RoomImpulseResponse.duration: 1.0
191
+ RoomImpulseResponse.eq_amount: !!python/tuple
192
+ - const
193
+ - 1.0
194
+ RoomImpulseResponse.n_bands: 6
195
+ RoomImpulseResponse.name: null
196
+ RoomImpulseResponse.offset: 0.0
197
+ RoomImpulseResponse.prob: 1.0
198
+ RoomImpulseResponse.sources: null
199
+ RoomImpulseResponse.use_original_phase: false
200
+ RoomImpulseResponse.weights: null
201
+
202
+ ShiftPhase.name: null
203
+ ShiftPhase.prob: 1
204
+ ShiftPhase.shift: !!python/tuple
205
+ - uniform
206
+ - -3.141592653589793
207
+ - 3.141592653589793
208
+
209
+ Silence.name: null
210
+ Silence.prob: 0.1
211
+
212
+ Smoothing.name: null
213
+ Smoothing.prob: 1
214
+ Smoothing.window_length: !!python/tuple
215
+ - choice
216
+ - - 8
217
+ - 16
218
+ - 32
219
+ - 64
220
+ - 128
221
+ - 256
222
+ - 512
223
+ Smoothing.window_type: !!python/tuple
224
+ - const
225
+ - average
226
+
227
+ SpectralDenoising.denoise_amount: !!python/tuple
228
+ - uniform
229
+ - 0.8
230
+ - 1.0
231
+ SpectralDenoising.eq_amount: !!python/tuple
232
+ - const
233
+ - 1.0
234
+ SpectralDenoising.n_bands: 6
235
+ SpectralDenoising.n_freq: 3
236
+ SpectralDenoising.n_time: 5
237
+ SpectralDenoising.name: null
238
+ SpectralDenoising.nz_volume: -40
239
+ SpectralDenoising.prob: 1
240
+
241
+ TimeMask.name: null
242
+ TimeMask.prob: 1
243
+ TimeMask.t_center: !!python/tuple
244
+ - uniform
245
+ - 0.0
246
+ - 1.0
247
+ TimeMask.t_width: !!python/tuple
248
+ - const
249
+ - 0.025
250
+
251
+ TimeNoise.name: null
252
+ TimeNoise.prob: 1
253
+ TimeNoise.t_center: !!python/tuple
254
+ - uniform
255
+ - 0.0
256
+ - 1.0
257
+ TimeNoise.t_width: !!python/tuple
258
+ - const
259
+ - 0.025
260
+
261
+ VampNet.dropout: 0.1
262
+ VampNet.embedding_dim: 1280
263
+ VampNet.flash_attn: false
264
+ VampNet.latent_dim: 8
265
+ VampNet.n_codebooks: 4
266
+ VampNet.n_conditioning_codebooks: 0
267
+ VampNet.n_heads: 20
268
+ VampNet.n_layers: 20
269
+ VampNet.noise_mode: mask
270
+ VampNet.r_cond_dim: 0
271
+ VampNet.vocab_size: 1024
272
+
273
+ VolumeChange.db: !!python/tuple
274
+ - uniform
275
+ - -12.0
276
+ - 0.0
277
+ VolumeChange.name: null
278
+ VolumeChange.prob: 1.0
279
+
280
+ VolumeNorm.db: !!python/tuple
281
+ - const
282
+ - -24
283
+ VolumeNorm.name: null
284
+ VolumeNorm.prob: 1.0
285
+
286
+ amp: false
287
+
288
+ args.debug: true
289
+ args.load: conf/generated/soundrangers2/coarse.yml
290
+ args.save: null
291
+
292
+ batch_size: 6
293
+
294
+ codec_ckpt: ./models/vampnet/codec.pth
295
+
296
+ fine_tune: true
297
+
298
+ fine_tune_checkpoint: ./models/vampnet/coarse.pth
299
+
300
+ grad_clip_val: 5.0
301
+
302
+ num_iters: 500000
303
+
304
+ num_workers: 7
305
+
306
+ resume: true
307
+
308
+ sample_freq: 2000
309
+
310
+ save_iters:
311
+ - 2000
312
+ - 4000
313
+ - 10000
314
+ - 20000
315
+ - 40000
316
+ - 100000
317
+
318
+ save_path: ./runs/soundrangers-v2/coarse
319
+
320
+ seed: 0
321
+
322
+ tag: latest
323
+
324
+ train/AudioDataset.aligned: false
325
+ train/AudioDataset.duration: 10.0
326
+ train/AudioDataset.loudness_cutoff: -30.0
327
+ train/AudioDataset.n_examples: 100000000
328
+ train/AudioDataset.num_channels: 1
329
+ train/AudioDataset.offset: null
330
+ train/AudioDataset.shuffle_loaders: false
331
+ train/AudioDataset.without_replacement: false
332
+
333
+ train/AudioLoader.sources:
334
+ - /media/CHONK2/prosound_redacted/Soundrangers Complete
335
+ - /media/CHONK2/prosound_redacted/Soundrangers Update 2018
336
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
337
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
338
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
339
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
340
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
341
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
342
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
343
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
344
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
345
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Bells
346
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
347
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
348
+
349
+ train/BackgroundNoise.eq_amount: !!python/tuple
350
+ - const
351
+ - 1.0
352
+ train/BackgroundNoise.loudness_cutoff: null
353
+ train/BackgroundNoise.n_bands: 3
354
+ train/BackgroundNoise.name: null
355
+ train/BackgroundNoise.prob: 1.0
356
+ train/BackgroundNoise.snr: !!python/tuple
357
+ - uniform
358
+ - 10.0
359
+ - 30.0
360
+ train/BackgroundNoise.sources: null
361
+ train/BackgroundNoise.weights: null
362
+
363
+ train/BaseTransform.keys: []
364
+ train/BaseTransform.name: null
365
+ train/BaseTransform.prob: 1.0
366
+
367
+ train/ClippingDistortion.name: null
368
+ train/ClippingDistortion.perc: !!python/tuple
369
+ - uniform
370
+ - 0.0
371
+ - 0.1
372
+ train/ClippingDistortion.prob: 1.0
373
+
374
+ train/CorruptPhase.name: null
375
+ train/CorruptPhase.prob: 1
376
+ train/CorruptPhase.scale: !!python/tuple
377
+ - uniform
378
+ - 0
379
+ - 3.141592653589793
380
+
381
+ train/CrossTalk.loudness_cutoff: -40
382
+ train/CrossTalk.name: null
383
+ train/CrossTalk.prob: 1.0
384
+ train/CrossTalk.snr: !!python/tuple
385
+ - uniform
386
+ - 0.0
387
+ - 10.0
388
+ train/CrossTalk.sources: null
389
+ train/CrossTalk.weights: null
390
+
391
+ train/Equalizer.eq_amount: !!python/tuple
392
+ - const
393
+ - 1.0
394
+ train/Equalizer.n_bands: 6
395
+ train/Equalizer.name: null
396
+ train/Equalizer.prob: 1.0
397
+
398
+ train/FrequencyMask.f_center: !!python/tuple
399
+ - uniform
400
+ - 0.0
401
+ - 1.0
402
+ train/FrequencyMask.f_width: !!python/tuple
403
+ - const
404
+ - 0.1
405
+ train/FrequencyMask.name: null
406
+ train/FrequencyMask.prob: 1
407
+
408
+ train/FrequencyNoise.f_center: !!python/tuple
409
+ - uniform
410
+ - 0.0
411
+ - 1.0
412
+ train/FrequencyNoise.f_width: !!python/tuple
413
+ - const
414
+ - 0.1
415
+ train/FrequencyNoise.name: null
416
+ train/FrequencyNoise.prob: 1
417
+
418
+ train/GlobalVolumeNorm.db: !!python/tuple
419
+ - const
420
+ - -24
421
+ train/GlobalVolumeNorm.name: null
422
+ train/GlobalVolumeNorm.prob: 1.0
423
+
424
+ train/HighPass.cutoff: !!python/tuple
425
+ - choice
426
+ - - 50
427
+ - 100
428
+ - 250
429
+ - 500
430
+ - 1000
431
+ train/HighPass.name: null
432
+ train/HighPass.prob: 1
433
+ train/HighPass.zeros: 51
434
+
435
+ train/InvertPhase.name: null
436
+ train/InvertPhase.prob: 1
437
+
438
+ train/LowPass.cutoff: !!python/tuple
439
+ - choice
440
+ - - 4000
441
+ - 8000
442
+ - 16000
443
+ train/LowPass.name: null
444
+ train/LowPass.prob: 1
445
+ train/LowPass.zeros: 51
446
+
447
+ train/MaskLowMagnitudes.db_cutoff: !!python/tuple
448
+ - uniform
449
+ - -10
450
+ - 10
451
+ train/MaskLowMagnitudes.name: null
452
+ train/MaskLowMagnitudes.prob: 1
453
+
454
+ train/MuLawQuantization.channels: !!python/tuple
455
+ - choice
456
+ - - 8
457
+ - 32
458
+ - 128
459
+ - 256
460
+ - 1024
461
+ train/MuLawQuantization.name: null
462
+ train/MuLawQuantization.prob: 1.0
463
+
464
+ train/NoiseFloor.db: !!python/tuple
465
+ - const
466
+ - -50.0
467
+ train/NoiseFloor.name: null
468
+ train/NoiseFloor.prob: 1.0
469
+
470
+ train/Quantization.channels: !!python/tuple
471
+ - choice
472
+ - - 8
473
+ - 32
474
+ - 128
475
+ - 256
476
+ - 1024
477
+ train/Quantization.name: null
478
+ train/Quantization.prob: 1.0
479
+
480
+ train/Repeat.n_repeat: 1
481
+ train/Repeat.name: null
482
+ train/Repeat.prob: 1.0
483
+
484
+ train/RepeatUpTo.max_repeat: 5
485
+ train/RepeatUpTo.name: null
486
+ train/RepeatUpTo.prob: 1.0
487
+ train/RepeatUpTo.weights: null
488
+
489
+ train/RescaleAudio.name: null
490
+ train/RescaleAudio.prob: 1
491
+ train/RescaleAudio.val: 1.0
492
+
493
+ train/RoomImpulseResponse.drr: !!python/tuple
494
+ - uniform
495
+ - 0.0
496
+ - 30.0
497
+ train/RoomImpulseResponse.duration: 1.0
498
+ train/RoomImpulseResponse.eq_amount: !!python/tuple
499
+ - const
500
+ - 1.0
501
+ train/RoomImpulseResponse.n_bands: 6
502
+ train/RoomImpulseResponse.name: null
503
+ train/RoomImpulseResponse.offset: 0.0
504
+ train/RoomImpulseResponse.prob: 1.0
505
+ train/RoomImpulseResponse.sources: null
506
+ train/RoomImpulseResponse.use_original_phase: false
507
+ train/RoomImpulseResponse.weights: null
508
+
509
+ train/ShiftPhase.name: null
510
+ train/ShiftPhase.prob: 1
511
+ train/ShiftPhase.shift: !!python/tuple
512
+ - uniform
513
+ - -3.141592653589793
514
+ - 3.141592653589793
515
+
516
+ train/Silence.name: null
517
+ train/Silence.prob: 0.1
518
+
519
+ train/Smoothing.name: null
520
+ train/Smoothing.prob: 1
521
+ train/Smoothing.window_length: !!python/tuple
522
+ - choice
523
+ - - 8
524
+ - 16
525
+ - 32
526
+ - 64
527
+ - 128
528
+ - 256
529
+ - 512
530
+ train/Smoothing.window_type: !!python/tuple
531
+ - const
532
+ - average
533
+
534
+ train/SpectralDenoising.denoise_amount: !!python/tuple
535
+ - uniform
536
+ - 0.8
537
+ - 1.0
538
+ train/SpectralDenoising.eq_amount: !!python/tuple
539
+ - const
540
+ - 1.0
541
+ train/SpectralDenoising.n_bands: 6
542
+ train/SpectralDenoising.n_freq: 3
543
+ train/SpectralDenoising.n_time: 5
544
+ train/SpectralDenoising.name: null
545
+ train/SpectralDenoising.nz_volume: -40
546
+ train/SpectralDenoising.prob: 1
547
+
548
+ train/TimeMask.name: null
549
+ train/TimeMask.prob: 1
550
+ train/TimeMask.t_center: !!python/tuple
551
+ - uniform
552
+ - 0.0
553
+ - 1.0
554
+ train/TimeMask.t_width: !!python/tuple
555
+ - const
556
+ - 0.025
557
+
558
+ train/TimeNoise.name: null
559
+ train/TimeNoise.prob: 1
560
+ train/TimeNoise.t_center: !!python/tuple
561
+ - uniform
562
+ - 0.0
563
+ - 1.0
564
+ train/TimeNoise.t_width: !!python/tuple
565
+ - const
566
+ - 0.025
567
+
568
+ train/VolumeChange.db: !!python/tuple
569
+ - uniform
570
+ - -12.0
571
+ - 0.0
572
+ train/VolumeChange.name: null
573
+ train/VolumeChange.prob: 1.0
574
+
575
+ train/VolumeNorm.db: !!python/tuple
576
+ - const
577
+ - -24
578
+ train/VolumeNorm.name: null
579
+ train/VolumeNorm.prob: 1.0
580
+
581
+ val/AudioDataset.aligned: false
582
+ val/AudioDataset.duration: 10.0
583
+ val/AudioDataset.loudness_cutoff: -30.0
584
+ val/AudioDataset.n_examples: 500
585
+ val/AudioDataset.num_channels: 1
586
+ val/AudioDataset.offset: null
587
+ val/AudioDataset.shuffle_loaders: false
588
+ val/AudioDataset.without_replacement: false
589
+
590
+ val/AudioLoader.sources:
591
+ - /media/CHONK2/prosound_redacted/Soundrangers Complete
592
+ - /media/CHONK2/prosound_redacted/Soundrangers Update 2018
593
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
594
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
595
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
596
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
597
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
598
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
599
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
600
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
601
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
602
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Bells
603
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
604
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
605
+
606
+ val/BackgroundNoise.eq_amount: !!python/tuple
607
+ - const
608
+ - 1.0
609
+ val/BackgroundNoise.loudness_cutoff: null
610
+ val/BackgroundNoise.n_bands: 3
611
+ val/BackgroundNoise.name: null
612
+ val/BackgroundNoise.prob: 1.0
613
+ val/BackgroundNoise.snr: !!python/tuple
614
+ - uniform
615
+ - 10.0
616
+ - 30.0
617
+ val/BackgroundNoise.sources: null
618
+ val/BackgroundNoise.weights: null
619
+
620
+ val/BaseTransform.keys: []
621
+ val/BaseTransform.name: null
622
+ val/BaseTransform.prob: 1.0
623
+
624
+ val/ClippingDistortion.name: null
625
+ val/ClippingDistortion.perc: !!python/tuple
626
+ - uniform
627
+ - 0.0
628
+ - 0.1
629
+ val/ClippingDistortion.prob: 1.0
630
+
631
+ val/CorruptPhase.name: null
632
+ val/CorruptPhase.prob: 1
633
+ val/CorruptPhase.scale: !!python/tuple
634
+ - uniform
635
+ - 0
636
+ - 3.141592653589793
637
+
638
+ val/CrossTalk.loudness_cutoff: -40
639
+ val/CrossTalk.name: null
640
+ val/CrossTalk.prob: 1.0
641
+ val/CrossTalk.snr: !!python/tuple
642
+ - uniform
643
+ - 0.0
644
+ - 10.0
645
+ val/CrossTalk.sources: null
646
+ val/CrossTalk.weights: null
647
+
648
+ val/Equalizer.eq_amount: !!python/tuple
649
+ - const
650
+ - 1.0
651
+ val/Equalizer.n_bands: 6
652
+ val/Equalizer.name: null
653
+ val/Equalizer.prob: 1.0
654
+
655
+ val/FrequencyMask.f_center: !!python/tuple
656
+ - uniform
657
+ - 0.0
658
+ - 1.0
659
+ val/FrequencyMask.f_width: !!python/tuple
660
+ - const
661
+ - 0.1
662
+ val/FrequencyMask.name: null
663
+ val/FrequencyMask.prob: 1
664
+
665
+ val/FrequencyNoise.f_center: !!python/tuple
666
+ - uniform
667
+ - 0.0
668
+ - 1.0
669
+ val/FrequencyNoise.f_width: !!python/tuple
670
+ - const
671
+ - 0.1
672
+ val/FrequencyNoise.name: null
673
+ val/FrequencyNoise.prob: 1
674
+
675
+ val/GlobalVolumeNorm.db: !!python/tuple
676
+ - const
677
+ - -24
678
+ val/GlobalVolumeNorm.name: null
679
+ val/GlobalVolumeNorm.prob: 1.0
680
+
681
+ val/HighPass.cutoff: !!python/tuple
682
+ - choice
683
+ - - 50
684
+ - 100
685
+ - 250
686
+ - 500
687
+ - 1000
688
+ val/HighPass.name: null
689
+ val/HighPass.prob: 1
690
+ val/HighPass.zeros: 51
691
+
692
+ val/InvertPhase.name: null
693
+ val/InvertPhase.prob: 1
694
+
695
+ val/LowPass.cutoff: !!python/tuple
696
+ - choice
697
+ - - 4000
698
+ - 8000
699
+ - 16000
700
+ val/LowPass.name: null
701
+ val/LowPass.prob: 1
702
+ val/LowPass.zeros: 51
703
+
704
+ val/MaskLowMagnitudes.db_cutoff: !!python/tuple
705
+ - uniform
706
+ - -10
707
+ - 10
708
+ val/MaskLowMagnitudes.name: null
709
+ val/MaskLowMagnitudes.prob: 1
710
+
711
+ val/MuLawQuantization.channels: !!python/tuple
712
+ - choice
713
+ - - 8
714
+ - 32
715
+ - 128
716
+ - 256
717
+ - 1024
718
+ val/MuLawQuantization.name: null
719
+ val/MuLawQuantization.prob: 1.0
720
+
721
+ val/NoiseFloor.db: !!python/tuple
722
+ - const
723
+ - -50.0
724
+ val/NoiseFloor.name: null
725
+ val/NoiseFloor.prob: 1.0
726
+
727
+ val/Quantization.channels: !!python/tuple
728
+ - choice
729
+ - - 8
730
+ - 32
731
+ - 128
732
+ - 256
733
+ - 1024
734
+ val/Quantization.name: null
735
+ val/Quantization.prob: 1.0
736
+
737
+ val/Repeat.n_repeat: 1
738
+ val/Repeat.name: null
739
+ val/Repeat.prob: 1.0
740
+
741
+ val/RepeatUpTo.max_repeat: 5
742
+ val/RepeatUpTo.name: null
743
+ val/RepeatUpTo.prob: 1.0
744
+ val/RepeatUpTo.weights: null
745
+
746
+ val/RescaleAudio.name: null
747
+ val/RescaleAudio.prob: 1
748
+ val/RescaleAudio.val: 1.0
749
+
750
+ val/RoomImpulseResponse.drr: !!python/tuple
751
+ - uniform
752
+ - 0.0
753
+ - 30.0
754
+ val/RoomImpulseResponse.duration: 1.0
755
+ val/RoomImpulseResponse.eq_amount: !!python/tuple
756
+ - const
757
+ - 1.0
758
+ val/RoomImpulseResponse.n_bands: 6
759
+ val/RoomImpulseResponse.name: null
760
+ val/RoomImpulseResponse.offset: 0.0
761
+ val/RoomImpulseResponse.prob: 1.0
762
+ val/RoomImpulseResponse.sources: null
763
+ val/RoomImpulseResponse.use_original_phase: false
764
+ val/RoomImpulseResponse.weights: null
765
+
766
+ val/ShiftPhase.name: null
767
+ val/ShiftPhase.prob: 1
768
+ val/ShiftPhase.shift: !!python/tuple
769
+ - uniform
770
+ - -3.141592653589793
771
+ - 3.141592653589793
772
+
773
+ val/Silence.name: null
774
+ val/Silence.prob: 0.1
775
+
776
+ val/Smoothing.name: null
777
+ val/Smoothing.prob: 1
778
+ val/Smoothing.window_length: !!python/tuple
779
+ - choice
780
+ - - 8
781
+ - 16
782
+ - 32
783
+ - 64
784
+ - 128
785
+ - 256
786
+ - 512
787
+ val/Smoothing.window_type: !!python/tuple
788
+ - const
789
+ - average
790
+
791
+ val/SpectralDenoising.denoise_amount: !!python/tuple
792
+ - uniform
793
+ - 0.8
794
+ - 1.0
795
+ val/SpectralDenoising.eq_amount: !!python/tuple
796
+ - const
797
+ - 1.0
798
+ val/SpectralDenoising.n_bands: 6
799
+ val/SpectralDenoising.n_freq: 3
800
+ val/SpectralDenoising.n_time: 5
801
+ val/SpectralDenoising.name: null
802
+ val/SpectralDenoising.nz_volume: -40
803
+ val/SpectralDenoising.prob: 1
804
+
805
+ val/TimeMask.name: null
806
+ val/TimeMask.prob: 1
807
+ val/TimeMask.t_center: !!python/tuple
808
+ - uniform
809
+ - 0.0
810
+ - 1.0
811
+ val/TimeMask.t_width: !!python/tuple
812
+ - const
813
+ - 0.025
814
+
815
+ val/TimeNoise.name: null
816
+ val/TimeNoise.prob: 1
817
+ val/TimeNoise.t_center: !!python/tuple
818
+ - uniform
819
+ - 0.0
820
+ - 1.0
821
+ val/TimeNoise.t_width: !!python/tuple
822
+ - const
823
+ - 0.025
824
+
825
+ val/VolumeChange.db: !!python/tuple
826
+ - uniform
827
+ - -12.0
828
+ - 0.0
829
+ val/VolumeChange.name: null
830
+ val/VolumeChange.prob: 1.0
831
+
832
+ val/VolumeNorm.db: !!python/tuple
833
+ - const
834
+ - -24
835
+ val/VolumeNorm.name: null
836
+ val/VolumeNorm.prob: 1.0
837
+
838
+ val_freq: 1000
839
+
840
+ val_idx:
841
+ - 0
842
+ - 1
843
+ - 2
844
+ - 3
845
+ - 4
846
+ - 5
847
+ - 6
848
+ - 7
849
+ - 8
850
+ - 9
851
+
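Many transform arguments in this config are tuples whose first element names a distribution: `const` with one value, `uniform` with a low and a high bound, and `choice` with a list of options. The sketch below shows one way such a tuple could be turned into a concrete value. The helper `sample_from_spec` is hypothetical and written only for illustration; it is not taken from the training code, which may interpret these tuples differently.

```python
import random


def sample_from_spec(spec, rng):
    """Draw one value from a (kind, *args) tuple as written in args.yml.

    Hypothetical helper for illustration; not part of the training code.
    """
    kind, *args = spec
    if kind == "const":
        # ("const", value) always yields the same value.
        return args[0]
    if kind == "uniform":
        # ("uniform", low, high) draws a float between low and high.
        low, high = args
        return rng.uniform(low, high)
    if kind == "choice":
        # ("choice", [a, b, ...]) picks one of the listed options.
        return rng.choice(args[0])
    raise ValueError(f"unknown distribution kind: {kind!r}")


rng = random.Random(0)  # seeded with 0, mirroring `seed: 0` in the config
snr = sample_from_spec(("uniform", 10.0, 30.0), rng)             # BackgroundNoise.snr
cutoff = sample_from_spec(("choice", [4000, 8000, 16000]), rng)  # LowPass.cutoff
db = sample_from_spec(("const", -24), rng)                       # VolumeNorm.db
print(snr, cutoff, db)
```

Passing the random generator in explicitly keeps the draws reproducible for a given seed.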
runs/soundrangers-v2-v1/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3809d9bbaa27f5ad1d409945180e11f5420c3c765e09d185fa1dbdd2ee77c59f
+ size 1343718241
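The three added lines are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object ID, and the size of the real file in bytes. A minimal sketch of reading those fields follows; `parse_lfs_pointer` is a hypothetical helper name, and the parsing assumes the simple `key value` layout shown above.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file.

    Hypothetical helper; assumes one `key value` pair per line.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3809d9bbaa27f5ad1d409945180e11f5420c3c765e09d185fa1dbdd2ee77c59f
size 1343718241
"""
info = parse_lfs_pointer(pointer)
print(f"{info['size_bytes'] / 1e9:.2f} GB")  # prints "1.34 GB"
```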
runs/soundrangers-v2-v1/coarse/model.txt ADDED
@@ -0,0 +1,73 @@
+ VampNet(
+ 335.894M params.
+ (embedding): CodebookEmbedding(
+ 0.042M params.
+ (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+ (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+ )
+ (transformer): TransformerStack(
+ 330.600M params.
+ (layers): ModuleList(
+ (0): TransformerLayer(
+ 16.531M params.
+ (norm_1): RMSNorm( 0.001M params.)
+ (film_1): FiLM( 0.000M params.)
+ (self_attn): MultiHeadRelativeAttention(
+ 6.616M params.
+ (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+ (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+ (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+ (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+ (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+ (relative_attention_bias): Embedding(32, 20 0.001M params.)
+ )
+ (norm_3): RMSNorm( 0.001M params.)
+ (film_3): FiLM( 0.000M params.)
+ (feed_forward): FeedForward(
+ 9.912M params.
+ (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+ (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+ (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+ (act): GatedGELU(
+ 0.000M params.
+ (gelu): NewGELU( 0.000M params.)
+ )
+ )
+ (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+ )
+ (1-19): 19 x TransformerLayer(
+ 16.530M params.
+ (norm_1): RMSNorm( 0.001M params.)
+ (film_1): FiLM( 0.000M params.)
+ (self_attn): MultiHeadRelativeAttention(
+ 6.615M params.
+ (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+ (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+ (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+ (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+ (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+ )
+ (norm_3): RMSNorm( 0.001M params.)
+ (film_3): FiLM( 0.000M params.)
+ (feed_forward): FeedForward(
+ 9.912M params.
+ (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+ (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+ (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+ (act): GatedGELU(
+ 0.000M params.
+ (gelu): NewGELU( 0.000M params.)
+ )
+ )
+ (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+ )
+ )
+ (norm): RMSNorm( 0.001M params.)
+ )
+ (classifier): SequentialWithFiLM(
+ 5.251M params.
+ (layers): ModuleList(
+ (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+ )
+ )
+ )
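The channel counts in this summary line up with the `VampNet.*` values in the coarse args.yml (`n_codebooks: 4`, `n_conditioning_codebooks: 0`, `latent_dim: 8`, `vocab_size: 1024`). The sketch below spells out that arithmetic. The relationship is an assumption inferred from the numbers in this diff, not confirmed against the model source, and `expected_channels` is a hypothetical helper.

```python
def expected_channels(n_codebooks, n_conditioning_codebooks, latent_dim, vocab_size):
    """Return (embedding input channels, classifier output channels).

    Assumed relationship, inferred from the printed shapes:
    - the embedding projection sees one latent vector per codebook;
    - the classifier emits one logit per vocabulary entry for every
      codebook that is predicted rather than given as conditioning.
    """
    embed_in = n_codebooks * latent_dim
    n_predicted = n_codebooks - n_conditioning_codebooks
    classifier_out = n_predicted * vocab_size
    return embed_in, classifier_out


coarse = expected_channels(
    n_codebooks=4, n_conditioning_codebooks=0, latent_dim=8, vocab_size=1024
)
print(coarse)  # (32, 4096)
```

These values match `Conv1d(32, 1280, ...)` in the embedding and `Conv1d(1280, 4096, ...)` in the classifier.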
runs/soundrangers-v2/c2f/args.yml ADDED
@@ -0,0 +1,155 @@
1
+ AdamW.amsgrad: false
2
+ AdamW.betas: !!python/tuple
3
+ - 0.9
4
+ - 0.999
5
+ AdamW.capturable: false
6
+ AdamW.differentiable: false
7
+ AdamW.eps: 1.0e-08
8
+ AdamW.lr: 0.0001
9
+ AdamW.maximize: false
10
+ AdamW.weight_decay: 0.01
11
+
12
+ AudioDataset.aligned: false
13
+ AudioDataset.duration: 3.0
14
+ AudioDataset.loudness_cutoff: -40.0
15
+ AudioDataset.n_examples: 1000
16
+ AudioDataset.num_channels: 1
17
+ AudioDataset.offset: null
18
+ AudioDataset.shuffle_loaders: false
19
+ AudioDataset.without_replacement: false
20
+
21
+ AudioLoader.ext:
22
+ - .wav
23
+ - .flac
24
+ - .mp3
25
+ - .mp4
26
+ AudioLoader.relative_path: ''
27
+ AudioLoader.shuffle: true
28
+ AudioLoader.shuffle_state: 0
29
+ AudioLoader.sources: null
30
+ AudioLoader.weights: null
31
+
32
+ CrossEntropyLoss.ignore_index: -100
33
+ CrossEntropyLoss.label_smoothing: 0.1
34
+ CrossEntropyLoss.reduce: null
35
+ CrossEntropyLoss.reduction: mean
36
+ CrossEntropyLoss.size_average: null
37
+
38
+ NoamScheduler.d_model: 512
39
+ NoamScheduler.factor: 2.0
40
+ NoamScheduler.warmup: 500
41
+
42
+ VampNet.dropout: 0.1
43
+ VampNet.embedding_dim: 1280
44
+ VampNet.flash_attn: false
45
+ VampNet.latent_dim: 8
46
+ VampNet.n_codebooks: 14
47
+ VampNet.n_conditioning_codebooks: 4
48
+ VampNet.n_heads: 20
49
+ VampNet.n_layers: 16
50
+ VampNet.noise_mode: mask
51
+ VampNet.r_cond_dim: 0
52
+ VampNet.vocab_size: 1024
53
+
54
+ amp: false
55
+
56
+ args.debug: true
57
+ args.load: conf/generated/natural-sounds/c2f.yml
58
+ args.save: null
59
+
60
+ batch_size: 6
61
+
62
+ codec_ckpt: ./models/vampnet/codec.pth
63
+
64
+ fine_tune: true
65
+
66
+ fine_tune_checkpoint: ./models/vampnet/c2f.pth
67
+
68
+ grad_clip_val: 5.0
69
+
70
+ num_iters: 500000
71
+
72
+ num_workers: 7
73
+
74
+ resume: false
75
+
76
+ sample_freq: 2000
77
+
78
+ save_iters:
79
+ - 2000
80
+ - 4000
81
+ - 10000
82
+ - 20000
83
+ - 40000
84
+ - 100000
85
+
86
+ save_path: ./runs/soundrangers-v2/c2f
87
+
88
+ seed: 0
89
+
90
+ tag: latest
91
+
92
+ train/AudioDataset.aligned: false
93
+ train/AudioDataset.duration: 3.0
94
+ train/AudioDataset.loudness_cutoff: -40.0
95
+ train/AudioDataset.n_examples: 100000000
96
+ train/AudioDataset.num_channels: 1
97
+ train/AudioDataset.offset: null
98
+ train/AudioDataset.shuffle_loaders: false
99
+ train/AudioDataset.without_replacement: false
100
+
101
+ train/AudioLoader.sources:
102
+ - /media/CHONK2/prosound_redacted/Soundrangers Complete
103
+ - /media/CHONK2/prosound_redacted/Soundrangers Update 2018
104
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
105
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
106
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
107
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
108
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
109
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
110
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
111
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
112
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
113
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Bells
114
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
115
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
116
+
117
+ val/AudioDataset.aligned: false
118
+ val/AudioDataset.duration: 3.0
119
+ val/AudioDataset.loudness_cutoff: -40.0
120
+ val/AudioDataset.n_examples: 500
121
+ val/AudioDataset.num_channels: 1
122
+ val/AudioDataset.offset: null
123
+ val/AudioDataset.shuffle_loaders: false
124
+ val/AudioDataset.without_replacement: false
125
+
126
+ val/AudioLoader.sources:
127
+ - /media/CHONK2/prosound_redacted/Soundrangers Complete
128
+ - /media/CHONK2/prosound_redacted/Soundrangers Update 2018
129
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
130
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
131
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
132
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
133
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
134
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
135
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
136
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
137
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
138
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Bells
139
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
140
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
141
+
142
+ val_freq: 1000
143
+
144
+ val_idx:
145
+ - 0
146
+ - 1
147
+ - 2
148
+ - 3
149
+ - 4
150
+ - 5
151
+ - 6
152
+ - 7
153
+ - 8
154
+ - 9
155
+
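The `NoamScheduler.*` entries (`d_model: 512`, `factor: 2.0`, `warmup: 500`) suggest a warmup-then-decay learning-rate schedule. The sketch below assumes the standard Noam formula from the Transformer literature; the diff does not show the scheduler code, so the exact formula used by this run and how it interacts with `AdamW.lr` are not confirmed here. `noam_lr` is a hypothetical function name.

```python
def noam_lr(step, d_model=512, factor=2.0, warmup=500):
    """Learning rate at a 1-based step under the standard Noam schedule.

    Defaults are copied from args.yml; the formula itself is an assumption.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warmup for `warmup` steps, then inverse-square-root decay.
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


for step in (1, 100, 500, 2000, 100000):
    print(step, f"{noam_lr(step):.6f}")
```

Under this assumption the rate peaks at step 500, where the warmup and decay branches meet, and falls off as the inverse square root of the step count afterwards.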
runs/soundrangers-v2/c2f/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f483e7eaa0ea690c30a805936226833ccd2066db4b4309d2edcb542545bd1d62
+ size 1111127537
runs/soundrangers-v2/c2f/model.txt ADDED
@@ -0,0 +1,76 @@
1
+ OptimizedModule(
2
+ 277.753M params.
3
+ (_orig_mod): VampNet(
4
+ 277.753M params.
5
+ (embedding): CodebookEmbedding(
6
+ 0.145M params.
7
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+     )
+     (transformer): TransformerStack(
+       264.481M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-15): 15 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       13.128M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+       )
+     )
+   )
+ )
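Note on the c2f shapes: the tensor sizes in the dump above suggest how the coarse-to-fine model splits its codebooks. The sketch below is a reading of those numbers, not something the dump states. It assumes the c2f model uses the same `vocab_size` of 1024 as the coarse config, and that the classifier emits one block of `vocab_size` logits per predicted codebook.

```python
# Values read directly from the c2f model dump.
MASK_SHAPE = (14, 8)      # MASK parameter: 14x8
OUT_PROJ_IN = 112         # (out_proj): Conv1d(112, 1280, ...)
CLASSIFIER_OUT = 10240    # (classifier) Conv1d(1280, 10240, ...)

# ASSUMPTION: same vocabulary size as the coarse config (VampNet.vocab_size: 1024).
VOCAB_SIZE = 1024

# Reading the MASK shape as (n_codebooks, latent_dim) explains the 112 input channels.
n_codebooks, latent_dim = MASK_SHAPE
assert n_codebooks * latent_dim == OUT_PROJ_IN

# ASSUMPTION: classifier width = vocab_size * number of predicted codebooks.
n_predicted, remainder = divmod(CLASSIFIER_OUT, VOCAB_SIZE)
assert remainder == 0
n_conditioning = n_codebooks - n_predicted

print(f"codebooks in:          {n_codebooks}")
print(f"codebooks predicted:   {n_predicted}")
print(f"codebooks conditioned: {n_conditioning}")
```

Under these assumptions the c2f model would condition on 4 codebooks, which matches the `n_codebooks: 4` of the coarse model, and predict the remaining 10.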
runs/soundrangers-v2/coarse/args.yml ADDED
@@ -0,0 +1,155 @@
+ AdamW.amsgrad: false
+ AdamW.betas: !!python/tuple
+ - 0.9
+ - 0.999
+ AdamW.capturable: false
+ AdamW.differentiable: false
+ AdamW.eps: 1.0e-08
+ AdamW.lr: 0.0001
+ AdamW.maximize: false
+ AdamW.weight_decay: 0.01
+
+ AudioDataset.aligned: false
+ AudioDataset.duration: 10.0
+ AudioDataset.loudness_cutoff: -30.0
+ AudioDataset.n_examples: 1000
+ AudioDataset.num_channels: 1
+ AudioDataset.offset: null
+ AudioDataset.shuffle_loaders: false
+ AudioDataset.without_replacement: false
+
+ AudioLoader.ext:
+ - .wav
+ - .flac
+ - .mp3
+ - .mp4
+ AudioLoader.relative_path: ''
+ AudioLoader.shuffle: true
+ AudioLoader.shuffle_state: 0
+ AudioLoader.sources: null
+ AudioLoader.weights: null
+
+ CrossEntropyLoss.ignore_index: -100
+ CrossEntropyLoss.label_smoothing: 0.1
+ CrossEntropyLoss.reduce: null
+ CrossEntropyLoss.reduction: mean
+ CrossEntropyLoss.size_average: null
+
+ NoamScheduler.d_model: 512
+ NoamScheduler.factor: 2.0
+ NoamScheduler.warmup: 500
+
+ VampNet.dropout: 0.1
+ VampNet.embedding_dim: 1280
+ VampNet.flash_attn: false
+ VampNet.latent_dim: 8
+ VampNet.n_codebooks: 4
+ VampNet.n_conditioning_codebooks: 0
+ VampNet.n_heads: 20
+ VampNet.n_layers: 20
+ VampNet.noise_mode: mask
+ VampNet.r_cond_dim: 0
+ VampNet.vocab_size: 1024
+
+ amp: false
+
+ args.debug: true
+ args.load: conf/generated/natural-sounds/coarse.yml
+ args.save: null
+
+ batch_size: 6
+
+ codec_ckpt: ./models/vampnet/codec.pth
+
+ fine_tune: true
+
+ fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+ grad_clip_val: 5.0
+
+ num_iters: 500000
+
+ num_workers: 7
+
+ resume: false
+
+ sample_freq: 2000
+
+ save_iters:
+ - 2000
+ - 4000
+ - 10000
+ - 20000
+ - 40000
+ - 100000
+
+ save_path: ./runs/soundrangers-v2/coarse
+
+ seed: 0
+
+ tag: latest
+
+ train/AudioDataset.aligned: false
+ train/AudioDataset.duration: 10.0
+ train/AudioDataset.loudness_cutoff: -30.0
+ train/AudioDataset.n_examples: 100000000
+ train/AudioDataset.num_channels: 1
+ train/AudioDataset.offset: null
+ train/AudioDataset.shuffle_loaders: false
+ train/AudioDataset.without_replacement: false
+
+ train/AudioLoader.sources:
+ - /media/CHONK2/prosound_redacted/Soundrangers Complete
+ - /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Bells
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
+
+ val/AudioDataset.aligned: false
+ val/AudioDataset.duration: 10.0
+ val/AudioDataset.loudness_cutoff: -30.0
+ val/AudioDataset.n_examples: 500
+ val/AudioDataset.num_channels: 1
+ val/AudioDataset.offset: null
+ val/AudioDataset.shuffle_loaders: false
+ val/AudioDataset.without_replacement: false
+
+ val/AudioLoader.sources:
+ - /media/CHONK2/prosound_redacted/Soundrangers Complete
+ - /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+ - /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+ - /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
+ - /media/CHONK2/prosound_redacted/Big Room Complete/Bells
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
+ - /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
+
+ val_freq: 1000
+
+ val_idx:
+ - 0
+ - 1
+ - 2
+ - 3
+ - 4
+ - 5
+ - 6
+ - 7
+ - 8
+ - 9
+
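Note on the learning-rate settings: the config carries both `AdamW.lr: 0.0001` and a `NoamScheduler` block. The sketch below evaluates the standard Noam schedule (linear warmup followed by inverse-square-root decay) with the three `NoamScheduler` values from this file. It is an illustration only: that the training script uses exactly this formula, and that the scheduler overrides `AdamW.lr`, are assumptions. The function name `noam_lr` is hypothetical.

```python
import math

# Values copied from the NoamScheduler.* keys in args.yml.
D_MODEL = 512
FACTOR = 2.0
WARMUP = 500


def noam_lr(step: int, d_model: int = D_MODEL,
            factor: float = FACTOR, warmup: int = WARMUP) -> float:
    """Standard Noam schedule; ASSUMED to match the trainer's scheduler."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear ramp while step < warmup, then decay proportional to 1/sqrt(step).
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# The two branches of min() meet at step == warmup, which is the peak.
peak = noam_lr(WARMUP)

for step in (1, 100, 500, 2000, 100000):
    print(f"step {step:>6}: lr = {noam_lr(step):.6f}")
```

With these settings the peak falls at step 500, well before the first entry in `save_iters`. Note that `NoamScheduler.d_model` is 512 while `VampNet.embedding_dim` is 1280, so here `d_model` acts as a scale constant rather than mirroring the model width.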
runs/soundrangers-v2/coarse/latest/vampnet/weights.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:961d04558e809c3828b92526e9141be051bb9195144a7d598341d60eef5db90f
+ size 1343718241
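Note on the pointer file: `weights.pth` is stored as a Git LFS pointer, so the diff shows only the spec version, the SHA-256 object ID and the byte size. The sketch below parses those three fields and compares the size against the `335.894M params.` reported in this run's `model.txt`. The helper name `parse_lfs_pointer` is hypothetical, and reading the size as 4 bytes per parameter (float32 storage) is an inference from the numbers, not something the pointer states.

```python
# Pointer contents copied from the diff (without the leading "+ ").
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:961d04558e809c3828b92526e9141be051bb9195144a7d598341d60eef5db90f
size 1343718241
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)

# Rounded parameter count as printed in model.txt for this run.
REPORTED_PARAMS = 335_894_000
# ASSUMPTION: weights are stored as float32, i.e. 4 bytes per parameter.
fp32_bytes = 4 * REPORTED_PARAMS
remainder = pointer["size"] - fp32_bytes

print(f"checkpoint size: {pointer['size'] / 1e9:.3f} GB")
print(f"4 bytes/param:   {fp32_bytes / 1e9:.3f} GB")
print(f"remainder:       {remainder / 1e3:.0f} kB")
```

The remainder comes to roughly 0.01% of the file, which is consistent with a float32 state dict plus a small amount of serialization overhead and metadata.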
runs/soundrangers-v2/coarse/model.txt ADDED
@@ -0,0 +1,76 @@
+ OptimizedModule(
+   335.894M params.
+   (_orig_mod): VampNet(
+     335.894M params.
+     (embedding): CodebookEmbedding(
+       0.042M params.
+       (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+       (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+     )
+     (transformer): TransformerStack(
+       330.600M params.
+       (layers): ModuleList(
+         (0): TransformerLayer(
+           16.531M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.616M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+             (relative_attention_bias): Embedding(32, 20 0.001M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+         (1-19): 19 x TransformerLayer(
+           16.530M params.
+           (norm_1): RMSNorm( 0.001M params.)
+           (film_1): FiLM( 0.000M params.)
+           (self_attn): MultiHeadRelativeAttention(
+             6.615M params.
+             (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+             (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+             (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+           )
+           (norm_3): RMSNorm( 0.001M params.)
+           (film_3): FiLM( 0.000M params.)
+           (feed_forward): FeedForward(
+             9.912M params.
+             (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+             (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+             (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+             (act): GatedGELU(
+               0.000M params.
+               (gelu): NewGELU( 0.000M params.)
+             )
+           )
+           (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+         )
+       )
+       (norm): RMSNorm( 0.001M params.)
+     )
+     (classifier): SequentialWithFiLM(
+       5.251M params.
+       (layers): ModuleList(
+         (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+       )
+     )
+   )
+ )
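Note on the parameter counts: the per-module figures in the dump above can be reproduced from the hyperparameters in `args.yml`, which is a useful sanity check when comparing runs. The sketch below rests on two assumptions inferred purely from the arithmetic, neither stated in the dump: the `Linear` layers reporting more than `in * out` weights carry a rank-8 low-rank adapter (`w_ks`, at exactly 1280 x 1280 = 1.638M, does not), and the classifier convolution is weight-normalized, adding one gain per output channel. The helper names are hypothetical.

```python
# Hyperparameters copied from args.yml / model.txt for this run.
D_MODEL = 1280      # VampNet.embedding_dim
N_HEADS = 20        # VampNet.n_heads
N_LAYERS = 20       # VampNet.n_layers
N_CODEBOOKS = 4     # VampNet.n_codebooks
LATENT_DIM = 8      # VampNet.latent_dim
VOCAB_SIZE = 1024   # VampNet.vocab_size
REL_BUCKETS = 32    # relative_attention_bias: Embedding(32, 20)

# ASSUMPTION (inferred from the counts): rank of the low-rank adapters.
LORA_RANK = 8


def linear_params(n_in: int, n_out: int, lora: bool = True) -> int:
    """Bias-free linear layer, optionally with an ASSUMED rank-8 adapter."""
    adapter = LORA_RANK * (n_in + n_out) if lora else 0
    return n_in * n_out + adapter


def conv1x1_params(n_in: int, n_out: int, weight_norm: bool = False) -> int:
    """Kernel-size-1 conv with bias; ASSUMED weight norm adds n_out gains."""
    return n_in * n_out + n_out + (n_out if weight_norm else 0)


# Attention: w_qs, w_vs and fc with adapters, w_ks without.
attn = 3 * linear_params(D_MODEL, D_MODEL) + linear_params(D_MODEL, D_MODEL, lora=False)
# Feed-forward: the gated GELU halves the 5120-wide hidden state to 2560.
ffn = linear_params(D_MODEL, 4 * D_MODEL) + linear_params(2 * D_MODEL, D_MODEL)
layer = attn + ffn + 2 * D_MODEL                 # plus two RMSNorm weight vectors
first_layer = layer + REL_BUCKETS * N_HEADS      # only layer 0 holds the bias table

transformer = first_layer + (N_LAYERS - 1) * layer + D_MODEL   # plus final RMSNorm
embedding = conv1x1_params(N_CODEBOOKS * LATENT_DIM, D_MODEL) + N_CODEBOOKS * LATENT_DIM
classifier = conv1x1_params(D_MODEL, N_CODEBOOKS * VOCAB_SIZE, weight_norm=True)
total = embedding + transformer + classifier

for name, count in [("embedding", embedding), ("first layer", first_layer),
                    ("other layer", layer), ("transformer", transformer),
                    ("classifier", classifier), ("total", total)]:
    print(f"{name:<12} {count / 1e6:.3f}M")
```

Under these assumptions every printed figure agrees with the dump to three decimals, including the 335.894M total. The 0.001M gap between layer 0 and layers 1-19 is the 32 x 20 relative attention bias table.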