Hugo Flores Garcia committed
Commit 4c17dbe (1 parent: 9567041)
models
- runs/boleros/c2f/args.yml +825 -0
- runs/boleros/c2f/latest/vampnet/weights.pth +3 -0
- runs/boleros/c2f/model.txt +76 -0
- runs/boleros/coarse/args.yml +825 -0
- runs/boleros/coarse/latest/vampnet/weights.pth +3 -0
- runs/boleros/coarse/model.txt +76 -0
- runs/choir/c2f/latest/vampnet/weights.pth +3 -0
- runs/choir/coarse/latest/vampnet/weights.pth +3 -0
- runs/knower/c2f/args.yml +824 -0
- runs/knower/c2f/best/vampnet/weights.pth +3 -0
- runs/knower/c2f/latest/vampnet/weights.pth +3 -0
- runs/knower/c2f/model.txt +76 -0
- runs/knower/coarse/args.yml +824 -0
- runs/knower/coarse/best/vampnet/weights.pth +3 -0
- runs/knower/coarse/latest/vampnet/weights.pth +3 -0
- runs/knower/coarse/model.txt +76 -0
- runs/n64/c2f/args.yml +129 -0
- runs/n64/c2f/latest/vampnet/weights.pth +3 -0
- runs/n64/c2f/model.txt +76 -0
- runs/n64/coarse/args.yml +129 -0
- runs/n64/coarse/latest/vampnet/weights.pth +3 -0
- runs/n64/coarse/model.txt +76 -0
- runs/n64/n64/c2f/vampnet/weights.pth +3 -0
- runs/n64/n64/coarse/latest/vampnet/weights.pth +3 -0
- runs/opera/coarse/latest/vampnet/weights.pth +3 -0
- runs/orchestral/c2f/args.yml +129 -0
- runs/orchestral/c2f/latest/vampnet/weights.pth +3 -0
- runs/orchestral/c2f/model.txt +76 -0
- runs/orchestral/coarse/args.yml +129 -0
- runs/orchestral/coarse/latest/vampnet/weights.pth +3 -0
- runs/orchestral/coarse/model.txt +76 -0
- runs/soundrangers-v2-v1/c2f/args.yml +851 -0
- runs/soundrangers-v2-v1/c2f/latest/vampnet/weights.pth +3 -0
- runs/soundrangers-v2-v1/c2f/model.txt +73 -0
- runs/soundrangers-v2-v1/coarse/args.yml +851 -0
- runs/soundrangers-v2-v1/coarse/latest/vampnet/weights.pth +3 -0
- runs/soundrangers-v2-v1/coarse/model.txt +73 -0
- runs/soundrangers-v2/c2f/args.yml +155 -0
- runs/soundrangers-v2/c2f/latest/vampnet/weights.pth +3 -0
- runs/soundrangers-v2/c2f/model.txt +76 -0
- runs/soundrangers-v2/coarse/args.yml +155 -0
- runs/soundrangers-v2/coarse/latest/vampnet/weights.pth +3 -0
- runs/soundrangers-v2/coarse/model.txt +76 -0
runs/boleros/c2f/args.yml
ADDED
@@ -0,0 +1,825 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 3.0
+AudioDataset.loudness_cutoff: -40.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+BackgroundNoise.loudness_cutoff: null
+BackgroundNoise.n_bands: 3
+BackgroundNoise.name: null
+BackgroundNoise.prob: 1.0
+BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+BackgroundNoise.sources: null
+BackgroundNoise.weights: null
+
+BaseTransform.keys: []
+BaseTransform.name: null
+BaseTransform.prob: 1.0
+
+ClippingDistortion.name: null
+ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+ClippingDistortion.prob: 1.0
+
+CorruptPhase.name: null
+CorruptPhase.prob: 1
+CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+CrossTalk.loudness_cutoff: -40
+CrossTalk.name: null
+CrossTalk.prob: 1.0
+CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+CrossTalk.sources: null
+CrossTalk.weights: null
+
+Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+Equalizer.n_bands: 6
+Equalizer.name: null
+Equalizer.prob: 1.0
+
+FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyMask.name: null
+FrequencyMask.prob: 1
+
+FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyNoise.name: null
+FrequencyNoise.prob: 1
+
+GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+GlobalVolumeNorm.name: null
+GlobalVolumeNorm.prob: 1.0
+
+HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+HighPass.name: null
+HighPass.prob: 1
+HighPass.zeros: 51
+
+InvertPhase.name: null
+InvertPhase.prob: 1
+
+LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+LowPass.name: null
+LowPass.prob: 1
+LowPass.zeros: 51
+
+MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+MaskLowMagnitudes.name: null
+MaskLowMagnitudes.prob: 1
+
+MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+MuLawQuantization.name: null
+MuLawQuantization.prob: 1.0
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+NoiseFloor.name: null
+NoiseFloor.prob: 1.0
+
+Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+Quantization.name: null
+Quantization.prob: 1.0
+
+Repeat.n_repeat: 1
+Repeat.name: null
+Repeat.prob: 1.0
+
+RepeatUpTo.max_repeat: 5
+RepeatUpTo.name: null
+RepeatUpTo.prob: 1.0
+RepeatUpTo.weights: null
+
+RescaleAudio.name: null
+RescaleAudio.prob: 1
+RescaleAudio.val: 1.0
+
+RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+RoomImpulseResponse.duration: 1.0
+RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+RoomImpulseResponse.n_bands: 6
+RoomImpulseResponse.name: null
+RoomImpulseResponse.offset: 0.0
+RoomImpulseResponse.prob: 1.0
+RoomImpulseResponse.sources: null
+RoomImpulseResponse.use_original_phase: false
+RoomImpulseResponse.weights: null
+
+ShiftPhase.name: null
+ShiftPhase.prob: 1
+ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+Silence.name: null
+Silence.prob: 0.1
+
+Smoothing.name: null
+Smoothing.prob: 1
+Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+Smoothing.window_type: !!python/tuple
+- const
+- average
+
+SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+SpectralDenoising.n_bands: 6
+SpectralDenoising.n_freq: 3
+SpectralDenoising.n_time: 5
+SpectralDenoising.name: null
+SpectralDenoising.nz_volume: -40
+SpectralDenoising.prob: 1
+
+TimeMask.name: null
+TimeMask.prob: 1
+TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+TimeNoise.name: null
+TimeNoise.prob: 1
+TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 14
+VampNet.n_conditioning_codebooks: 4
+VampNet.n_heads: 20
+VampNet.n_layers: 16
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+VolumeChange.name: null
+VolumeChange.prob: 1.0
+
+VolumeNorm.db: !!python/tuple
+- const
+- -24
+VolumeNorm.name: null
+VolumeNorm.prob: 1.0
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/boleros/c2f.yml
+args.save: null
+
+batch_size: 7
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: false
+
+sample_freq: 1000
+
+save_iters:
+- 10000
+- 20000
+- 30000
+- 40000
+- 50000
+- 100000
+
+save_path: ./runs/boleros/c2f
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 3.0
+train/AudioDataset.loudness_cutoff: -40.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK/hugo/loras/boleros
+
+train/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+train/BackgroundNoise.loudness_cutoff: null
+train/BackgroundNoise.n_bands: 3
+train/BackgroundNoise.name: null
+train/BackgroundNoise.prob: 1.0
+train/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+train/BackgroundNoise.sources: null
+train/BackgroundNoise.weights: null
+
+train/BaseTransform.keys: []
+train/BaseTransform.name: null
+train/BaseTransform.prob: 1.0
+
+train/ClippingDistortion.name: null
+train/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+train/ClippingDistortion.prob: 1.0
+
+train/CorruptPhase.name: null
+train/CorruptPhase.prob: 1
+train/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+train/CrossTalk.loudness_cutoff: -40
+train/CrossTalk.name: null
+train/CrossTalk.prob: 1.0
+train/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+train/CrossTalk.sources: null
+train/CrossTalk.weights: null
+
+train/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+train/Equalizer.n_bands: 6
+train/Equalizer.name: null
+train/Equalizer.prob: 1.0
+
+train/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyMask.name: null
+train/FrequencyMask.prob: 1
+
+train/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyNoise.name: null
+train/FrequencyNoise.prob: 1
+
+train/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+train/GlobalVolumeNorm.name: null
+train/GlobalVolumeNorm.prob: 1.0
+
+train/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+train/HighPass.name: null
+train/HighPass.prob: 1
+train/HighPass.zeros: 51
+
+train/InvertPhase.name: null
+train/InvertPhase.prob: 1
+
+train/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+train/LowPass.name: null
+train/LowPass.prob: 1
+train/LowPass.zeros: 51
+
+train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+train/MaskLowMagnitudes.name: null
+train/MaskLowMagnitudes.prob: 1
+
+train/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/MuLawQuantization.name: null
+train/MuLawQuantization.prob: 1.0
+
+train/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+train/NoiseFloor.name: null
+train/NoiseFloor.prob: 1.0
+
+train/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/Quantization.name: null
+train/Quantization.prob: 1.0
+
+train/Repeat.n_repeat: 1
+train/Repeat.name: null
+train/Repeat.prob: 1.0
+
+train/RepeatUpTo.max_repeat: 5
+train/RepeatUpTo.name: null
+train/RepeatUpTo.prob: 1.0
+train/RepeatUpTo.weights: null
+
+train/RescaleAudio.name: null
+train/RescaleAudio.prob: 1
+train/RescaleAudio.val: 1.0
+
+train/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+train/RoomImpulseResponse.duration: 1.0
+train/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+train/RoomImpulseResponse.n_bands: 6
+train/RoomImpulseResponse.name: null
+train/RoomImpulseResponse.offset: 0.0
+train/RoomImpulseResponse.prob: 1.0
+train/RoomImpulseResponse.sources: null
+train/RoomImpulseResponse.use_original_phase: false
+train/RoomImpulseResponse.weights: null
+
+train/ShiftPhase.name: null
+train/ShiftPhase.prob: 1
+train/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+train/Silence.name: null
+train/Silence.prob: 0.1
+
+train/Smoothing.name: null
+train/Smoothing.prob: 1
+train/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+train/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+train/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+train/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+train/SpectralDenoising.n_bands: 6
+train/SpectralDenoising.n_freq: 3
+train/SpectralDenoising.n_time: 5
+train/SpectralDenoising.name: null
+train/SpectralDenoising.nz_volume: -40
+train/SpectralDenoising.prob: 1
+
+train/TimeMask.name: null
+train/TimeMask.prob: 1
+train/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+train/TimeNoise.name: null
+train/TimeNoise.prob: 1
+train/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+train/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+train/VolumeChange.name: null
+train/VolumeChange.prob: 1.0
+
+train/VolumeNorm.db: !!python/tuple
+- const
+- -24
+train/VolumeNorm.name: null
+train/VolumeNorm.prob: 1.0
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 3.0
+val/AudioDataset.loudness_cutoff: -40.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK/hugo/loras/boleros
+
+val/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+val/BackgroundNoise.loudness_cutoff: null
+val/BackgroundNoise.n_bands: 3
+val/BackgroundNoise.name: null
+val/BackgroundNoise.prob: 1.0
+val/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+val/BackgroundNoise.sources: null
+val/BackgroundNoise.weights: null
+
+val/BaseTransform.keys: []
+val/BaseTransform.name: null
+val/BaseTransform.prob: 1.0
+
+val/ClippingDistortion.name: null
+val/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+val/ClippingDistortion.prob: 1.0
+
+val/CorruptPhase.name: null
+val/CorruptPhase.prob: 1
+val/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+val/CrossTalk.loudness_cutoff: -40
+val/CrossTalk.name: null
+val/CrossTalk.prob: 1.0
+val/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+val/CrossTalk.sources: null
+val/CrossTalk.weights: null
+
+val/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+val/Equalizer.n_bands: 6
+val/Equalizer.name: null
+val/Equalizer.prob: 1.0
+
+val/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyMask.name: null
+val/FrequencyMask.prob: 1
+
+val/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyNoise.name: null
+val/FrequencyNoise.prob: 1
+
+val/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+val/GlobalVolumeNorm.name: null
+val/GlobalVolumeNorm.prob: 1.0
+
+val/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+val/HighPass.name: null
+val/HighPass.prob: 1
+val/HighPass.zeros: 51
+
+val/InvertPhase.name: null
+val/InvertPhase.prob: 1
+
+val/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+val/LowPass.name: null
+val/LowPass.prob: 1
+val/LowPass.zeros: 51
+
+val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+val/MaskLowMagnitudes.name: null
+val/MaskLowMagnitudes.prob: 1
+
+val/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/MuLawQuantization.name: null
+val/MuLawQuantization.prob: 1.0
+
+val/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+val/NoiseFloor.name: null
+val/NoiseFloor.prob: 1.0
+
+val/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/Quantization.name: null
+val/Quantization.prob: 1.0
+
+val/Repeat.n_repeat: 1
+val/Repeat.name: null
+val/Repeat.prob: 1.0
+
+val/RepeatUpTo.max_repeat: 5
+val/RepeatUpTo.name: null
+val/RepeatUpTo.prob: 1.0
+val/RepeatUpTo.weights: null
+
+val/RescaleAudio.name: null
+val/RescaleAudio.prob: 1
+val/RescaleAudio.val: 1.0
+
+val/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+val/RoomImpulseResponse.duration: 1.0
+val/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+val/RoomImpulseResponse.n_bands: 6
+val/RoomImpulseResponse.name: null
+val/RoomImpulseResponse.offset: 0.0
+val/RoomImpulseResponse.prob: 1.0
+val/RoomImpulseResponse.sources: null
+val/RoomImpulseResponse.use_original_phase: false
+val/RoomImpulseResponse.weights: null
+
+val/ShiftPhase.name: null
+val/ShiftPhase.prob: 1
+val/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+val/Silence.name: null
+val/Silence.prob: 0.1
+
+val/Smoothing.name: null
+val/Smoothing.prob: 1
+val/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+val/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+val/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+val/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+val/SpectralDenoising.n_bands: 6
+val/SpectralDenoising.n_freq: 3
+val/SpectralDenoising.n_time: 5
+val/SpectralDenoising.name: null
+val/SpectralDenoising.nz_volume: -40
+val/SpectralDenoising.prob: 1
+
+val/TimeMask.name: null
+val/TimeMask.prob: 1
+val/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+val/TimeNoise.name: null
+val/TimeNoise.prob: 1
+val/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+val/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+val/VolumeChange.name: null
+val/VolumeChange.prob: 1.0
+
+val/VolumeNorm.db: !!python/tuple
+- const
+- -24
+val/VolumeNorm.name: null
+val/VolumeNorm.prob: 1.0
+
+val_freq: 500
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
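The file lists most dataset and transform settings three times: unscoped, under `train/`, and under `val/`. This layout looks like argbind-style scoping, where a prefixed key takes precedence over the unscoped one inside that scope. The following is a minimal sketch of such a lookup under that assumption; `resolve` is a hypothetical helper (not part of argbind or VampNet), and the dictionary holds only a few values copied from the file.

```python
from typing import Any, Dict, Optional


def resolve(args: Dict[str, Any], scope: Optional[str] = None) -> Dict[str, Any]:
    """Return the unscoped keys, overridden by any `scope/`-prefixed keys.

    ASSUMPTION: scoped keys win over unscoped ones; this mirrors how the
    file is laid out, not a verified description of the training code.
    """
    # Start from the keys that carry no scope prefix.
    resolved = {key: value for key, value in args.items() if "/" not in key}
    if scope is not None:
        prefix = f"{scope}/"
        for key, value in args.items():
            if key.startswith(prefix):
                resolved[key[len(prefix):]] = value
    return resolved


# A handful of entries copied from args.yml.
args = {
    "AudioDataset.duration": 3.0,
    "AudioDataset.n_examples": 1000,
    "train/AudioDataset.n_examples": 100000000,
    "val/AudioDataset.n_examples": 500,
    "batch_size": 7,
}

for scope in (None, "train", "val"):
    print(scope, resolve(args, scope)["AudioDataset.n_examples"])
```

Read this way, the only dataset setting that differs between the scopes is `n_examples`: 100000000 for training and 500 for validation, against an unscoped default of 1000.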
runs/boleros/c2f/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8408ab94ce858360744e6c7f8fe708e48926fd26f5021c8d13506d529e12ac68
+size 1111127537
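The three added lines are a Git LFS pointer rather than the weights themselves: the spec version, the SHA-256 object ID, and the size in bytes. The sketch below shows one way a downloaded file could be checked against such a pointer. `parse_pointer` and `matches_pointer` are hypothetical helper names, and only the Python standard library is used.

```python
import hashlib
from typing import Any, Dict, Iterable

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8408ab94ce858360744e6c7f8fe708e48926fd26f5021c8d13506d529e12ac68
size 1111127537
"""


def parse_pointer(text: str) -> Dict[str, Any]:
    """Split each 'key value' line of an LFS pointer into named fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(pointer: Dict[str, Any], chunks: Iterable[bytes]) -> bool:
    """Hash the data chunk by chunk and compare both digest and byte count."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    for chunk in chunks:
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

For a file of this size (about 1.1 GB) the chunks would normally come from reading the file in blocks, for example `iter(lambda: f.read(1 << 20), b"")` on a file opened in binary mode, so the whole checkpoint never has to sit in memory.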
runs/boleros/c2f/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  277.753M params.
+  (_orig_mod): VampNet(
+    277.753M params.
+    (embedding): CodebookEmbedding(
+      0.145M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+    )
+    (transformer): TransformerStack(
+      264.481M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-15): 15 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      13.128M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+      )
+    )
+  )
+)
runs/boleros/coarse/args.yml
ADDED
@@ -0,0 +1,825 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 10.0
+AudioDataset.loudness_cutoff: -30.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+BackgroundNoise.loudness_cutoff: null
+BackgroundNoise.n_bands: 3
+BackgroundNoise.name: null
+BackgroundNoise.prob: 1.0
+BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+BackgroundNoise.sources: null
+BackgroundNoise.weights: null
+
+BaseTransform.keys: []
+BaseTransform.name: null
+BaseTransform.prob: 1.0
+
+ClippingDistortion.name: null
+ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+ClippingDistortion.prob: 1.0
+
+CorruptPhase.name: null
+CorruptPhase.prob: 1
+CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+CrossTalk.loudness_cutoff: -40
+CrossTalk.name: null
+CrossTalk.prob: 1.0
+CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+CrossTalk.sources: null
+CrossTalk.weights: null
+
+Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+Equalizer.n_bands: 6
+Equalizer.name: null
+Equalizer.prob: 1.0
+
+FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyMask.name: null
+FrequencyMask.prob: 1
+
+FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyNoise.name: null
+FrequencyNoise.prob: 1
+
+GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+GlobalVolumeNorm.name: null
+GlobalVolumeNorm.prob: 1.0
+
+HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+HighPass.name: null
+HighPass.prob: 1
+HighPass.zeros: 51
+
+InvertPhase.name: null
+InvertPhase.prob: 1
+
+LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+LowPass.name: null
+LowPass.prob: 1
+LowPass.zeros: 51
+
+MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+MaskLowMagnitudes.name: null
+MaskLowMagnitudes.prob: 1
+
+MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+MuLawQuantization.name: null
+MuLawQuantization.prob: 1.0
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+NoiseFloor.name: null
+NoiseFloor.prob: 1.0
+
+Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+Quantization.name: null
+Quantization.prob: 1.0
+
+Repeat.n_repeat: 1
+Repeat.name: null
+Repeat.prob: 1.0
+
+RepeatUpTo.max_repeat: 5
+RepeatUpTo.name: null
+RepeatUpTo.prob: 1.0
+RepeatUpTo.weights: null
+
+RescaleAudio.name: null
+RescaleAudio.prob: 1
+RescaleAudio.val: 1.0
+
+RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+RoomImpulseResponse.duration: 1.0
+RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+RoomImpulseResponse.n_bands: 6
+RoomImpulseResponse.name: null
+RoomImpulseResponse.offset: 0.0
+RoomImpulseResponse.prob: 1.0
+RoomImpulseResponse.sources: null
+RoomImpulseResponse.use_original_phase: false
+RoomImpulseResponse.weights: null
+
+ShiftPhase.name: null
+ShiftPhase.prob: 1
+ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+Silence.name: null
+Silence.prob: 0.1
+
+Smoothing.name: null
+Smoothing.prob: 1
+Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+Smoothing.window_type: !!python/tuple
+- const
+- average
+
+SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+SpectralDenoising.n_bands: 6
+SpectralDenoising.n_freq: 3
+SpectralDenoising.n_time: 5
+SpectralDenoising.name: null
+SpectralDenoising.nz_volume: -40
+SpectralDenoising.prob: 1
+
+TimeMask.name: null
+TimeMask.prob: 1
+TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+TimeNoise.name: null
+TimeNoise.prob: 1
+TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 4
+VampNet.n_conditioning_codebooks: 0
+VampNet.n_heads: 20
+VampNet.n_layers: 20
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+VolumeChange.name: null
+VolumeChange.prob: 1.0
+
+VolumeNorm.db: !!python/tuple
+- const
+- -24
+VolumeNorm.name: null
+VolumeNorm.prob: 1.0
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/boleros/coarse.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: false
+
+sample_freq: 1000
+
+save_iters:
+- 10000
+- 20000
+- 30000
+- 40000
+- 50000
+- 100000
+
+save_path: ./runs/boleros/coarse
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 10.0
+train/AudioDataset.loudness_cutoff: -30.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK/hugo/loras/boleros
+
+train/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+train/BackgroundNoise.loudness_cutoff: null
+train/BackgroundNoise.n_bands: 3
+train/BackgroundNoise.name: null
+train/BackgroundNoise.prob: 1.0
+train/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+train/BackgroundNoise.sources: null
+train/BackgroundNoise.weights: null
+
+train/BaseTransform.keys: []
+train/BaseTransform.name: null
+train/BaseTransform.prob: 1.0
+
+train/ClippingDistortion.name: null
+train/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+train/ClippingDistortion.prob: 1.0
+
+train/CorruptPhase.name: null
+train/CorruptPhase.prob: 1
+train/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+train/CrossTalk.loudness_cutoff: -40
+train/CrossTalk.name: null
+train/CrossTalk.prob: 1.0
+train/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+train/CrossTalk.sources: null
+train/CrossTalk.weights: null
+
+train/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+train/Equalizer.n_bands: 6
+train/Equalizer.name: null
+train/Equalizer.prob: 1.0
+
+train/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyMask.name: null
+train/FrequencyMask.prob: 1
+
+train/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyNoise.name: null
+train/FrequencyNoise.prob: 1
+
+train/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+train/GlobalVolumeNorm.name: null
+train/GlobalVolumeNorm.prob: 1.0
+
+train/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+train/HighPass.name: null
+train/HighPass.prob: 1
+train/HighPass.zeros: 51
+
+train/InvertPhase.name: null
+train/InvertPhase.prob: 1
+
+train/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+train/LowPass.name: null
+train/LowPass.prob: 1
+train/LowPass.zeros: 51
+
+train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+train/MaskLowMagnitudes.name: null
+train/MaskLowMagnitudes.prob: 1
+
+train/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/MuLawQuantization.name: null
+train/MuLawQuantization.prob: 1.0
+
+train/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+train/NoiseFloor.name: null
+train/NoiseFloor.prob: 1.0
+
+train/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/Quantization.name: null
+train/Quantization.prob: 1.0
+
+train/Repeat.n_repeat: 1
+train/Repeat.name: null
+train/Repeat.prob: 1.0
+
+train/RepeatUpTo.max_repeat: 5
+train/RepeatUpTo.name: null
+train/RepeatUpTo.prob: 1.0
+train/RepeatUpTo.weights: null
+
+train/RescaleAudio.name: null
+train/RescaleAudio.prob: 1
+train/RescaleAudio.val: 1.0
+
+train/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+train/RoomImpulseResponse.duration: 1.0
+train/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+train/RoomImpulseResponse.n_bands: 6
+train/RoomImpulseResponse.name: null
+train/RoomImpulseResponse.offset: 0.0
+train/RoomImpulseResponse.prob: 1.0
+train/RoomImpulseResponse.sources: null
+train/RoomImpulseResponse.use_original_phase: false
+train/RoomImpulseResponse.weights: null
+
+train/ShiftPhase.name: null
+train/ShiftPhase.prob: 1
+train/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+train/Silence.name: null
+train/Silence.prob: 0.1
+
+train/Smoothing.name: null
+train/Smoothing.prob: 1
+train/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+train/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+train/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+train/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+train/SpectralDenoising.n_bands: 6
+train/SpectralDenoising.n_freq: 3
+train/SpectralDenoising.n_time: 5
+train/SpectralDenoising.name: null
+train/SpectralDenoising.nz_volume: -40
+train/SpectralDenoising.prob: 1
+
+train/TimeMask.name: null
+train/TimeMask.prob: 1
+train/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+train/TimeNoise.name: null
+train/TimeNoise.prob: 1
+train/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+train/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+train/VolumeChange.name: null
+train/VolumeChange.prob: 1.0
+
+train/VolumeNorm.db: !!python/tuple
+- const
+- -24
+train/VolumeNorm.name: null
+train/VolumeNorm.prob: 1.0
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 10.0
+val/AudioDataset.loudness_cutoff: -30.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK/hugo/loras/boleros
+
+val/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+val/BackgroundNoise.loudness_cutoff: null
+val/BackgroundNoise.n_bands: 3
+val/BackgroundNoise.name: null
+val/BackgroundNoise.prob: 1.0
+val/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+val/BackgroundNoise.sources: null
+val/BackgroundNoise.weights: null
+
+val/BaseTransform.keys: []
+val/BaseTransform.name: null
+val/BaseTransform.prob: 1.0
+
+val/ClippingDistortion.name: null
+val/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+val/ClippingDistortion.prob: 1.0
+
+val/CorruptPhase.name: null
+val/CorruptPhase.prob: 1
+val/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+val/CrossTalk.loudness_cutoff: -40
+val/CrossTalk.name: null
+val/CrossTalk.prob: 1.0
+val/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+val/CrossTalk.sources: null
+val/CrossTalk.weights: null
+
+val/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+val/Equalizer.n_bands: 6
+val/Equalizer.name: null
+val/Equalizer.prob: 1.0
+
+val/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyMask.name: null
+val/FrequencyMask.prob: 1
+
+val/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyNoise.name: null
+val/FrequencyNoise.prob: 1
+
+val/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+val/GlobalVolumeNorm.name: null
+val/GlobalVolumeNorm.prob: 1.0
+
+val/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+val/HighPass.name: null
+val/HighPass.prob: 1
+val/HighPass.zeros: 51
+
+val/InvertPhase.name: null
+val/InvertPhase.prob: 1
+
+val/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+val/LowPass.name: null
+val/LowPass.prob: 1
+val/LowPass.zeros: 51
+
+val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+val/MaskLowMagnitudes.name: null
+val/MaskLowMagnitudes.prob: 1
+
+val/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/MuLawQuantization.name: null
+val/MuLawQuantization.prob: 1.0
+
+val/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+val/NoiseFloor.name: null
+val/NoiseFloor.prob: 1.0
+
+val/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/Quantization.name: null
+val/Quantization.prob: 1.0
+
+val/Repeat.n_repeat: 1
+val/Repeat.name: null
+val/Repeat.prob: 1.0
+
+val/RepeatUpTo.max_repeat: 5
+val/RepeatUpTo.name: null
+val/RepeatUpTo.prob: 1.0
+val/RepeatUpTo.weights: null
+
+val/RescaleAudio.name: null
+val/RescaleAudio.prob: 1
+val/RescaleAudio.val: 1.0
+
+val/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+val/RoomImpulseResponse.duration: 1.0
+val/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+val/RoomImpulseResponse.n_bands: 6
+val/RoomImpulseResponse.name: null
+val/RoomImpulseResponse.offset: 0.0
+val/RoomImpulseResponse.prob: 1.0
+val/RoomImpulseResponse.sources: null
+val/RoomImpulseResponse.use_original_phase: false
+val/RoomImpulseResponse.weights: null
+
+val/ShiftPhase.name: null
+val/ShiftPhase.prob: 1
+val/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+val/Silence.name: null
+val/Silence.prob: 0.1
+
+val/Smoothing.name: null
+val/Smoothing.prob: 1
+val/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+val/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+val/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+val/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+val/SpectralDenoising.n_bands: 6
+val/SpectralDenoising.n_freq: 3
+val/SpectralDenoising.n_time: 5
+val/SpectralDenoising.name: null
+val/SpectralDenoising.nz_volume: -40
+val/SpectralDenoising.prob: 1
+
+val/TimeMask.name: null
+val/TimeMask.prob: 1
+val/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+val/TimeNoise.name: null
+val/TimeNoise.prob: 1
+val/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+val/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+val/VolumeChange.name: null
+val/VolumeChange.prob: 1.0
+
+val/VolumeNorm.db: !!python/tuple
+- const
+- -24
+val/VolumeNorm.name: null
+val/VolumeNorm.prob: 1.0
+
+val_freq: 500
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
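Note on the key layout in runs/boleros/coarse/args.yml: the file is a flat mapping in which some keys appear three times, once bare and once each under a `train/` and `val/` prefix (for example `AudioDataset.n_examples` is 1000, 100000000 and 500 respectively). A minimal sketch of how such prefixed keys could be resolved is shown below. It assumes the prefix names a scope whose entries override the bare defaults; the `resolve` helper is hypothetical and written only for illustration, it is not taken from the training code.

```python
def resolve(flat_args, scope=None):
    """Return the bare settings, overridden by any 'scope/Key' entries.

    Hypothetical helper: assumes 'train/' and 'val/' prefixes mark scoped
    overrides of the unprefixed defaults.
    """
    # Start from keys that carry no scope prefix at all.
    resolved = {k: v for k, v in flat_args.items() if "/" not in k}
    if scope is not None:
        prefix = scope + "/"
        for key, value in flat_args.items():
            if key.startswith(prefix):
                # Strip the prefix so the scoped value replaces the default.
                resolved[key[len(prefix):]] = value
    return resolved


# A few entries copied from the args.yml above.
args = {
    "AudioDataset.duration": 10.0,
    "AudioDataset.n_examples": 1000,
    "train/AudioDataset.n_examples": 100000000,
    "val/AudioDataset.n_examples": 500,
    "train/AudioLoader.sources": ["/media/CHONK/hugo/loras/boleros"],
}

train_args = resolve(args, "train")
val_args = resolve(args, "val")
print(train_args["AudioDataset.n_examples"], val_args["AudioDataset.n_examples"])
```

Under this reading, settings that are not repeated under a prefix (such as `AudioDataset.duration`) are shared by both scopes.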
runs/boleros/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f4cab5127c211565b6c408d4affe734f07503935828422dcc958ff7d4c7cf4d5
+size 1343718241
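Note on the weights.pth entry above: the three added lines are a Git LFS pointer, not the checkpoint itself. The sketch below parses the pointer and compares the recorded size with the parameter count printed in runs/boleros/coarse/model.txt. The `parse_lfs_pointer` helper is hypothetical, and the comparison assumes the weights are stored as 4-byte floats, which the diff does not state.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from runs/boleros/coarse/latest/vampnet/weights.pth.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f4cab5127c211565b6c408d4affe734f07503935828422dcc958ff7d4c7cf4d5
size 1343718241
"""

info = parse_lfs_pointer(POINTER)

# "335.894M params." from model.txt, rounded to the printed precision.
PARAMS = 335_894_000
# ASSUMPTION: float32 storage, so about 4 bytes per parameter plus file overhead.
bytes_per_param = info["size"] / PARAMS
print(f"{info['size'] / 2**30:.2f} GiB, {bytes_per_param:.3f} bytes per parameter")
```

The ratio comes out very close to four bytes per parameter, which is consistent with (but does not prove) single-precision storage with a small amount of serialization overhead.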
runs/boleros/coarse/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  335.894M params.
+  (_orig_mod): VampNet(
+    335.894M params.
+    (embedding): CodebookEmbedding(
+      0.042M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+    )
+    (transformer): TransformerStack(
+      330.600M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-19): 19 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      5.251M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+      )
+    )
+  )
+)
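Note on the parameter counts in runs/boleros/coarse/model.txt: several bias-free 1280x1280 linears are listed at 1.659M parameters while `w_ks` is listed at 1.638M (exactly 1280 x 1280). The sketch below is one reading that reproduces the printed figures from the `VampNet.*` values in args.yml. It rests on two assumptions that the printout does not state: that `w_qs`, `w_vs`, `fc`, `w_1` and `w_2` each carry a pair of rank-8 low-rank adapter matrices, and that the classifier convolution uses weight normalization (one extra gain per output channel). Both are inferred from the arithmetic only.

```python
# Values copied from runs/boleros/coarse/args.yml (VampNet.* keys).
embedding_dim = 1280
n_layers = 20
n_heads = 20
n_codebooks = 4
n_conditioning_codebooks = 0
latent_dim = 8
vocab_size = 1024

# ASSUMPTION (inferred, not stated in the printout): adapter rank of 8.
ADAPTER_RANK = 8


def linear(n_in, n_out, adapted=False):
    """Bias-free linear layer, optionally with two rank-r adapter matrices."""
    extra = ADAPTER_RANK * (n_in + n_out) if adapted else 0
    return n_in * n_out + extra


def conv1x1(n_in, n_out, weight_norm=False):
    """Kernel-size-1 Conv1d with bias; weight norm (assumed) adds n_out gains."""
    return n_in * n_out + n_out + (n_out if weight_norm else 0)


d = embedding_dim
# w_qs, w_vs and fc at 1.659M each, w_ks at 1.638M.
attn = 3 * linear(d, d, adapted=True) + linear(d, d)
# w_1 expands to 4*d; the gated GELU halves that, so w_2 takes 2*d inputs.
ffn = linear(d, 4 * d, adapted=True) + linear(2 * d, d, adapted=True)
layer = attn + ffn + 2 * d                    # plus norm_1 and norm_3 weights
rel_bias = 32 * n_heads                       # Embedding(32, 20), layer 0 only
transformer = n_layers * layer + rel_bias + d  # plus the final RMSNorm

mask_tokens = n_codebooks * latent_dim        # the 4x8 MASK parameter
embedding = conv1x1(n_codebooks * latent_dim, d) + mask_tokens
n_predicted = n_codebooks - n_conditioning_codebooks
classifier = conv1x1(d, vocab_size * n_predicted, weight_norm=True)

total = embedding + transformer + classifier
for name, count in [("layer", layer), ("transformer", transformer),
                    ("classifier", classifier), ("total", total)]:
    print(f"{name}: {count / 1e6:.3f}M")
```

With these assumptions the totals round to the figures shown in the printout (16.530M per layer, 330.600M for the stack, 5.251M for the classifier, 335.894M overall). The classifier width of 4096 also matches `vocab_size` times the number of predicted codebooks (1024 x 4).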
runs/choir/c2f/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3fd753f116f3778c23380ab3d04de9c2525a7b80adb67290042abf7b55415da5
+size 1111127537
runs/choir/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c29a1dfe20e7ddcd6dc8a6a41015d3d63447d4363fde3c978684196b0e12b82d
+size 1343718241
runs/knower/c2f/args.yml
ADDED
@@ -0,0 +1,824 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 3.0
+AudioDataset.loudness_cutoff: -40.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: /data/
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+BackgroundNoise.loudness_cutoff: null
+BackgroundNoise.n_bands: 3
+BackgroundNoise.name: null
+BackgroundNoise.prob: 1.0
+BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+BackgroundNoise.sources: null
+BackgroundNoise.weights: null
+
+BaseTransform.keys: []
+BaseTransform.name: null
+BaseTransform.prob: 1.0
+
+ClippingDistortion.name: null
+ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+ClippingDistortion.prob: 1.0
+
+CorruptPhase.name: null
+CorruptPhase.prob: 1
+CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+CrossTalk.loudness_cutoff: -40
+CrossTalk.name: null
+CrossTalk.prob: 1.0
+CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+CrossTalk.sources: null
+CrossTalk.weights: null
+
+Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+Equalizer.n_bands: 6
+Equalizer.name: null
+Equalizer.prob: 1.0
+
+FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyMask.name: null
+FrequencyMask.prob: 1
+
+FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyNoise.name: null
+FrequencyNoise.prob: 1
+
+GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+GlobalVolumeNorm.name: null
+GlobalVolumeNorm.prob: 1.0
+
+HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+HighPass.name: null
+HighPass.prob: 1
+HighPass.zeros: 51
+
+InvertPhase.name: null
+InvertPhase.prob: 1
+
+LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+LowPass.name: null
+LowPass.prob: 1
+LowPass.zeros: 51
+
+MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+MaskLowMagnitudes.name: null
+MaskLowMagnitudes.prob: 1
+
+MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+MuLawQuantization.name: null
+MuLawQuantization.prob: 1.0
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+NoiseFloor.name: null
+NoiseFloor.prob: 1.0
+
+Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+Quantization.name: null
+Quantization.prob: 1.0
+
+Repeat.n_repeat: 1
+Repeat.name: null
+Repeat.prob: 1.0
+
+RepeatUpTo.max_repeat: 5
+RepeatUpTo.name: null
+RepeatUpTo.prob: 1.0
+RepeatUpTo.weights: null
+
+RescaleAudio.name: null
+RescaleAudio.prob: 1
+RescaleAudio.val: 1.0
+
+RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+RoomImpulseResponse.duration: 1.0
+RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+RoomImpulseResponse.n_bands: 6
+RoomImpulseResponse.name: null
+RoomImpulseResponse.offset: 0.0
+RoomImpulseResponse.prob: 1.0
+RoomImpulseResponse.sources: null
+RoomImpulseResponse.use_original_phase: false
+RoomImpulseResponse.weights: null
+
+ShiftPhase.name: null
+ShiftPhase.prob: 1
+ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+Silence.name: null
+Silence.prob: 0.1
+
+Smoothing.name: null
+Smoothing.prob: 1
+Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+Smoothing.window_type: !!python/tuple
+- const
+- average
+
+SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+SpectralDenoising.n_bands: 6
+SpectralDenoising.n_freq: 3
+SpectralDenoising.n_time: 5
+SpectralDenoising.name: null
+SpectralDenoising.nz_volume: -40
+SpectralDenoising.prob: 1
+
+TimeMask.name: null
+TimeMask.prob: 1
+TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+TimeNoise.name: null
+TimeNoise.prob: 1
+TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 14
+VampNet.n_conditioning_codebooks: 4
+VampNet.n_heads: 20
+VampNet.n_layers: 16
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+VolumeChange.name: null
+VolumeChange.prob: 1.0
+
+VolumeNorm.db: !!python/tuple
+- const
+- -24
+VolumeNorm.name: null
+VolumeNorm.prob: 1.0
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/knower/c2f.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: true
+
+sample_freq: 1000
+
+save_iters:
+- 10000
+- 20000
+- 30000
+- 40000
+- 50000
+
+save_path: ./runs/knower/c2f
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 3.0
+train/AudioDataset.loudness_cutoff: -40.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK/hugo/knower
+
+train/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+train/BackgroundNoise.loudness_cutoff: null
+train/BackgroundNoise.n_bands: 3
+train/BackgroundNoise.name: null
+train/BackgroundNoise.prob: 1.0
+train/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+train/BackgroundNoise.sources: null
+train/BackgroundNoise.weights: null
+
+train/BaseTransform.keys: []
+train/BaseTransform.name: null
+train/BaseTransform.prob: 1.0
+
+train/ClippingDistortion.name: null
+train/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+train/ClippingDistortion.prob: 1.0
+
+train/CorruptPhase.name: null
+train/CorruptPhase.prob: 1
+train/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+train/CrossTalk.loudness_cutoff: -40
+train/CrossTalk.name: null
+train/CrossTalk.prob: 1.0
+train/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+train/CrossTalk.sources: null
+train/CrossTalk.weights: null
+
+train/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+train/Equalizer.n_bands: 6
+train/Equalizer.name: null
+train/Equalizer.prob: 1.0
+
+train/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyMask.name: null
+train/FrequencyMask.prob: 1
+
+train/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyNoise.name: null
+train/FrequencyNoise.prob: 1
+
+train/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+train/GlobalVolumeNorm.name: null
+train/GlobalVolumeNorm.prob: 1.0
+
+train/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+train/HighPass.name: null
+train/HighPass.prob: 1
+train/HighPass.zeros: 51
+
+train/InvertPhase.name: null
+train/InvertPhase.prob: 1
+
+train/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+train/LowPass.name: null
+train/LowPass.prob: 1
+train/LowPass.zeros: 51
+
+train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+train/MaskLowMagnitudes.name: null
+train/MaskLowMagnitudes.prob: 1
+
+train/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/MuLawQuantization.name: null
+train/MuLawQuantization.prob: 1.0
+
+train/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+train/NoiseFloor.name: null
+train/NoiseFloor.prob: 1.0
+
+train/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/Quantization.name: null
+train/Quantization.prob: 1.0
+
+train/Repeat.n_repeat: 1
+train/Repeat.name: null
+train/Repeat.prob: 1.0
+
+train/RepeatUpTo.max_repeat: 5
+train/RepeatUpTo.name: null
+train/RepeatUpTo.prob: 1.0
+train/RepeatUpTo.weights: null
+
+train/RescaleAudio.name: null
+train/RescaleAudio.prob: 1
+train/RescaleAudio.val: 1.0
+
+train/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+train/RoomImpulseResponse.duration: 1.0
+train/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+train/RoomImpulseResponse.n_bands: 6
+train/RoomImpulseResponse.name: null
+train/RoomImpulseResponse.offset: 0.0
+train/RoomImpulseResponse.prob: 1.0
+train/RoomImpulseResponse.sources: null
+train/RoomImpulseResponse.use_original_phase: false
+train/RoomImpulseResponse.weights: null
+
+train/ShiftPhase.name: null
+train/ShiftPhase.prob: 1
+train/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+train/Silence.name: null
+train/Silence.prob: 0.1
+
+train/Smoothing.name: null
+train/Smoothing.prob: 1
+train/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+train/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+train/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+train/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+train/SpectralDenoising.n_bands: 6
+train/SpectralDenoising.n_freq: 3
+train/SpectralDenoising.n_time: 5
+train/SpectralDenoising.name: null
+train/SpectralDenoising.nz_volume: -40
+train/SpectralDenoising.prob: 1
+
+train/TimeMask.name: null
+train/TimeMask.prob: 1
+train/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+train/TimeNoise.name: null
+train/TimeNoise.prob: 1
+train/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+train/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+train/VolumeChange.name: null
+train/VolumeChange.prob: 1.0
+
+train/VolumeNorm.db: !!python/tuple
+- const
+- -24
+train/VolumeNorm.name: null
+train/VolumeNorm.prob: 1.0
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 3.0
+val/AudioDataset.loudness_cutoff: -40.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK/hugo/knower
+
+val/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+val/BackgroundNoise.loudness_cutoff: null
+val/BackgroundNoise.n_bands: 3
+val/BackgroundNoise.name: null
+val/BackgroundNoise.prob: 1.0
+val/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+val/BackgroundNoise.sources: null
+val/BackgroundNoise.weights: null
+
+val/BaseTransform.keys: []
+val/BaseTransform.name: null
+val/BaseTransform.prob: 1.0
+
+val/ClippingDistortion.name: null
+val/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+val/ClippingDistortion.prob: 1.0
+
+val/CorruptPhase.name: null
+val/CorruptPhase.prob: 1
+val/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+val/CrossTalk.loudness_cutoff: -40
+val/CrossTalk.name: null
+val/CrossTalk.prob: 1.0
+val/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+val/CrossTalk.sources: null
+val/CrossTalk.weights: null
+
+val/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+val/Equalizer.n_bands: 6
+val/Equalizer.name: null
+val/Equalizer.prob: 1.0
+
+val/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyMask.name: null
+val/FrequencyMask.prob: 1
+
+val/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyNoise.name: null
+val/FrequencyNoise.prob: 1
+
+val/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+val/GlobalVolumeNorm.name: null
+val/GlobalVolumeNorm.prob: 1.0
+
+val/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+val/HighPass.name: null
+val/HighPass.prob: 1
+val/HighPass.zeros: 51
+
+val/InvertPhase.name: null
+val/InvertPhase.prob: 1
+
+val/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+val/LowPass.name: null
+val/LowPass.prob: 1
+val/LowPass.zeros: 51
+
+val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+val/MaskLowMagnitudes.name: null
+val/MaskLowMagnitudes.prob: 1
+
+val/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/MuLawQuantization.name: null
+val/MuLawQuantization.prob: 1.0
+
+val/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+val/NoiseFloor.name: null
+val/NoiseFloor.prob: 1.0
+
+val/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/Quantization.name: null
+val/Quantization.prob: 1.0
+
+val/Repeat.n_repeat: 1
+val/Repeat.name: null
+val/Repeat.prob: 1.0
+
+val/RepeatUpTo.max_repeat: 5
+val/RepeatUpTo.name: null
+val/RepeatUpTo.prob: 1.0
+val/RepeatUpTo.weights: null
+
+val/RescaleAudio.name: null
+val/RescaleAudio.prob: 1
+val/RescaleAudio.val: 1.0
+
+val/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+val/RoomImpulseResponse.duration: 1.0
+val/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+val/RoomImpulseResponse.n_bands: 6
+val/RoomImpulseResponse.name: null
+val/RoomImpulseResponse.offset: 0.0
+val/RoomImpulseResponse.prob: 1.0
+val/RoomImpulseResponse.sources: null
+val/RoomImpulseResponse.use_original_phase: false
+val/RoomImpulseResponse.weights: null
+
+val/ShiftPhase.name: null
+val/ShiftPhase.prob: 1
+val/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+val/Silence.name: null
+val/Silence.prob: 0.1
+
+val/Smoothing.name: null
+val/Smoothing.prob: 1
+val/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+val/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+val/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+val/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+val/SpectralDenoising.n_bands: 6
+val/SpectralDenoising.n_freq: 3
+val/SpectralDenoising.n_time: 5
+val/SpectralDenoising.name: null
+val/SpectralDenoising.nz_volume: -40
+val/SpectralDenoising.prob: 1
+
+val/TimeMask.name: null
+val/TimeMask.prob: 1
+val/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+val/TimeNoise.name: null
+val/TimeNoise.prob: 1
+val/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+val/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+val/VolumeChange.name: null
+val/VolumeChange.prob: 1.0
+
+val/VolumeNorm.db: !!python/tuple
+- const
+- -24
+val/VolumeNorm.name: null
+val/VolumeNorm.prob: 1.0
+
+val_freq: 500
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
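The `NoamScheduler.*` keys in this file (`d_model: 512`, `factor: 2.0`, `warmup: 500`) match the parameters of the warmup schedule popularized by the original Transformer paper. The sketch below evaluates that textbook formula with these values. Whether the training code applies exactly this formula is an assumption, and `noam_lr` is a name made up for illustration.

```python
def noam_lr(step: int, d_model: int = 512, factor: float = 2.0, warmup: int = 500) -> float:
    """Textbook Noam schedule: linear warmup, then inverse-square-root decay."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# Print the rate at a few steps around the warmup boundary.
for step in (1, 250, 500, 5000, 50000):
    print(f"step {step:>6}: lr = {noam_lr(step):.6f}")
```

Under this formula the rate grows linearly until step `warmup` and then decays with the inverse square root of the step count.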
runs/knower/c2f/best/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fcf94cab2f8b30d063eb1c176b6e23ba41674d1db37183fc75250b09c536eec1
+size 1111127537
runs/knower/c2f/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:34aaa7eeb26bf583637c5a1f4c7b7de23586ee60817bc9e87203442b5621699b
+size 1111127537
runs/knower/c2f/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  277.753M params.
+  (_orig_mod): VampNet(
+    277.753M params.
+    (embedding): CodebookEmbedding(
+      0.145M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+    )
+    (transformer): TransformerStack(
+      264.481M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-15): 15 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      13.128M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+      )
+    )
+  )
+)
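Several layer sizes in this dump line up with the `VampNet.*` values in `runs/knower/c2f/args.yml`. The arithmetic below is one reading of those numbers, not something confirmed from the VampNet source. The variable names and the float32 storage assumption are illustrative.

```python
# Values copied from the VampNet.* keys in runs/knower/c2f/args.yml.
vocab_size = 1024
n_codebooks = 14
n_conditioning_codebooks = 4
latent_dim = 8
embedding_dim = 1280
n_heads = 20

# 14 codebooks x 8 latent dims = 112, the input width of
# (embedding).(out_proj): Conv1d(112, 1280, ...).
embedding_in = n_codebooks * latent_dim

# 1024 tokens x (14 - 4) non-conditioning codebooks = 10240, the output
# width of (classifier): Conv1d(1280, 10240, ...).
predicted_codebooks = n_codebooks - n_conditioning_codebooks
classifier_out = vocab_size * predicted_codebooks

# 1280 / 20 = 64 channels per attention head.
head_dim = embedding_dim // n_heads

# Assumption: weights are stored as 4-byte float32 values. The reported
# 277.753M parameters then come to roughly the LFS size of weights.pth.
total_params = 277.753e6
approx_bytes = total_params * 4
lfs_size_bytes = 1_111_127_537  # "size" field of the knower/c2f pointers

print(embedding_in, classifier_out, head_dim)
print(f"relative size gap: {abs(approx_bytes - lfs_size_bytes) / lfs_size_bytes:.4%}")
```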
runs/knower/coarse/args.yml
ADDED
@@ -0,0 +1,824 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 10.0
+AudioDataset.loudness_cutoff: -30.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: /data/
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+BackgroundNoise.loudness_cutoff: null
+BackgroundNoise.n_bands: 3
+BackgroundNoise.name: null
+BackgroundNoise.prob: 1.0
+BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+BackgroundNoise.sources: null
+BackgroundNoise.weights: null
+
+BaseTransform.keys: []
+BaseTransform.name: null
+BaseTransform.prob: 1.0
+
+ClippingDistortion.name: null
+ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+ClippingDistortion.prob: 1.0
+
+CorruptPhase.name: null
+CorruptPhase.prob: 1
+CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+CrossTalk.loudness_cutoff: -40
+CrossTalk.name: null
+CrossTalk.prob: 1.0
+CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+CrossTalk.sources: null
+CrossTalk.weights: null
+
+Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+Equalizer.n_bands: 6
+Equalizer.name: null
+Equalizer.prob: 1.0
+
+FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyMask.name: null
+FrequencyMask.prob: 1
+
+FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyNoise.name: null
+FrequencyNoise.prob: 1
+
+GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+GlobalVolumeNorm.name: null
+GlobalVolumeNorm.prob: 1.0
+
+HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+HighPass.name: null
+HighPass.prob: 1
+HighPass.zeros: 51
+
+InvertPhase.name: null
+InvertPhase.prob: 1
+
+LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+LowPass.name: null
+LowPass.prob: 1
+LowPass.zeros: 51
+
+MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+MaskLowMagnitudes.name: null
+MaskLowMagnitudes.prob: 1
+
+MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+MuLawQuantization.name: null
+MuLawQuantization.prob: 1.0
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+NoiseFloor.name: null
+NoiseFloor.prob: 1.0
+
+Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+Quantization.name: null
+Quantization.prob: 1.0
+
+Repeat.n_repeat: 1
+Repeat.name: null
+Repeat.prob: 1.0
+
+RepeatUpTo.max_repeat: 5
+RepeatUpTo.name: null
+RepeatUpTo.prob: 1.0
+RepeatUpTo.weights: null
+
+RescaleAudio.name: null
+RescaleAudio.prob: 1
+RescaleAudio.val: 1.0
+
+RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+RoomImpulseResponse.duration: 1.0
+RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+RoomImpulseResponse.n_bands: 6
+RoomImpulseResponse.name: null
+RoomImpulseResponse.offset: 0.0
+RoomImpulseResponse.prob: 1.0
+RoomImpulseResponse.sources: null
+RoomImpulseResponse.use_original_phase: false
+RoomImpulseResponse.weights: null
+
+ShiftPhase.name: null
+ShiftPhase.prob: 1
+ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+Silence.name: null
+Silence.prob: 0.1
+
+Smoothing.name: null
+Smoothing.prob: 1
+Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+Smoothing.window_type: !!python/tuple
+- const
+- average
+
+SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+SpectralDenoising.n_bands: 6
+SpectralDenoising.n_freq: 3
+SpectralDenoising.n_time: 5
+SpectralDenoising.name: null
+SpectralDenoising.nz_volume: -40
+SpectralDenoising.prob: 1
+
+TimeMask.name: null
+TimeMask.prob: 1
+TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+TimeNoise.name: null
+TimeNoise.prob: 1
+TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 4
+VampNet.n_conditioning_codebooks: 0
+VampNet.n_heads: 20
+VampNet.n_layers: 20
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+VolumeChange.name: null
+VolumeChange.prob: 1.0
+
+VolumeNorm.db: !!python/tuple
+- const
+- -24
+VolumeNorm.name: null
+VolumeNorm.prob: 1.0
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/knower/coarse.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: true
+
+sample_freq: 1000
+
+save_iters:
+- 10000
+- 20000
+- 30000
+- 40000
+- 50000
+
+save_path: ./runs/knower/coarse
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 10.0
+train/AudioDataset.loudness_cutoff: -30.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK/hugo/knower
+
+train/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+train/BackgroundNoise.loudness_cutoff: null
+train/BackgroundNoise.n_bands: 3
+train/BackgroundNoise.name: null
+train/BackgroundNoise.prob: 1.0
+train/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+train/BackgroundNoise.sources: null
+train/BackgroundNoise.weights: null
+
+train/BaseTransform.keys: []
+train/BaseTransform.name: null
+train/BaseTransform.prob: 1.0
+
+train/ClippingDistortion.name: null
+train/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+train/ClippingDistortion.prob: 1.0
+
+train/CorruptPhase.name: null
+train/CorruptPhase.prob: 1
+train/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+train/CrossTalk.loudness_cutoff: -40
+train/CrossTalk.name: null
+train/CrossTalk.prob: 1.0
+train/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+train/CrossTalk.sources: null
+train/CrossTalk.weights: null
+
+train/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+train/Equalizer.n_bands: 6
+train/Equalizer.name: null
+train/Equalizer.prob: 1.0
+
+train/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyMask.name: null
+train/FrequencyMask.prob: 1
+
+train/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyNoise.name: null
+train/FrequencyNoise.prob: 1
+
+train/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+train/GlobalVolumeNorm.name: null
+train/GlobalVolumeNorm.prob: 1.0
+
+train/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+train/HighPass.name: null
+train/HighPass.prob: 1
+train/HighPass.zeros: 51
+
+train/InvertPhase.name: null
+train/InvertPhase.prob: 1
+
+train/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+train/LowPass.name: null
+train/LowPass.prob: 1
+train/LowPass.zeros: 51
+
+train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+train/MaskLowMagnitudes.name: null
+train/MaskLowMagnitudes.prob: 1
+
+train/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/MuLawQuantization.name: null
+train/MuLawQuantization.prob: 1.0
+
+train/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+train/NoiseFloor.name: null
+train/NoiseFloor.prob: 1.0
+
+train/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/Quantization.name: null
+train/Quantization.prob: 1.0
+
+train/Repeat.n_repeat: 1
+train/Repeat.name: null
+train/Repeat.prob: 1.0
+
+train/RepeatUpTo.max_repeat: 5
+train/RepeatUpTo.name: null
+train/RepeatUpTo.prob: 1.0
+train/RepeatUpTo.weights: null
+
+train/RescaleAudio.name: null
+train/RescaleAudio.prob: 1
+train/RescaleAudio.val: 1.0
+
+train/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+train/RoomImpulseResponse.duration: 1.0
+train/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+train/RoomImpulseResponse.n_bands: 6
+train/RoomImpulseResponse.name: null
+train/RoomImpulseResponse.offset: 0.0
+train/RoomImpulseResponse.prob: 1.0
+train/RoomImpulseResponse.sources: null
+train/RoomImpulseResponse.use_original_phase: false
+train/RoomImpulseResponse.weights: null
+
+train/ShiftPhase.name: null
+train/ShiftPhase.prob: 1
+train/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+train/Silence.name: null
+train/Silence.prob: 0.1
+
+train/Smoothing.name: null
+train/Smoothing.prob: 1
+train/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+train/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+train/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+train/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+train/SpectralDenoising.n_bands: 6
+train/SpectralDenoising.n_freq: 3
+train/SpectralDenoising.n_time: 5
+train/SpectralDenoising.name: null
+train/SpectralDenoising.nz_volume: -40
+train/SpectralDenoising.prob: 1
+
+train/TimeMask.name: null
+train/TimeMask.prob: 1
+train/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+train/TimeNoise.name: null
+train/TimeNoise.prob: 1
+train/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+train/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+train/VolumeChange.name: null
+train/VolumeChange.prob: 1.0
+
+train/VolumeNorm.db: !!python/tuple
+- const
+- -24
+train/VolumeNorm.name: null
+train/VolumeNorm.prob: 1.0
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 10.0
+val/AudioDataset.loudness_cutoff: -30.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK/hugo/knower
+
+val/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+val/BackgroundNoise.loudness_cutoff: null
+val/BackgroundNoise.n_bands: 3
+val/BackgroundNoise.name: null
+val/BackgroundNoise.prob: 1.0
+val/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+val/BackgroundNoise.sources: null
+val/BackgroundNoise.weights: null
+
+val/BaseTransform.keys: []
+val/BaseTransform.name: null
+val/BaseTransform.prob: 1.0
+
+val/ClippingDistortion.name: null
+val/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+val/ClippingDistortion.prob: 1.0
+
+val/CorruptPhase.name: null
+val/CorruptPhase.prob: 1
+val/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+val/CrossTalk.loudness_cutoff: -40
+val/CrossTalk.name: null
+val/CrossTalk.prob: 1.0
+val/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+val/CrossTalk.sources: null
+val/CrossTalk.weights: null
+
+val/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+val/Equalizer.n_bands: 6
+val/Equalizer.name: null
+val/Equalizer.prob: 1.0
+
+val/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyMask.name: null
+val/FrequencyMask.prob: 1
+
+val/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyNoise.name: null
+val/FrequencyNoise.prob: 1
+
+val/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+val/GlobalVolumeNorm.name: null
+val/GlobalVolumeNorm.prob: 1.0
+
+val/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+val/HighPass.name: null
+val/HighPass.prob: 1
+val/HighPass.zeros: 51
+
+val/InvertPhase.name: null
+val/InvertPhase.prob: 1
+
+val/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+val/LowPass.name: null
+val/LowPass.prob: 1
+val/LowPass.zeros: 51
+
+val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+val/MaskLowMagnitudes.name: null
+val/MaskLowMagnitudes.prob: 1
+
+val/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/MuLawQuantization.name: null
+val/MuLawQuantization.prob: 1.0
+
+val/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+val/NoiseFloor.name: null
+val/NoiseFloor.prob: 1.0
+
+val/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/Quantization.name: null
+val/Quantization.prob: 1.0
+
+val/Repeat.n_repeat: 1
+val/Repeat.name: null
+val/Repeat.prob: 1.0
+
+val/RepeatUpTo.max_repeat: 5
+val/RepeatUpTo.name: null
+val/RepeatUpTo.prob: 1.0
+val/RepeatUpTo.weights: null
+
+val/RescaleAudio.name: null
+val/RescaleAudio.prob: 1
+val/RescaleAudio.val: 1.0
+
+val/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+val/RoomImpulseResponse.duration: 1.0
+val/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+val/RoomImpulseResponse.n_bands: 6
+val/RoomImpulseResponse.name: null
+val/RoomImpulseResponse.offset: 0.0
+val/RoomImpulseResponse.prob: 1.0
+val/RoomImpulseResponse.sources: null
+val/RoomImpulseResponse.use_original_phase: false
+val/RoomImpulseResponse.weights: null
+
+val/ShiftPhase.name: null
+val/ShiftPhase.prob: 1
+val/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+val/Silence.name: null
+val/Silence.prob: 0.1
+
+val/Smoothing.name: null
+val/Smoothing.prob: 1
+val/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+val/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+val/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+val/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+val/SpectralDenoising.n_bands: 6
+val/SpectralDenoising.n_freq: 3
+val/SpectralDenoising.n_time: 5
+val/SpectralDenoising.name: null
+val/SpectralDenoising.nz_volume: -40
+val/SpectralDenoising.prob: 1
+
+val/TimeMask.name: null
+val/TimeMask.prob: 1
+val/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+val/TimeNoise.name: null
+val/TimeNoise.prob: 1
+val/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+val/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+val/VolumeChange.name: null
+val/VolumeChange.prob: 1.0
+
+val/VolumeNorm.db: !!python/tuple
+- const
+- -24
+val/VolumeNorm.name: null
+val/VolumeNorm.prob: 1.0
+
+val_freq: 500
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
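The `NoamScheduler.*` keys in this args.yml (`d_model: 512`, `factor: 2.0`, `warmup: 500`) match the parameters of the warmup schedule from the original Transformer paper. The sketch below evaluates that textbook formula with these values. It is an assumption that VampNet's scheduler uses exactly this form; how it interacts with `AdamW.lr: 0.0001` is not visible in this diff. The function name `noam_lr` is hypothetical.

```python
def noam_lr(step: int, d_model: int = 512, factor: float = 2.0, warmup: int = 500) -> float:
    """Textbook Noam schedule: linear warmup, then inverse-square-root decay.

    Defaults are copied from the args.yml above; the formula itself is the
    standard one and is assumed, not confirmed, for this code base.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# The rate rises until step == warmup and decays afterwards.
for step in (1, 250, 500, 2000, 50000):
    print(f"step {step:>6}: lr = {noam_lr(step):.6f}")
```

Under this formula the peak sits at the warmup step, and quadrupling the step count after warmup halves the rate.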
runs/knower/coarse/best/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cdf46139e0a9b6ff93f954f037a05f8dfcd574180ed1732d61abbe3c75c696b4
+size 1343718241
runs/knower/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e11462551537ffe62fd3c579473ffe5da73d0149d9a956d8e3448ada9a8b85c0
+size 1343718241
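The `weights.pth` entries in this commit are Git LFS pointer files, not the checkpoints themselves: three `key value` lines giving the spec version, the content hash, and the size in bytes. As a minimal sketch, the pointer above can be parsed with the standard library alone. The helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer text copied from runs/knower/coarse/latest/vampnet/weights.pth.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e11462551537ffe62fd3c579473ffe5da73d0149d9a956d8e3448ada9a8b85c0
size 1343718241
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"], f"{info['size_bytes'] / 2**30:.2f} GiB")
```

The `best` and `latest` pointers report the same size but different hashes, which is what two checkpoints of the same architecture with different weights would produce.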
runs/knower/coarse/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  335.894M params.
+  (_orig_mod): VampNet(
+    335.894M params.
+    (embedding): CodebookEmbedding(
+      0.042M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+    )
+    (transformer): TransformerStack(
+      330.600M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-19): 19 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      5.251M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+      )
+    )
+  )
+)
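The per-module counts in this model.txt can be cross-checked with a little arithmetic. A bias-free 1280x1280 `Linear` holds 1,638,400 weights, which matches `w_ks` (1.638M) but not `w_qs`, `w_vs`, or `fc` (1.659M). The sketch below reproduces every reported figure under two assumptions that the diff does not confirm: the larger layers carry a rank-8 low-rank adapter (adding `8 * (d_in + d_out)` parameters), and the classifier convolution has one extra per-output-channel parameter, as weight normalization would add. All helper names are hypothetical.

```python
def linear_params(d_in: int, d_out: int, rank: int = 0) -> int:
    """Bias-free linear layer, plus an assumed low-rank adapter of the given rank."""
    return d_in * d_out + rank * (d_in + d_out)


D, FF, RANK, N_LAYERS = 1280, 5120, 8, 20  # RANK = 8 is an inference, not from the source

# Attention: w_qs, w_vs, fc appear adapted; w_ks matches a plain dense layer.
attn = 3 * linear_params(D, D, RANK) + linear_params(D, D)
# Feed-forward: w_1 maps 1280 -> 5120, the gated activation halves it, w_2 maps 2560 -> 1280.
ffn = linear_params(D, FF, RANK) + linear_params(FF // 2, D, RANK)
layer = attn + ffn + 2 * D            # two RMSNorm weight vectors per layer
rel_bias = 32 * 20                    # Embedding(32, 20), present in layer 0 only
transformer = N_LAYERS * layer + rel_bias + D  # plus the final RMSNorm

embedding = 4 * 8 + (32 * D + D)      # MASK parameter (4x8) + Conv1d(32, 1280) with bias
classifier = D * 4096 + 4096 + 4096   # weights + bias + assumed per-channel gain

total = embedding + transformer + classifier
print(f"layer       {layer / 1e6:.3f}M")
print(f"transformer {transformer / 1e6:.3f}M")
print(f"total       {total / 1e6:.3f}M")
print(f"float32 bytes {total * 4}")
```

At four bytes per float32 parameter the total comes to 1,343,574,656 bytes, slightly below the 1,343,718,241-byte size recorded in the LFS pointers for this run; the remainder would be serialization overhead.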
runs/n64/c2f/args.yml
ADDED
@@ -0,0 +1,129 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 3.0
+AudioDataset.loudness_cutoff: -40.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 14
+VampNet.n_conditioning_codebooks: 4
+VampNet.n_heads: 20
+VampNet.n_layers: 16
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/n64/c2f.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: false
+
+sample_freq: 2000
+
+save_iters:
+- 2000
+- 4000
+- 10000
+- 20000
+- 40000
+- 100000
+
+save_path: ./runs/n64/c2f
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 3.0
+train/AudioDataset.loudness_cutoff: -40.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- data/salad-bowl/n64-jungle/n64-jungle-mix.wav
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 3.0
+val/AudioDataset.loudness_cutoff: -40.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- data/salad-bowl/n64-jungle/n64-jungle-mix.wav
+
+val_freq: 1000
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
runs/n64/c2f/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6af65912cdf28c67af5a6bb146270f2f6e3a66f8ef831d6387b282796099eb9e
+size 1111127537
runs/n64/c2f/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  277.753M params.
+  (_orig_mod): VampNet(
+    277.753M params.
+    (embedding): CodebookEmbedding(
+      0.145M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+    )
+    (transformer): TransformerStack(
+      264.481M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-15): 15 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      13.128M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+      )
+    )
+  )
+)
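The layer shapes printed in runs/n64/c2f/model.txt line up with the VampNet.* values in the matching args.yml. The sketch below only restates that arithmetic. The two formulas are inferred from the numbers in this diff, not taken from the VampNet source, and the helper names are hypothetical.

```python
# Hypothetical helpers: both formulas are inferred from the numbers shown in
# this diff, not copied from the VampNet implementation.

def embedding_in_channels(n_codebooks: int, latent_dim: int) -> int:
    """Input channels of the embedding's out_proj Conv1d."""
    return n_codebooks * latent_dim


def classifier_out_channels(
    vocab_size: int, n_codebooks: int, n_conditioning_codebooks: int
) -> int:
    """Output channels of the classifier Conv1d: one vocabulary-sized
    block of logits per codebook that is not used for conditioning."""
    return vocab_size * (n_codebooks - n_conditioning_codebooks)


# Values copied from runs/n64/c2f/args.yml.
c2f = {
    "n_codebooks": 14,
    "n_conditioning_codebooks": 4,
    "latent_dim": 8,
    "vocab_size": 1024,
}

c2f_embed = embedding_in_channels(c2f["n_codebooks"], c2f["latent_dim"])
c2f_logits = classifier_out_channels(
    c2f["vocab_size"], c2f["n_codebooks"], c2f["n_conditioning_codebooks"]
)

# model.txt reports Conv1d(112, 1280, ...) and Conv1d(1280, 10240, ...).
print(c2f_embed, c2f_logits)
```

The same two helpers can be pointed at a coarse config (n_codebooks: 4, n_conditioning_codebooks: 0) to compare against its Conv1d shapes.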
runs/n64/coarse/args.yml
ADDED
@@ -0,0 +1,129 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 10.0
+AudioDataset.loudness_cutoff: -30.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 4
+VampNet.n_conditioning_codebooks: 0
+VampNet.n_heads: 20
+VampNet.n_layers: 20
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/n64/coarse.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: false
+
+sample_freq: 2000
+
+save_iters:
+- 2000
+- 4000
+- 10000
+- 20000
+- 40000
+- 100000
+
+save_path: ./runs/n64/coarse
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 10.0
+train/AudioDataset.loudness_cutoff: -30.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- data/salad-bowl/n64-jungle/n64-jungle-mix.wav
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 10.0
+val/AudioDataset.loudness_cutoff: -30.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- data/salad-bowl/n64-jungle/n64-jungle-mix.wav
+
+val_freq: 1000
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
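For readers unfamiliar with the NoamScheduler.* keys in this args.yml: the sketch below evaluates the standard Noam (inverse square root with linear warmup) learning-rate formula using the three values from the config. Whether the training script's NoamScheduler applies exactly this formula, and how it interacts with AdamW.lr, is an assumption; the function name noam_lr is hypothetical.

```python
# Standard Noam schedule, evaluated with the values from args.yml.
# Assumption: the training script's NoamScheduler follows this formula.

def noam_lr(
    step: int,
    d_model: int = 512,    # NoamScheduler.d_model
    factor: float = 2.0,   # NoamScheduler.factor
    warmup: int = 500,     # NoamScheduler.warmup
) -> float:
    """Learning rate at a 1-based optimizer step."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear ramp while step < warmup, inverse square root decay afterwards.
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


for step in (1, 100, 500, 2000, 100000):
    print(step, noam_lr(step))
```

Under this formula the rate peaks at step == warmup and then decays with the inverse square root of the step count.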
runs/n64/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4d2d95c5ac4b80d62cffaf6e054f47b16fdef156ef567db6a6499faf801e67ab
+size 1343718241
runs/n64/coarse/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  335.894M params.
+  (_orig_mod): VampNet(
+    335.894M params.
+    (embedding): CodebookEmbedding(
+      0.042M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+    )
+    (transformer): TransformerStack(
+      330.600M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-19): 19 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      5.251M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+      )
+    )
+  )
+)
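As a sanity check on the checkpoints above, the sketch below divides each weights.pth size (from its Git LFS pointer) by the parameter total printed in the matching model.txt. A result close to 4 bytes per parameter would be consistent with float32 storage and with `amp: false` in args.yml, though that reading is an inference; the small remainder is presumably serialization overhead or non-parameter tensors. The dictionary and function names are hypothetical.

```python
# Sizes come from the LFS pointers, parameter totals from model.txt.
# The parameter totals are rounded to 0.001M in the dump, so the ratio
# is only approximate.

checkpoints = {
    "runs/n64/c2f": {"size_bytes": 1_111_127_537, "params_millions": 277.753},
    "runs/n64/coarse": {"size_bytes": 1_343_718_241, "params_millions": 335.894},
}


def bytes_per_param(size_bytes: int, params_millions: float) -> float:
    """Average bytes stored per reported parameter."""
    return size_bytes / (params_millions * 1e6)


ratios = {
    name: bytes_per_param(c["size_bytes"], c["params_millions"])
    for name, c in checkpoints.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4f} bytes/param")
```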
runs/n64/n64/c2f/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6af65912cdf28c67af5a6bb146270f2f6e3a66f8ef831d6387b282796099eb9e
+size 1111127537
runs/n64/n64/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4d2d95c5ac4b80d62cffaf6e054f47b16fdef156ef567db6a6499faf801e67ab
+size 1343718241
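The two pointers under runs/n64/n64/ carry the same oid and size as the ones under runs/n64/c2f/latest/ and runs/n64/coarse/latest/, so they refer to the same LFS objects. A minimal sketch of how such duplicates could be detected from pointer text follows; the parser handles only the three `key value` lines shown in this diff, and the function and variable names are hypothetical.

```python
from collections import defaultdict


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


def make_pointer(oid: str, size: int) -> str:
    """Rebuild pointer text in the layout shown in the diff."""
    return f"version https://git-lfs.github.com/spec/v1\noid sha256:{oid}\nsize {size}\n"


C2F_OID = "6af65912cdf28c67af5a6bb146270f2f6e3a66f8ef831d6387b282796099eb9e"
COARSE_OID = "4d2d95c5ac4b80d62cffaf6e054f47b16fdef156ef567db6a6499faf801e67ab"

# Paths, oids, and sizes copied from the pointers in this commit.
pointers = {
    "runs/n64/c2f/latest/vampnet/weights.pth": make_pointer(C2F_OID, 1111127537),
    "runs/n64/coarse/latest/vampnet/weights.pth": make_pointer(COARSE_OID, 1343718241),
    "runs/n64/n64/c2f/vampnet/weights.pth": make_pointer(C2F_OID, 1111127537),
    "runs/n64/n64/coarse/latest/vampnet/weights.pth": make_pointer(COARSE_OID, 1343718241),
}

# Group paths by oid; any group with more than one path is a duplicate.
by_oid = defaultdict(list)
for path, text in pointers.items():
    by_oid[parse_lfs_pointer(text)["oid"]].append(path)

duplicates = {oid: paths for oid, paths in by_oid.items() if len(paths) > 1}

for oid, paths in duplicates.items():
    print(oid[:19], paths)
```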
runs/opera/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7cc5874ba4b168b002ea4219b75552cdacef27a7d1077c025bf7b197e464b1ba
+size 1343718241
runs/orchestral/c2f/args.yml
ADDED
@@ -0,0 +1,129 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 3.0
+AudioDataset.loudness_cutoff: -40.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 14
+VampNet.n_conditioning_codebooks: 4
+VampNet.n_heads: 20
+VampNet.n_layers: 16
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/orchestral/c2f.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: false
+
+sample_freq: 2000
+
+save_iters:
+- 2000
+- 4000
+- 10000
+- 20000
+- 40000
+- 100000
+
+save_path: ./runs/orchestral/c2f
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 3.0
+train/AudioDataset.loudness_cutoff: -40.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK/hugo/loras/salad-bowl/chicago-symphony-orchestra/
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 3.0
+val/AudioDataset.loudness_cutoff: -40.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK/hugo/loras/salad-bowl/chicago-symphony-orchestra/
+
+val_freq: 1000
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
runs/orchestral/c2f/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:58a0e9cb777bc5a91835a48e77510d18a049295eab3ff7f23537581c6b3d390f
+size 1111127537
runs/orchestral/c2f/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  277.753M params.
+  (_orig_mod): VampNet(
+    277.753M params.
+    (embedding): CodebookEmbedding(
+      0.145M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+    )
+    (transformer): TransformerStack(
+      264.481M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-15): 15 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      13.128M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+      )
+    )
+  )
+)
runs/orchestral/coarse/args.yml
ADDED
@@ -0,0 +1,129 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 10.0
+AudioDataset.loudness_cutoff: -30.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 4
+VampNet.n_conditioning_codebooks: 0
+VampNet.n_heads: 20
+VampNet.n_layers: 20
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/orchestral/coarse.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: false
+
+sample_freq: 2000
+
+save_iters:
+- 2000
+- 4000
+- 10000
+- 20000
+- 40000
+- 100000
+
+save_path: ./runs/orchestral/coarse
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 10.0
+train/AudioDataset.loudness_cutoff: -30.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK/hugo/loras/salad-bowl/chicago-symphony-orchestra/
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 10.0
+val/AudioDataset.loudness_cutoff: -30.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK/hugo/loras/salad-bowl/chicago-symphony-orchestra/
+
+val_freq: 1000
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
runs/orchestral/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:19699c048342df79196a2f558e66038561068b0d4790080990906194652b58bf
+size 1343718241
runs/orchestral/coarse/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  335.894M params.
+  (_orig_mod): VampNet(
+    335.894M params.
+    (embedding): CodebookEmbedding(
+      0.042M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+    )
+    (transformer): TransformerStack(
+      330.600M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-19): 19 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      5.251M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+      )
+    )
+  )
+)
runs/soundrangers-v2-v1/c2f/args.yml
ADDED
@@ -0,0 +1,851 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 3.0
+AudioDataset.loudness_cutoff: -40.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+BackgroundNoise.loudness_cutoff: null
+BackgroundNoise.n_bands: 3
+BackgroundNoise.name: null
+BackgroundNoise.prob: 1.0
+BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+BackgroundNoise.sources: null
+BackgroundNoise.weights: null
+
+BaseTransform.keys: []
+BaseTransform.name: null
+BaseTransform.prob: 1.0
+
+ClippingDistortion.name: null
+ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+ClippingDistortion.prob: 1.0
+
+CorruptPhase.name: null
+CorruptPhase.prob: 1
+CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+CrossTalk.loudness_cutoff: -40
+CrossTalk.name: null
+CrossTalk.prob: 1.0
+CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+CrossTalk.sources: null
+CrossTalk.weights: null
+
+Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+Equalizer.n_bands: 6
+Equalizer.name: null
+Equalizer.prob: 1.0
+
+FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyMask.name: null
+FrequencyMask.prob: 1
+
+FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyNoise.name: null
+FrequencyNoise.prob: 1
+
+GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+GlobalVolumeNorm.name: null
+GlobalVolumeNorm.prob: 1.0
+
+HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+HighPass.name: null
+HighPass.prob: 1
+HighPass.zeros: 51
+
+InvertPhase.name: null
+InvertPhase.prob: 1
+
+LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+LowPass.name: null
+LowPass.prob: 1
+LowPass.zeros: 51
+
+MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+MaskLowMagnitudes.name: null
+MaskLowMagnitudes.prob: 1
+
+MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+MuLawQuantization.name: null
+MuLawQuantization.prob: 1.0
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+NoiseFloor.name: null
+NoiseFloor.prob: 1.0
+
+Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+Quantization.name: null
+Quantization.prob: 1.0
+
+Repeat.n_repeat: 1
+Repeat.name: null
+Repeat.prob: 1.0
+
+RepeatUpTo.max_repeat: 5
+RepeatUpTo.name: null
+RepeatUpTo.prob: 1.0
+RepeatUpTo.weights: null
+
+RescaleAudio.name: null
+RescaleAudio.prob: 1
+RescaleAudio.val: 1.0
+
+RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+RoomImpulseResponse.duration: 1.0
+RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+RoomImpulseResponse.n_bands: 6
+RoomImpulseResponse.name: null
+RoomImpulseResponse.offset: 0.0
+RoomImpulseResponse.prob: 1.0
+RoomImpulseResponse.sources: null
+RoomImpulseResponse.use_original_phase: false
+RoomImpulseResponse.weights: null
+
+ShiftPhase.name: null
+ShiftPhase.prob: 1
+ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+Silence.name: null
+Silence.prob: 0.1
+
+Smoothing.name: null
+Smoothing.prob: 1
+Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+Smoothing.window_type: !!python/tuple
+- const
+- average
+
+SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+SpectralDenoising.n_bands: 6
+SpectralDenoising.n_freq: 3
+SpectralDenoising.n_time: 5
+SpectralDenoising.name: null
+SpectralDenoising.nz_volume: -40
+SpectralDenoising.prob: 1
+
+TimeMask.name: null
+TimeMask.prob: 1
+TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+TimeNoise.name: null
+TimeNoise.prob: 1
+TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 14
+VampNet.n_conditioning_codebooks: 4
+VampNet.n_heads: 20
+VampNet.n_layers: 16
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+VolumeChange.name: null
+VolumeChange.prob: 1.0
+
+VolumeNorm.db: !!python/tuple
+- const
+- -24
+VolumeNorm.name: null
+VolumeNorm.prob: 1.0
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/soundrangers2/c2f.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: true
+
+sample_freq: 2000
+
+save_iters:
+- 2000
+- 4000
+- 10000
+- 20000
+- 40000
+- 100000
+
+save_path: ./runs/soundrangers-v2/c2f
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 3.0
+train/AudioDataset.loudness_cutoff: -40.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK2/prosound_redacted/Soundrangers Complete
+- /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
+- /media/CHONK2/prosound_redacted/Big Room Complete/Bells
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
+
+train/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+train/BackgroundNoise.loudness_cutoff: null
+train/BackgroundNoise.n_bands: 3
+train/BackgroundNoise.name: null
+train/BackgroundNoise.prob: 1.0
+train/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+train/BackgroundNoise.sources: null
+train/BackgroundNoise.weights: null
+
+train/BaseTransform.keys: []
+train/BaseTransform.name: null
+train/BaseTransform.prob: 1.0
+
+train/ClippingDistortion.name: null
+train/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+train/ClippingDistortion.prob: 1.0
+
+train/CorruptPhase.name: null
+train/CorruptPhase.prob: 1
+train/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+train/CrossTalk.loudness_cutoff: -40
+train/CrossTalk.name: null
+train/CrossTalk.prob: 1.0
+train/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+train/CrossTalk.sources: null
+train/CrossTalk.weights: null
+
+train/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+train/Equalizer.n_bands: 6
+train/Equalizer.name: null
+train/Equalizer.prob: 1.0
+
+train/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyMask.name: null
+train/FrequencyMask.prob: 1
+
+train/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyNoise.name: null
+train/FrequencyNoise.prob: 1
+
+train/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+train/GlobalVolumeNorm.name: null
+train/GlobalVolumeNorm.prob: 1.0
+
+train/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+train/HighPass.name: null
+train/HighPass.prob: 1
+train/HighPass.zeros: 51
+
+train/InvertPhase.name: null
+train/InvertPhase.prob: 1
+
+train/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+train/LowPass.name: null
+train/LowPass.prob: 1
+train/LowPass.zeros: 51
+
+train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+train/MaskLowMagnitudes.name: null
+train/MaskLowMagnitudes.prob: 1
+
+train/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/MuLawQuantization.name: null
+train/MuLawQuantization.prob: 1.0
+
+train/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+train/NoiseFloor.name: null
+train/NoiseFloor.prob: 1.0
+
+train/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/Quantization.name: null
+train/Quantization.prob: 1.0
+
+train/Repeat.n_repeat: 1
+train/Repeat.name: null
+train/Repeat.prob: 1.0
+
+train/RepeatUpTo.max_repeat: 5
+train/RepeatUpTo.name: null
+train/RepeatUpTo.prob: 1.0
+train/RepeatUpTo.weights: null
+
+train/RescaleAudio.name: null
+train/RescaleAudio.prob: 1
+train/RescaleAudio.val: 1.0
+
+train/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+train/RoomImpulseResponse.duration: 1.0
+train/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+train/RoomImpulseResponse.n_bands: 6
+train/RoomImpulseResponse.name: null
+train/RoomImpulseResponse.offset: 0.0
+train/RoomImpulseResponse.prob: 1.0
+train/RoomImpulseResponse.sources: null
+train/RoomImpulseResponse.use_original_phase: false
+train/RoomImpulseResponse.weights: null
+
+train/ShiftPhase.name: null
+train/ShiftPhase.prob: 1
+train/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+train/Silence.name: null
+train/Silence.prob: 0.1
+
+train/Smoothing.name: null
+train/Smoothing.prob: 1
+train/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+train/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+train/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+train/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+train/SpectralDenoising.n_bands: 6
+train/SpectralDenoising.n_freq: 3
+train/SpectralDenoising.n_time: 5
+train/SpectralDenoising.name: null
+train/SpectralDenoising.nz_volume: -40
+train/SpectralDenoising.prob: 1
+
+train/TimeMask.name: null
+train/TimeMask.prob: 1
+train/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+train/TimeNoise.name: null
+train/TimeNoise.prob: 1
+train/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+train/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+train/VolumeChange.name: null
+train/VolumeChange.prob: 1.0
+
+train/VolumeNorm.db: !!python/tuple
+- const
+- -24
+train/VolumeNorm.name: null
+train/VolumeNorm.prob: 1.0
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 3.0
+val/AudioDataset.loudness_cutoff: -40.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK2/prosound_redacted/Soundrangers Complete
+- /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
+- /media/CHONK2/prosound_redacted/Big Room Complete/Bells
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
+
+val/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+val/BackgroundNoise.loudness_cutoff: null
+val/BackgroundNoise.n_bands: 3
+val/BackgroundNoise.name: null
+val/BackgroundNoise.prob: 1.0
+val/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+val/BackgroundNoise.sources: null
+val/BackgroundNoise.weights: null
+
+val/BaseTransform.keys: []
+val/BaseTransform.name: null
+val/BaseTransform.prob: 1.0
+
+val/ClippingDistortion.name: null
+val/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+val/ClippingDistortion.prob: 1.0
+
+val/CorruptPhase.name: null
+val/CorruptPhase.prob: 1
+val/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+val/CrossTalk.loudness_cutoff: -40
+val/CrossTalk.name: null
+val/CrossTalk.prob: 1.0
+val/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+val/CrossTalk.sources: null
+val/CrossTalk.weights: null
+
+val/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+val/Equalizer.n_bands: 6
+val/Equalizer.name: null
+val/Equalizer.prob: 1.0
+
+val/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyMask.name: null
+val/FrequencyMask.prob: 1
+
+val/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyNoise.name: null
+val/FrequencyNoise.prob: 1
+
+val/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+val/GlobalVolumeNorm.name: null
+val/GlobalVolumeNorm.prob: 1.0
+
+val/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+val/HighPass.name: null
+val/HighPass.prob: 1
+val/HighPass.zeros: 51
+
+val/InvertPhase.name: null
+val/InvertPhase.prob: 1
+
+val/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+val/LowPass.name: null
+val/LowPass.prob: 1
+val/LowPass.zeros: 51
+
+val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+val/MaskLowMagnitudes.name: null
+val/MaskLowMagnitudes.prob: 1
+
+val/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/MuLawQuantization.name: null
+val/MuLawQuantization.prob: 1.0
+
+val/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+val/NoiseFloor.name: null
+val/NoiseFloor.prob: 1.0
+
+val/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/Quantization.name: null
+val/Quantization.prob: 1.0
+
+val/Repeat.n_repeat: 1
+val/Repeat.name: null
+val/Repeat.prob: 1.0
+
+val/RepeatUpTo.max_repeat: 5
+val/RepeatUpTo.name: null
+val/RepeatUpTo.prob: 1.0
+val/RepeatUpTo.weights: null
+
+val/RescaleAudio.name: null
+val/RescaleAudio.prob: 1
+val/RescaleAudio.val: 1.0
+
+val/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+val/RoomImpulseResponse.duration: 1.0
+val/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+val/RoomImpulseResponse.n_bands: 6
+val/RoomImpulseResponse.name: null
+val/RoomImpulseResponse.offset: 0.0
+val/RoomImpulseResponse.prob: 1.0
+val/RoomImpulseResponse.sources: null
+val/RoomImpulseResponse.use_original_phase: false
+val/RoomImpulseResponse.weights: null
+
+val/ShiftPhase.name: null
+val/ShiftPhase.prob: 1
+val/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+val/Silence.name: null
+val/Silence.prob: 0.1
+
+val/Smoothing.name: null
+val/Smoothing.prob: 1
+val/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+val/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+val/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+val/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+val/SpectralDenoising.n_bands: 6
+val/SpectralDenoising.n_freq: 3
+val/SpectralDenoising.n_time: 5
+val/SpectralDenoising.name: null
+val/SpectralDenoising.nz_volume: -40
+val/SpectralDenoising.prob: 1
+
+val/TimeMask.name: null
+val/TimeMask.prob: 1
+val/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+val/TimeNoise.name: null
+val/TimeNoise.prob: 1
+val/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+val/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+val/VolumeChange.name: null
+val/VolumeChange.prob: 1.0
+
+val/VolumeNorm.db: !!python/tuple
+- const
+- -24
+val/VolumeNorm.name: null
+val/VolumeNorm.prob: 1.0
+
+val_freq: 1000
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
runs/soundrangers-v2-v1/c2f/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:82d83c323601ef3ae23d574cde1f93539bb3f057451d3e0a495b562fcc96deaa
+size 1111127537
runs/soundrangers-v2-v1/c2f/model.txt
ADDED
@@ -0,0 +1,73 @@
+VampNet(
+  277.753M params.
+  (embedding): CodebookEmbedding(
+    0.145M params.
+    (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+    (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+  )
+  (transformer): TransformerStack(
+    264.481M params.
+    (layers): ModuleList(
+      (0): TransformerLayer(
+        16.531M params.
+        (norm_1): RMSNorm( 0.001M params.)
+        (film_1): FiLM( 0.000M params.)
+        (self_attn): MultiHeadRelativeAttention(
+          6.616M params.
+          (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+          (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          (relative_attention_bias): Embedding(32, 20 0.001M params.)
+        )
+        (norm_3): RMSNorm( 0.001M params.)
+        (film_3): FiLM( 0.000M params.)
+        (feed_forward): FeedForward(
+          9.912M params.
+          (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+          (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+          (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+          (act): GatedGELU(
+            0.000M params.
+            (gelu): NewGELU( 0.000M params.)
+          )
+        )
+        (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+      )
+      (1-15): 15 x TransformerLayer(
+        16.530M params.
+        (norm_1): RMSNorm( 0.001M params.)
+        (film_1): FiLM( 0.000M params.)
+        (self_attn): MultiHeadRelativeAttention(
+          6.615M params.
+          (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+          (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (norm_3): RMSNorm( 0.001M params.)
+        (film_3): FiLM( 0.000M params.)
+        (feed_forward): FeedForward(
+          9.912M params.
+          (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+          (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+          (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+          (act): GatedGELU(
+            0.000M params.
+            (gelu): NewGELU( 0.000M params.)
+          )
+        )
+        (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+      )
+    )
+    (norm): RMSNorm( 0.001M params.)
+  )
+  (classifier): SequentialWithFiLM(
+    13.128M params.
+    (layers): ModuleList(
+      (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+    )
+  )
+)
runs/soundrangers-v2-v1/coarse/args.yml
ADDED
@@ -0,0 +1,851 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 10.0
+AudioDataset.loudness_cutoff: -30.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+BackgroundNoise.loudness_cutoff: null
+BackgroundNoise.n_bands: 3
+BackgroundNoise.name: null
+BackgroundNoise.prob: 1.0
+BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+BackgroundNoise.sources: null
+BackgroundNoise.weights: null
+
+BaseTransform.keys: []
+BaseTransform.name: null
+BaseTransform.prob: 1.0
+
+ClippingDistortion.name: null
+ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+ClippingDistortion.prob: 1.0
+
+CorruptPhase.name: null
+CorruptPhase.prob: 1
+CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+CrossTalk.loudness_cutoff: -40
+CrossTalk.name: null
+CrossTalk.prob: 1.0
+CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+CrossTalk.sources: null
+CrossTalk.weights: null
+
+Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+Equalizer.n_bands: 6
+Equalizer.name: null
+Equalizer.prob: 1.0
+
+FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyMask.name: null
+FrequencyMask.prob: 1
+
+FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+FrequencyNoise.name: null
+FrequencyNoise.prob: 1
+
+GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+GlobalVolumeNorm.name: null
+GlobalVolumeNorm.prob: 1.0
+
+HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+HighPass.name: null
+HighPass.prob: 1
+HighPass.zeros: 51
+
+InvertPhase.name: null
+InvertPhase.prob: 1
+
+LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+LowPass.name: null
+LowPass.prob: 1
+LowPass.zeros: 51
+
+MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+MaskLowMagnitudes.name: null
+MaskLowMagnitudes.prob: 1
+
+MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+MuLawQuantization.name: null
+MuLawQuantization.prob: 1.0
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+NoiseFloor.name: null
+NoiseFloor.prob: 1.0
+
+Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+Quantization.name: null
+Quantization.prob: 1.0
+
+Repeat.n_repeat: 1
+Repeat.name: null
+Repeat.prob: 1.0
+
+RepeatUpTo.max_repeat: 5
+RepeatUpTo.name: null
+RepeatUpTo.prob: 1.0
+RepeatUpTo.weights: null
+
+RescaleAudio.name: null
+RescaleAudio.prob: 1
+RescaleAudio.val: 1.0
+
+RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+RoomImpulseResponse.duration: 1.0
+RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+RoomImpulseResponse.n_bands: 6
+RoomImpulseResponse.name: null
+RoomImpulseResponse.offset: 0.0
+RoomImpulseResponse.prob: 1.0
+RoomImpulseResponse.sources: null
+RoomImpulseResponse.use_original_phase: false
+RoomImpulseResponse.weights: null
+
+ShiftPhase.name: null
+ShiftPhase.prob: 1
+ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+Silence.name: null
+Silence.prob: 0.1
+
+Smoothing.name: null
+Smoothing.prob: 1
+Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+Smoothing.window_type: !!python/tuple
+- const
+- average
+
+SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+SpectralDenoising.n_bands: 6
+SpectralDenoising.n_freq: 3
+SpectralDenoising.n_time: 5
+SpectralDenoising.name: null
+SpectralDenoising.nz_volume: -40
+SpectralDenoising.prob: 1
+
+TimeMask.name: null
+TimeMask.prob: 1
+TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+TimeNoise.name: null
+TimeNoise.prob: 1
+TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 4
+VampNet.n_conditioning_codebooks: 0
+VampNet.n_heads: 20
+VampNet.n_layers: 20
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+VolumeChange.name: null
+VolumeChange.prob: 1.0
+
+VolumeNorm.db: !!python/tuple
+- const
+- -24
+VolumeNorm.name: null
+VolumeNorm.prob: 1.0
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/soundrangers2/coarse.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: true
+
+sample_freq: 2000
+
+save_iters:
+- 2000
+- 4000
+- 10000
+- 20000
+- 40000
+- 100000
+
+save_path: ./runs/soundrangers-v2/coarse
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 10.0
+train/AudioDataset.loudness_cutoff: -30.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK2/prosound_redacted/Soundrangers Complete
+- /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
+- /media/CHONK2/prosound_redacted/Big Room Complete/Bells
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
+
+train/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+train/BackgroundNoise.loudness_cutoff: null
+train/BackgroundNoise.n_bands: 3
+train/BackgroundNoise.name: null
+train/BackgroundNoise.prob: 1.0
+train/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+train/BackgroundNoise.sources: null
+train/BackgroundNoise.weights: null
+
+train/BaseTransform.keys: []
+train/BaseTransform.name: null
+train/BaseTransform.prob: 1.0
+
+train/ClippingDistortion.name: null
+train/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+train/ClippingDistortion.prob: 1.0
+
+train/CorruptPhase.name: null
+train/CorruptPhase.prob: 1
+train/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+train/CrossTalk.loudness_cutoff: -40
+train/CrossTalk.name: null
+train/CrossTalk.prob: 1.0
+train/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+train/CrossTalk.sources: null
+train/CrossTalk.weights: null
+
+train/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+train/Equalizer.n_bands: 6
+train/Equalizer.name: null
+train/Equalizer.prob: 1.0
+
+train/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyMask.name: null
+train/FrequencyMask.prob: 1
+
+train/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+train/FrequencyNoise.name: null
+train/FrequencyNoise.prob: 1
+
+train/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+train/GlobalVolumeNorm.name: null
+train/GlobalVolumeNorm.prob: 1.0
+
+train/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+train/HighPass.name: null
+train/HighPass.prob: 1
+train/HighPass.zeros: 51
+
+train/InvertPhase.name: null
+train/InvertPhase.prob: 1
+
+train/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+train/LowPass.name: null
+train/LowPass.prob: 1
+train/LowPass.zeros: 51
+
+train/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+train/MaskLowMagnitudes.name: null
+train/MaskLowMagnitudes.prob: 1
+
+train/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/MuLawQuantization.name: null
+train/MuLawQuantization.prob: 1.0
+
+train/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+train/NoiseFloor.name: null
+train/NoiseFloor.prob: 1.0
+
+train/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+train/Quantization.name: null
+train/Quantization.prob: 1.0
+
+train/Repeat.n_repeat: 1
+train/Repeat.name: null
+train/Repeat.prob: 1.0
+
+train/RepeatUpTo.max_repeat: 5
+train/RepeatUpTo.name: null
+train/RepeatUpTo.prob: 1.0
+train/RepeatUpTo.weights: null
+
+train/RescaleAudio.name: null
+train/RescaleAudio.prob: 1
+train/RescaleAudio.val: 1.0
+
+train/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+train/RoomImpulseResponse.duration: 1.0
+train/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+train/RoomImpulseResponse.n_bands: 6
+train/RoomImpulseResponse.name: null
+train/RoomImpulseResponse.offset: 0.0
+train/RoomImpulseResponse.prob: 1.0
+train/RoomImpulseResponse.sources: null
+train/RoomImpulseResponse.use_original_phase: false
+train/RoomImpulseResponse.weights: null
+
+train/ShiftPhase.name: null
+train/ShiftPhase.prob: 1
+train/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+train/Silence.name: null
+train/Silence.prob: 0.1
+
+train/Smoothing.name: null
+train/Smoothing.prob: 1
+train/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+train/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+train/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+train/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+train/SpectralDenoising.n_bands: 6
+train/SpectralDenoising.n_freq: 3
+train/SpectralDenoising.n_time: 5
+train/SpectralDenoising.name: null
+train/SpectralDenoising.nz_volume: -40
+train/SpectralDenoising.prob: 1
+
+train/TimeMask.name: null
+train/TimeMask.prob: 1
+train/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+train/TimeNoise.name: null
+train/TimeNoise.prob: 1
+train/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+train/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+train/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+train/VolumeChange.name: null
+train/VolumeChange.prob: 1.0
+
+train/VolumeNorm.db: !!python/tuple
+- const
+- -24
+train/VolumeNorm.name: null
+train/VolumeNorm.prob: 1.0
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 10.0
+val/AudioDataset.loudness_cutoff: -30.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK2/prosound_redacted/Soundrangers Complete
+- /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
+- /media/CHONK2/prosound_redacted/Big Room Complete/Bells
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
+
+val/BackgroundNoise.eq_amount: !!python/tuple
+- const
+- 1.0
+val/BackgroundNoise.loudness_cutoff: null
+val/BackgroundNoise.n_bands: 3
+val/BackgroundNoise.name: null
+val/BackgroundNoise.prob: 1.0
+val/BackgroundNoise.snr: !!python/tuple
+- uniform
+- 10.0
+- 30.0
+val/BackgroundNoise.sources: null
+val/BackgroundNoise.weights: null
+
+val/BaseTransform.keys: []
+val/BaseTransform.name: null
+val/BaseTransform.prob: 1.0
+
+val/ClippingDistortion.name: null
+val/ClippingDistortion.perc: !!python/tuple
+- uniform
+- 0.0
+- 0.1
+val/ClippingDistortion.prob: 1.0
+
+val/CorruptPhase.name: null
+val/CorruptPhase.prob: 1
+val/CorruptPhase.scale: !!python/tuple
+- uniform
+- 0
+- 3.141592653589793
+
+val/CrossTalk.loudness_cutoff: -40
+val/CrossTalk.name: null
+val/CrossTalk.prob: 1.0
+val/CrossTalk.snr: !!python/tuple
+- uniform
+- 0.0
+- 10.0
+val/CrossTalk.sources: null
+val/CrossTalk.weights: null
+
+val/Equalizer.eq_amount: !!python/tuple
+- const
+- 1.0
+val/Equalizer.n_bands: 6
+val/Equalizer.name: null
+val/Equalizer.prob: 1.0
+
+val/FrequencyMask.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyMask.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyMask.name: null
+val/FrequencyMask.prob: 1
+
+val/FrequencyNoise.f_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/FrequencyNoise.f_width: !!python/tuple
+- const
+- 0.1
+val/FrequencyNoise.name: null
+val/FrequencyNoise.prob: 1
+
+val/GlobalVolumeNorm.db: !!python/tuple
+- const
+- -24
+val/GlobalVolumeNorm.name: null
+val/GlobalVolumeNorm.prob: 1.0
+
+val/HighPass.cutoff: !!python/tuple
+- choice
+- - 50
+  - 100
+  - 250
+  - 500
+  - 1000
+val/HighPass.name: null
+val/HighPass.prob: 1
+val/HighPass.zeros: 51
+
+val/InvertPhase.name: null
+val/InvertPhase.prob: 1
+
+val/LowPass.cutoff: !!python/tuple
+- choice
+- - 4000
+  - 8000
+  - 16000
+val/LowPass.name: null
+val/LowPass.prob: 1
+val/LowPass.zeros: 51
+
+val/MaskLowMagnitudes.db_cutoff: !!python/tuple
+- uniform
+- -10
+- 10
+val/MaskLowMagnitudes.name: null
+val/MaskLowMagnitudes.prob: 1
+
+val/MuLawQuantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/MuLawQuantization.name: null
+val/MuLawQuantization.prob: 1.0
+
+val/NoiseFloor.db: !!python/tuple
+- const
+- -50.0
+val/NoiseFloor.name: null
+val/NoiseFloor.prob: 1.0
+
+val/Quantization.channels: !!python/tuple
+- choice
+- - 8
+  - 32
+  - 128
+  - 256
+  - 1024
+val/Quantization.name: null
+val/Quantization.prob: 1.0
+
+val/Repeat.n_repeat: 1
+val/Repeat.name: null
+val/Repeat.prob: 1.0
+
+val/RepeatUpTo.max_repeat: 5
+val/RepeatUpTo.name: null
+val/RepeatUpTo.prob: 1.0
+val/RepeatUpTo.weights: null
+
+val/RescaleAudio.name: null
+val/RescaleAudio.prob: 1
+val/RescaleAudio.val: 1.0
+
+val/RoomImpulseResponse.drr: !!python/tuple
+- uniform
+- 0.0
+- 30.0
+val/RoomImpulseResponse.duration: 1.0
+val/RoomImpulseResponse.eq_amount: !!python/tuple
+- const
+- 1.0
+val/RoomImpulseResponse.n_bands: 6
+val/RoomImpulseResponse.name: null
+val/RoomImpulseResponse.offset: 0.0
+val/RoomImpulseResponse.prob: 1.0
+val/RoomImpulseResponse.sources: null
+val/RoomImpulseResponse.use_original_phase: false
+val/RoomImpulseResponse.weights: null
+
+val/ShiftPhase.name: null
+val/ShiftPhase.prob: 1
+val/ShiftPhase.shift: !!python/tuple
+- uniform
+- -3.141592653589793
+- 3.141592653589793
+
+val/Silence.name: null
+val/Silence.prob: 0.1
+
+val/Smoothing.name: null
+val/Smoothing.prob: 1
+val/Smoothing.window_length: !!python/tuple
+- choice
+- - 8
+  - 16
+  - 32
+  - 64
+  - 128
+  - 256
+  - 512
+val/Smoothing.window_type: !!python/tuple
+- const
+- average
+
+val/SpectralDenoising.denoise_amount: !!python/tuple
+- uniform
+- 0.8
+- 1.0
+val/SpectralDenoising.eq_amount: !!python/tuple
+- const
+- 1.0
+val/SpectralDenoising.n_bands: 6
+val/SpectralDenoising.n_freq: 3
+val/SpectralDenoising.n_time: 5
+val/SpectralDenoising.name: null
+val/SpectralDenoising.nz_volume: -40
+val/SpectralDenoising.prob: 1
+
+val/TimeMask.name: null
+val/TimeMask.prob: 1
+val/TimeMask.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeMask.t_width: !!python/tuple
+- const
+- 0.025
+
+val/TimeNoise.name: null
+val/TimeNoise.prob: 1
+val/TimeNoise.t_center: !!python/tuple
+- uniform
+- 0.0
+- 1.0
+val/TimeNoise.t_width: !!python/tuple
+- const
+- 0.025
+
+val/VolumeChange.db: !!python/tuple
+- uniform
+- -12.0
+- 0.0
+val/VolumeChange.name: null
+val/VolumeChange.prob: 1.0
+
+val/VolumeNorm.db: !!python/tuple
+- const
+- -24
+val/VolumeNorm.name: null
+val/VolumeNorm.prob: 1.0
+
+val_freq: 1000
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
runs/soundrangers-v2-v1/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3809d9bbaa27f5ad1d409945180e11f5420c3c765e09d185fa1dbdd2ee77c59f
+size 1343718241
runs/soundrangers-v2-v1/coarse/model.txt
ADDED
@@ -0,0 +1,73 @@
+VampNet(
+  335.894M params.
+  (embedding): CodebookEmbedding(
+    0.042M params.
+    (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+    (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+  )
+  (transformer): TransformerStack(
+    330.600M params.
+    (layers): ModuleList(
+      (0): TransformerLayer(
+        16.531M params.
+        (norm_1): RMSNorm( 0.001M params.)
+        (film_1): FiLM( 0.000M params.)
+        (self_attn): MultiHeadRelativeAttention(
+          6.616M params.
+          (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+          (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          (relative_attention_bias): Embedding(32, 20 0.001M params.)
+        )
+        (norm_3): RMSNorm( 0.001M params.)
+        (film_3): FiLM( 0.000M params.)
+        (feed_forward): FeedForward(
+          9.912M params.
+          (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+          (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+          (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+          (act): GatedGELU(
+            0.000M params.
+            (gelu): NewGELU( 0.000M params.)
+          )
+        )
+        (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+      )
+      (1-19): 19 x TransformerLayer(
+        16.530M params.
+        (norm_1): RMSNorm( 0.001M params.)
+        (film_1): FiLM( 0.000M params.)
+        (self_attn): MultiHeadRelativeAttention(
+          6.615M params.
+          (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+          (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (norm_3): RMSNorm( 0.001M params.)
+        (film_3): FiLM( 0.000M params.)
+        (feed_forward): FeedForward(
+          9.912M params.
+          (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+          (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+          (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+          (act): GatedGELU(
+            0.000M params.
+            (gelu): NewGELU( 0.000M params.)
+          )
+        )
+        (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+      )
+    )
+    (norm): RMSNorm( 0.001M params.)
+  )
+  (classifier): SequentialWithFiLM(
+    5.251M params.
+    (layers): ModuleList(
+      (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+    )
+  )
+)
runs/soundrangers-v2/c2f/args.yml
ADDED
@@ -0,0 +1,155 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 3.0
+AudioDataset.loudness_cutoff: -40.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 14
+VampNet.n_conditioning_codebooks: 4
+VampNet.n_heads: 20
+VampNet.n_layers: 16
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/natural-sounds/c2f.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/c2f.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: false
+
+sample_freq: 2000
+
+save_iters:
+- 2000
+- 4000
+- 10000
+- 20000
+- 40000
+- 100000
+
+save_path: ./runs/soundrangers-v2/c2f
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 3.0
+train/AudioDataset.loudness_cutoff: -40.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK2/prosound_redacted/Soundrangers Complete
+- /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
|
109 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
|
110 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
|
111 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
|
112 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
|
113 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Bells
|
114 |
+
- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
|
115 |
+
- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
|
116 |
+
|
117 |
+
val/AudioDataset.aligned: false
|
118 |
+
val/AudioDataset.duration: 3.0
|
119 |
+
val/AudioDataset.loudness_cutoff: -40.0
|
120 |
+
val/AudioDataset.n_examples: 500
|
121 |
+
val/AudioDataset.num_channels: 1
|
122 |
+
val/AudioDataset.offset: null
|
123 |
+
val/AudioDataset.shuffle_loaders: false
|
124 |
+
val/AudioDataset.without_replacement: false
|
125 |
+
|
126 |
+
val/AudioLoader.sources:
|
127 |
+
- /media/CHONK2/prosound_redacted/Soundrangers Complete
|
128 |
+
- /media/CHONK2/prosound_redacted/Soundrangers Update 2018
|
129 |
+
- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
|
130 |
+
- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
|
131 |
+
- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
|
132 |
+
- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
|
133 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
|
134 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
|
135 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
|
136 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
|
137 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
|
138 |
+
- /media/CHONK2/prosound_redacted/Big Room Complete/Bells
|
139 |
+
- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
|
140 |
+
- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
|
141 |
+
|
142 |
+
val_freq: 1000
|
143 |
+
|
144 |
+
val_idx:
|
145 |
+
- 0
|
146 |
+
- 1
|
147 |
+
- 2
|
148 |
+
- 3
|
149 |
+
- 4
|
150 |
+
- 5
|
151 |
+
- 6
|
152 |
+
- 7
|
153 |
+
- 8
|
154 |
+
- 9
|
155 |
+
|
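The `NoamScheduler.*` keys in the config above (`d_model: 512`, `factor: 2.0`, `warmup: 500`) can be read against the textbook Noam warmup schedule. The sketch below assumes that textbook formula; the diff does not show the scheduler's code, nor how its output is combined with `AdamW.lr`, so the function name `noam_lr` and the formula itself are assumptions rather than the repository's implementation.

```python
def noam_lr(step: int, d_model: int = 512, factor: float = 2.0, warmup: int = 500) -> float:
    """Textbook Noam schedule value at a 1-indexed step (assumed formula).

    Rises linearly for `warmup` steps, then decays as 1 / sqrt(step).
    Defaults mirror the NoamScheduler.* values in args.yml.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


if __name__ == "__main__":
    # Print the schedule at a few steps around the warmup boundary.
    for s in (1, 250, 500, 2000, 100000):
        print(s, noam_lr(s))
```

Under this assumption the schedule peaks exactly at `step == warmup`, so a short warmup of 500 steps reaches the maximum early in a fine-tuning run.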
runs/soundrangers-v2/c2f/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f483e7eaa0ea690c30a805936226833ccd2066db4b4309d2edcb542545bd1d62
+size 1111127537
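The three lines above are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object id, and the byte size of the real file. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and the 4-bytes-per-parameter (float32) estimate is an assumption used only to compare the size with the 277.753M parameter count reported in the accompanying `model.txt`.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f483e7eaa0ea690c30a805936226833ccd2066db4b4309d2edcb542545bd1d62
size 1111127537
"""

info = parse_lfs_pointer(POINTER)
algo, _, digest = info["oid"].partition(":")

# Assumption: weights stored as float32 (4 bytes each); 277.753M comes from model.txt.
estimated_bytes = 277.753e6 * 4
ratio = info["size"] / estimated_bytes

if __name__ == "__main__":
    print(algo, len(digest), info["size"], round(ratio, 4))
```

The ratio comes out very close to 1, which is consistent with a float32 checkpoint plus a small amount of serialization overhead.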
runs/soundrangers-v2/c2f/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  277.753M params.
+  (_orig_mod): VampNet(
+    277.753M params.
+    (embedding): CodebookEmbedding(
+      0.145M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 14x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(112, 1280, kernel_size=(1,), stride=(1,) 0.145M params.)
+    )
+    (transformer): TransformerStack(
+      264.481M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-15): 15 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      13.128M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 10240, kernel_size=(1,), stride=(1,), padding=same 13.128M params.)
+      )
+    )
+  )
+)
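Several layer shapes in this dump line up with the `VampNet.*` values in the c2f `args.yml`. The sketch below spells out that arithmetic. The relations are inferred from the numbers in this diff rather than taken from the model's source, and `derived_shapes` is a hypothetical helper.

```python
def derived_shapes(n_codebooks: int, n_conditioning_codebooks: int,
                   latent_dim: int, vocab_size: int) -> tuple:
    """Return (embedding input channels, classifier output channels).

    Inferred relations (assumptions, not confirmed by the source):
      embedding in  = n_codebooks * latent_dim
      classifier out = vocab_size * (n_codebooks - n_conditioning_codebooks)
    """
    embed_in = n_codebooks * latent_dim
    classifier_out = vocab_size * (n_codebooks - n_conditioning_codebooks)
    return embed_in, classifier_out


# c2f config: 14 codebooks, 4 of them conditioning, latent_dim 8, vocab 1024.
c2f_shapes = derived_shapes(14, 4, 8, 1024)

# Plain parameter counts for two layers whose totals can be checked directly.
w_ks_params = 1280 * 1280             # bias-free Linear(1280, 1280)
out_proj_params = 112 * 1280 + 1280   # Conv1d(112, 1280, kernel_size=1) with bias

if __name__ == "__main__":
    print(c2f_shapes)                         # compare with Conv1d(112, ...) and Conv1d(1280, 10240, ...)
    print(round(w_ks_params / 1e6, 3))        # compare with the 1.638M reported for w_ks
    print(round(out_proj_params / 1e6, 3))    # compare with the 0.145M reported for out_proj
```

The other attention and feed-forward projections report slightly more parameters than a plain weight matrix would give; the dump alone does not say where the extra parameters come from.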
runs/soundrangers-v2/coarse/args.yml
ADDED
@@ -0,0 +1,155 @@
+AdamW.amsgrad: false
+AdamW.betas: !!python/tuple
+- 0.9
+- 0.999
+AdamW.capturable: false
+AdamW.differentiable: false
+AdamW.eps: 1.0e-08
+AdamW.lr: 0.0001
+AdamW.maximize: false
+AdamW.weight_decay: 0.01
+
+AudioDataset.aligned: false
+AudioDataset.duration: 10.0
+AudioDataset.loudness_cutoff: -30.0
+AudioDataset.n_examples: 1000
+AudioDataset.num_channels: 1
+AudioDataset.offset: null
+AudioDataset.shuffle_loaders: false
+AudioDataset.without_replacement: false
+
+AudioLoader.ext:
+- .wav
+- .flac
+- .mp3
+- .mp4
+AudioLoader.relative_path: ''
+AudioLoader.shuffle: true
+AudioLoader.shuffle_state: 0
+AudioLoader.sources: null
+AudioLoader.weights: null
+
+CrossEntropyLoss.ignore_index: -100
+CrossEntropyLoss.label_smoothing: 0.1
+CrossEntropyLoss.reduce: null
+CrossEntropyLoss.reduction: mean
+CrossEntropyLoss.size_average: null
+
+NoamScheduler.d_model: 512
+NoamScheduler.factor: 2.0
+NoamScheduler.warmup: 500
+
+VampNet.dropout: 0.1
+VampNet.embedding_dim: 1280
+VampNet.flash_attn: false
+VampNet.latent_dim: 8
+VampNet.n_codebooks: 4
+VampNet.n_conditioning_codebooks: 0
+VampNet.n_heads: 20
+VampNet.n_layers: 20
+VampNet.noise_mode: mask
+VampNet.r_cond_dim: 0
+VampNet.vocab_size: 1024
+
+amp: false
+
+args.debug: true
+args.load: conf/generated/natural-sounds/coarse.yml
+args.save: null
+
+batch_size: 6
+
+codec_ckpt: ./models/vampnet/codec.pth
+
+fine_tune: true
+
+fine_tune_checkpoint: ./models/vampnet/coarse.pth
+
+grad_clip_val: 5.0
+
+num_iters: 500000
+
+num_workers: 7
+
+resume: false
+
+sample_freq: 2000
+
+save_iters:
+- 2000
+- 4000
+- 10000
+- 20000
+- 40000
+- 100000
+
+save_path: ./runs/soundrangers-v2/coarse
+
+seed: 0
+
+tag: latest
+
+train/AudioDataset.aligned: false
+train/AudioDataset.duration: 10.0
+train/AudioDataset.loudness_cutoff: -30.0
+train/AudioDataset.n_examples: 100000000
+train/AudioDataset.num_channels: 1
+train/AudioDataset.offset: null
+train/AudioDataset.shuffle_loaders: false
+train/AudioDataset.without_replacement: false
+
+train/AudioLoader.sources:
+- /media/CHONK2/prosound_redacted/Soundrangers Complete
+- /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
+- /media/CHONK2/prosound_redacted/Big Room Complete/Bells
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
+
+val/AudioDataset.aligned: false
+val/AudioDataset.duration: 10.0
+val/AudioDataset.loudness_cutoff: -30.0
+val/AudioDataset.n_examples: 500
+val/AudioDataset.num_channels: 1
+val/AudioDataset.offset: null
+val/AudioDataset.shuffle_loaders: false
+val/AudioDataset.without_replacement: false
+
+val/AudioLoader.sources:
+- /media/CHONK2/prosound_redacted/Soundrangers Complete
+- /media/CHONK2/prosound_redacted/Soundrangers Update 2018
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Animals
+- /media/CHONK2/prosound_redacted/BBC Nature Sound Effects Library/Birds
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Foley
+- /media/CHONK2/prosound_redacted/BBC Historical and 1-166 Sound Effects Library/Musical
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Dogs
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Farm
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Horses
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Rodents
+- /media/CHONK2/prosound_redacted/Big Room Complete/Mammals - Wild
+- /media/CHONK2/prosound_redacted/Big Room Complete/Bells
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Chimes
+- /media/CHONK2/prosound_redacted/King Collection - Volume 1/Musical - Instruments
+
+val_freq: 1000
+
+val_idx:
+- 0
+- 1
+- 2
+- 3
+- 4
+- 5
+- 6
+- 7
+- 8
+- 9
+
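The coarse config above is nearly identical to the c2f config for the same run, so the handful of differing keys is easy to miss. The sketch below diffs a hand-copied subset of both files. The dictionaries hold only values visible in this diff, and `diff_configs` is a hypothetical helper, not part of the training code.

```python
# Subset of keys copied by hand from the two args.yml files in this commit.
C2F = {
    "AudioDataset.duration": 3.0,
    "AudioDataset.loudness_cutoff": -40.0,
    "VampNet.embedding_dim": 1280,
    "VampNet.n_codebooks": 14,
    "VampNet.n_conditioning_codebooks": 4,
    "VampNet.n_heads": 20,
    "VampNet.n_layers": 16,
    "batch_size": 6,
}
COARSE = {
    "AudioDataset.duration": 10.0,
    "AudioDataset.loudness_cutoff": -30.0,
    "VampNet.embedding_dim": 1280,
    "VampNet.n_codebooks": 4,
    "VampNet.n_conditioning_codebooks": 0,
    "VampNet.n_heads": 20,
    "VampNet.n_layers": 20,
    "batch_size": 6,
}


def diff_configs(a: dict, b: dict) -> dict:
    """Map each key whose value differs to the pair (value in a, value in b)."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


changed = diff_configs(C2F, COARSE)

if __name__ == "__main__":
    for key, (c2f_value, coarse_value) in changed.items():
        print(f"{key}: c2f={c2f_value} coarse={coarse_value}")
```

Beyond this subset, the two files also differ in `args.load`, `fine_tune_checkpoint`, `save_path`, and the `train/` and `val/` copies of the duration and loudness cutoff.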
runs/soundrangers-v2/coarse/latest/vampnet/weights.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:961d04558e809c3828b92526e9141be051bb9195144a7d598341d60eef5db90f
+size 1343718241
runs/soundrangers-v2/coarse/model.txt
ADDED
@@ -0,0 +1,76 @@
+OptimizedModule(
+  335.894M params.
+  (_orig_mod): VampNet(
+    335.894M params.
+    (embedding): CodebookEmbedding(
+      0.042M params.
+      (special): ParameterDict( (MASK): Parameter containing: [torch.cuda.FloatTensor of size 4x8 (GPU 0)] 0.000M params.)
+      (out_proj): Conv1d(32, 1280, kernel_size=(1,), stride=(1,) 0.042M params.)
+    )
+    (transformer): TransformerStack(
+      330.600M params.
+      (layers): ModuleList(
+        (0): TransformerLayer(
+          16.531M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.616M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+            (relative_attention_bias): Embedding(32, 20 0.001M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+        (1-19): 19 x TransformerLayer(
+          16.530M params.
+          (norm_1): RMSNorm( 0.001M params.)
+          (film_1): FiLM( 0.000M params.)
+          (self_attn): MultiHeadRelativeAttention(
+            6.615M params.
+            (w_qs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (w_ks): Linear(in_features=1280, out_features=1280, bias=False 1.638M params.)
+            (w_vs): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (fc): Linear(in_features=1280, out_features=1280, bias=False 1.659M params.)
+            (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+          )
+          (norm_3): RMSNorm( 0.001M params.)
+          (film_3): FiLM( 0.000M params.)
+          (feed_forward): FeedForward(
+            9.912M params.
+            (w_1): Linear(in_features=1280, out_features=5120, bias=False 6.605M params.)
+            (w_2): Linear(in_features=2560, out_features=1280, bias=False 3.308M params.)
+            (drop): Dropout(p=0.1, inplace=False 0.000M params.)
+            (act): GatedGELU(
+              0.000M params.
+              (gelu): NewGELU( 0.000M params.)
+            )
+          )
+          (dropout): Dropout(p=0.1, inplace=False 0.000M params.)
+        )
+      )
+      (norm): RMSNorm( 0.001M params.)
+    )
+    (classifier): SequentialWithFiLM(
+      5.251M params.
+      (layers): ModuleList(
+        (0): Conv1d(1280, 4096, kernel_size=(1,), stride=(1,), padding=same 5.251M params.)
+      )
+    )
+  )
+)
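As a sanity check on the coarse model dump, the per-module counts can be summed and compared with the reported totals. The sketch below uses only the rounded figures printed in `model.txt`, so agreement is expected only up to rounding. The tolerance values are assumptions chosen to cover that rounding.

```python
# Rounded figures (millions of parameters) copied from the coarse model.txt.
EMBEDDING_M = 0.042
TRANSFORMER_M = 330.600
CLASSIFIER_M = 5.251
TOTAL_M = 335.894

FIRST_LAYER_M = 16.531   # layer 0 carries the relative_attention_bias embedding
OTHER_LAYER_M = 16.530   # layers 1-19
N_OTHER_LAYERS = 19
FINAL_NORM_M = 0.001

# Top level: embedding + transformer stack + classifier.
top_level_sum = EMBEDDING_M + TRANSFORMER_M + CLASSIFIER_M

# Transformer stack: 20 layers plus the final RMSNorm.
stack_sum = FIRST_LAYER_M + N_OTHER_LAYERS * OTHER_LAYER_M + FINAL_NORM_M

if __name__ == "__main__":
    print(round(top_level_sum, 3), TOTAL_M)
    print(round(stack_sum, 3), TRANSFORMER_M)
```

Both sums land within a few thousand parameters of the reported totals, which is what three-decimal rounding of each term would allow.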