01-18 13:05:10 INFO     [logger.py:80]: Initialized logger with log file in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR.
01-18 13:07:34 INFO     [logging.py:61]: Configuration file is saved to /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/config__2024_01_18--13_07_07.toml.
01-18 13:07:34 INFO     [logging.py:61]: Environment information:
- `Accelerate` version: 0.26.1
- Platform: Linux-5.14.0-362.13.1.el9_3.x86_64-x86_64-with-glibc2.34
- Python version: 3.10.13
- Numpy version: 1.26.3
- PyTorch version (GPU?): 2.1.2 (True)
- System RAM: 503.48 GB
- GPU Available: True
- GPU IDs: 4
- GPU type: NVIDIA A100-SXM4-80GB
01-18 13:07:34 INFO     [logging.py:61]: 
===============================================================================================
Layer (type:depth-idx)                                                 Param #
===============================================================================================
DistributedDataParallel                                                --
├─Model: 1-1                                                           --
│    └─EncodecModel: 2-1                                               --
│    │    └─EncodecEncoder: 3-1                                        (7,425,792)
│    │    └─EncodecDecoder: 3-2                                        (7,426,018)
│    │    └─EncodecResidualVectorQuantizer: 3-3                        --
│    └─TokenEmbedding: 2-2                                             --
│    │    └─Dropout: 3-4                                               --
│    │    └─Embedding: 3-5                                             524,800
│    └─Identity: 2-3                                                   --
│    └─SinePositionalEmbedding: 2-4                                    1
│    │    └─Dropout: 3-6                                               --
│    └─TransformerEncoder: 2-5                                         --
│    │    └─ModuleList: 3-7                                            37,828,608
│    │    └─LayerNorm: 3-8                                             1,024
│    └─Linear: 2-6                                                     524,800
│    └─MulticlassAccuracy: 2-7                                         --
│    └─TokenEmbedding: 2-8                                             --
│    │    └─Dropout: 3-9                                               --
│    │    └─Embedding: 3-10                                            524,288
│    └─ModuleList: 2-9                                                 --
│    │    └─TokenEmbedding: 3-11                                       524,800
│    │    └─TokenEmbedding: 3-12                                       524,288
│    │    └─TokenEmbedding: 3-13                                       524,288
│    │    └─TokenEmbedding: 3-14                                       524,288
│    │    └─TokenEmbedding: 3-15                                       524,288
│    │    └─TokenEmbedding: 3-16                                       524,288
│    │    └─TokenEmbedding: 3-17                                       524,288
│    │    └─TokenEmbedding: 3-18                                       524,288
│    └─Identity: 2-10                                                  --
│    └─SinePositionalEmbedding: 2-11                                   1
│    │    └─Dropout: 3-19                                              --
│    └─TransformerEncoder: 2-12                                        --
│    │    └─ModuleList: 3-20                                           50,436,096
│    │    └─AdaptiveLayerNorm: 3-21                                    526,336
│    └─ModuleList: 2-13                                                --
│    │    └─Linear: 3-22                                               524,288
│    │    └─Linear: 3-23                                               524,288
│    │    └─Linear: 3-24                                               524,288
│    │    └─Linear: 3-25                                               524,288
│    │    └─Linear: 3-26                                               524,288
│    │    └─Linear: 3-27                                               524,288
│    │    └─Linear: 3-28                                               524,288
│    └─ModuleList: 2-14                                                --
│    │    └─TokenEmbedding: 3-29                                       512
│    │    └─TokenEmbedding: 3-30                                       512
│    │    └─TokenEmbedding: 3-31                                       512
│    │    └─TokenEmbedding: 3-32                                       512
│    │    └─TokenEmbedding: 3-33                                       512
│    │    └─TokenEmbedding: 3-34                                       512
│    │    └─TokenEmbedding: 3-35                                       512
│    └─MulticlassAccuracy: 2-15                                        --
===============================================================================================
Total params: 113,086,180
Trainable params: 98,234,369
Non-trainable params: 14,851,811
===============================================================================================
01-18 13:07:34 INFO     [logging.py:61]: Training control variables:
01-18 13:07:34 INFO     [logging.py:61]: `steps_per_epoch`: 500
01-18 13:07:34 INFO     [logging.py:61]: Gradient accumulation steps: 1
01-18 13:07:34 INFO     [logging.py:61]: `update_steps_per_epoch`: 500
01-18 13:07:34 INFO     [logging.py:61]: `max_steps`: 500000
01-18 13:07:34 INFO     [logging.py:61]: `max_epochs`: 1000
01-18 13:07:34 INFO     [logging.py:61]: warmup_steps=1000. warmup_ratio will be ignored.
01-18 13:07:34 INFO     [logging.py:61]: ========= Epoch 1 out of 1000 =========
01-18 13:07:34 INFO     [logging.py:61]: Begin training...
01-18 13:17:30 INFO     [logging.py:61]: Loss 'loss' on epoch 1: 4.6260857582092285
01-18 13:17:31 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 1: 4.6260857582092285
01-18 13:17:31 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 1: 0.0
01-18 13:17:31 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 1: 0.4006486237049103
01-18 13:17:31 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 1: 0.0
01-18 13:17:31 INFO     [logging.py:61]: ========= Epoch 2 out of 1000 =========
01-18 13:17:31 INFO     [logging.py:61]: Begin training...
01-18 13:27:26 INFO     [logging.py:61]: Loss 'loss' on epoch 2: 3.4351987838745117
01-18 13:27:26 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 2: 3.4351987838745117
01-18 13:27:26 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 2: 0.0
01-18 13:27:26 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 2: 0.5857024192810059
01-18 13:27:26 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 2: 0.0
01-18 13:27:26 INFO     [logging.py:61]: ========= Epoch 3 out of 1000 =========
01-18 13:27:26 INFO     [logging.py:61]: Begin training...
01-18 13:37:46 INFO     [logging.py:61]: Loss 'loss' on epoch 3: 3.175524950027466
01-18 13:37:46 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 3: 3.175524950027466
01-18 13:37:46 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 3: 0.0
01-18 13:37:46 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 3: 0.6268473267555237
01-18 13:37:46 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 3: 0.0
01-18 13:37:46 INFO     [logging.py:61]: ========= Epoch 4 out of 1000 =========
01-18 13:37:46 INFO     [logging.py:61]: Begin training...
01-18 13:47:20 INFO     [logging.py:61]: Loss 'loss' on epoch 4: 3.0606117248535156
01-18 13:47:20 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 4: 3.0606117248535156
01-18 13:47:20 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 4: 0.0
01-18 13:47:20 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 4: 0.6437094211578369
01-18 13:47:20 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 4: 0.0
01-18 13:47:20 INFO     [logging.py:61]: ========= Epoch 5 out of 1000 =========
01-18 13:47:20 INFO     [logging.py:61]: Begin training...
01-18 13:56:19 INFO     [logging.py:61]: Loss 'loss' on epoch 5: 2.9828383922576904
01-18 13:56:19 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 5: 2.9828383922576904
01-18 13:56:19 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 5: 0.0
01-18 13:56:19 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 5: 0.6574126482009888
01-18 13:56:19 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 5: 0.0
01-18 13:56:19 INFO     [logging.py:61]: Saving current state to /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/checkpoints/epoch_0005
01-18 13:56:21 INFO     [logging.py:61]: Model weights saved in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/checkpoints/epoch_0005/pytorch_model.bin
01-18 13:56:23 INFO     [logging.py:61]: Optimizer state saved in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/checkpoints/epoch_0005/optimizer.bin
01-18 13:56:23 INFO     [logging.py:61]: Scheduler state saved in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/checkpoints/epoch_0005/scheduler.bin
01-18 13:56:23 INFO     [logging.py:61]: Sampler state for dataloader 0 saved in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/checkpoints/epoch_0005/sampler.bin
01-18 13:56:23 INFO     [logging.py:61]: Sampler state for dataloader 1 saved in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/checkpoints/epoch_0005/sampler_1.bin
01-18 13:56:23 INFO     [logging.py:61]: Random states saved in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/checkpoints/epoch_0005/random_states_0.pkl
01-18 13:56:23 INFO     [logging.py:61]: Saving the state of TrainerState to /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-4_AR-NAR/checkpoints/epoch_0005/custom_checkpoint_0.pkl
01-18 13:56:23 INFO     [logging.py:61]: ========= Epoch 6 out of 1000 =========
01-18 13:56:23 INFO     [logging.py:61]: Begin training...
01-18 14:06:06 INFO     [logging.py:61]: Loss 'loss' on epoch 6: 2.9489591121673584
01-18 14:06:06 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 6: 2.9489591121673584
01-18 14:06:06 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 6: 0.0
01-18 14:06:06 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 6: 0.6638808846473694
01-18 14:06:06 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 6: 0.0
01-18 14:06:06 INFO     [logging.py:61]: ========= Epoch 7 out of 1000 =========
01-18 14:06:06 INFO     [logging.py:61]: Begin training...
01-18 14:15:31 INFO     [logging.py:61]: Loss 'loss' on epoch 7: 2.9095468521118164
01-18 14:15:31 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 7: 2.9095468521118164
01-18 14:15:31 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 7: 0.0
01-18 14:15:31 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 7: 0.670335590839386
01-18 14:15:31 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 7: 0.0
01-18 14:15:31 INFO     [logging.py:61]: ========= Epoch 8 out of 1000 =========
01-18 14:15:31 INFO     [logging.py:61]: Begin training...
01-18 14:25:02 INFO     [logging.py:61]: Loss 'loss' on epoch 8: 2.8823063373565674
01-18 14:25:02 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 8: 2.8823063373565674
01-18 14:25:02 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 8: 0.0
01-18 14:25:02 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 8: 0.6748366355895996
01-18 14:25:02 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 8: 0.0
01-18 14:25:02 INFO     [logging.py:61]: ========= Epoch 9 out of 1000 =========
01-18 14:25:02 INFO     [logging.py:61]: Begin training...