01-18 13:14:39 INFO     [logger.py:80]: Initialized logger with log file in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-3_AR-NAR.
01-18 13:14:45 INFO     [logging.py:61]: Configuration file is saved to /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-3_AR-NAR/config__2024_01_18--13_14_43.toml.
01-18 13:14:45 INFO     [logging.py:61]: Environment information:
- `Accelerate` version: 0.26.1
- Platform: Linux-5.14.0-362.13.1.el9_3.x86_64-x86_64-with-glibc2.34
- Python version: 3.10.13
- Numpy version: 1.26.3
- PyTorch version (GPU?): 2.1.2 (True)
- System RAM: 503.48 GB
- GPU Available: True
- GPU IDs: 4
- GPU type: NVIDIA A100-SXM4-80GB
01-18 13:14:45 INFO     [logging.py:61]: 
 ===============================================================================================
Layer (type:depth-idx)                                                 Param #
===============================================================================================
DistributedDataParallel                                                --
├─Model: 1-1                                                           --
│    └─EncodecModel: 2-1                                               --
│    │    └─EncodecEncoder: 3-1                                        (7,425,792)
│    │    └─EncodecDecoder: 3-2                                        (7,426,018)
│    │    └─EncodecResidualVectorQuantizer: 3-3                        --
│    └─TokenEmbedding: 2-2                                             --
│    │    └─Dropout: 3-4                                               --
│    │    └─Embedding: 3-5                                             524,800
│    └─Identity: 2-3                                                   --
│    └─SinePositionalEmbedding: 2-4                                    1
│    │    └─Dropout: 3-6                                               --
│    └─TransformerEncoder: 2-5                                         --
│    │    └─ModuleList: 3-7                                            37,828,608
│    │    └─LayerNorm: 3-8                                             1,024
│    └─Linear: 2-6                                                     524,800
│    └─MulticlassAccuracy: 2-7                                         --
│    └─TokenEmbedding: 2-8                                             --
│    │    └─Dropout: 3-9                                               --
│    │    └─Embedding: 3-10                                            524,288
│    └─ModuleList: 2-9                                                 --
│    │    └─TokenEmbedding: 3-11                                       524,800
│    │    └─TokenEmbedding: 3-12                                       524,288
│    │    └─TokenEmbedding: 3-13                                       524,288
│    │    └─TokenEmbedding: 3-14                                       524,288
│    │    └─TokenEmbedding: 3-15                                       524,288
│    │    └─TokenEmbedding: 3-16                                       524,288
│    │    └─TokenEmbedding: 3-17                                       524,288
│    │    └─TokenEmbedding: 3-18                                       524,288
│    └─Identity: 2-10                                                  --
│    └─SinePositionalEmbedding: 2-11                                   1
│    │    └─Dropout: 3-19                                              --
│    └─TransformerEncoder: 2-12                                        --
│    │    └─ModuleList: 3-20                                           50,436,096
│    │    └─AdaptiveLayerNorm: 3-21                                    526,336
│    └─ModuleList: 2-13                                                --
│    │    └─Linear: 3-22                                               524,288
│    │    └─Linear: 3-23                                               524,288
│    │    └─Linear: 3-24                                               524,288
│    │    └─Linear: 3-25                                               524,288
│    │    └─Linear: 3-26                                               524,288
│    │    └─Linear: 3-27                                               524,288
│    │    └─Linear: 3-28                                               524,288
│    └─ModuleList: 2-14                                                --
│    │    └─TokenEmbedding: 3-29                                       512
│    │    └─TokenEmbedding: 3-30                                       512
│    │    └─TokenEmbedding: 3-31                                       512
│    │    └─TokenEmbedding: 3-32                                       512
│    │    └─TokenEmbedding: 3-33                                       512
│    │    └─TokenEmbedding: 3-34                                       512
│    │    └─TokenEmbedding: 3-35                                       512
│    └─MulticlassAccuracy: 2-15                                        --
===============================================================================================
Total params: 113,086,180
Trainable params: 98,234,369
Non-trainable params: 14,851,811
===============================================================================================
01-18 13:14:45 INFO     [logging.py:61]: Training control variables:
01-18 13:14:45 INFO     [logging.py:61]: `steps_per_epoch`: 500
01-18 13:14:45 INFO     [logging.py:61]: Gradient accumulation steps: 1
01-18 13:14:45 INFO     [logging.py:61]: `update_steps_per_epoch`: 500
01-18 13:14:45 INFO     [logging.py:61]: `max_steps`: 500000
01-18 13:14:45 INFO     [logging.py:61]: `max_epochs`: 1000
01-18 13:14:45 INFO     [logging.py:61]: warmup_steps=1000. warmup_ratio will be ignored.
01-18 13:14:45 INFO     [logging.py:61]: Loading states from /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-3_AR-NAR/checkpoints/epoch_0050
01-18 13:14:45 INFO     [logging.py:61]: All model weights loaded successfully
01-18 13:14:46 INFO     [logging.py:61]: All optimizer states loaded successfully
01-18 13:14:46 INFO     [logging.py:61]: All scheduler states loaded successfully
01-18 13:14:46 INFO     [logging.py:61]: All dataloader sampler states loaded successfully
01-18 13:14:46 INFO     [logging.py:61]: All random states loaded successfully
01-18 13:14:46 INFO     [logging.py:61]: Loading in 1 custom states
01-18 13:14:46 INFO     [logging.py:61]: Loading the state of TrainerState from /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-3_AR-NAR/checkpoints/epoch_0050/custom_checkpoint_0.pkl
01-18 13:14:46 INFO     [logging.py:61]: Checkpoint on epoch 50 is loaded.
01-18 13:14:46 INFO     [logging.py:61]: ========= Epoch 51 out of 1000 =========
01-18 13:14:46 INFO     [logging.py:61]: Begin training...
01-18 13:29:20 INFO     [logging.py:61]: Loss 'loss' on epoch 51: 5.497116565704346
01-18 13:29:20 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 51: 1.253153920173645
01-18 13:29:20 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 51: 4.243962287902832
01-18 13:29:20 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 51: 0.9091926217079163
01-18 13:29:20 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 51: 0.40614935755729675
01-18 13:29:20 INFO     [logging.py:61]: ========= Epoch 52 out of 1000 =========
01-18 13:29:20 INFO     [logging.py:61]: Begin training...
01-18 13:43:53 INFO     [logging.py:61]: Loss 'loss' on epoch 52: 5.452788829803467
01-18 13:43:53 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 52: 1.245349645614624
01-18 13:43:53 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 52: 4.207438945770264
01-18 13:43:53 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 52: 0.9099061489105225
01-18 13:43:53 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 52: 0.41280823945999146
01-18 13:43:53 INFO     [logging.py:61]: ========= Epoch 53 out of 1000 =========
01-18 13:43:53 INFO     [logging.py:61]: Begin training...
01-18 13:58:30 INFO     [logging.py:61]: Loss 'loss' on epoch 53: 5.5037336349487305
01-18 13:58:30 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 53: 1.2453527450561523
01-18 13:58:30 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 53: 4.25838041305542
01-18 13:58:30 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 53: 0.9102101922035217
01-18 13:58:30 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 53: 0.40620309114456177
01-18 13:58:30 INFO     [logging.py:61]: ========= Epoch 54 out of 1000 =========
01-18 13:58:30 INFO     [logging.py:61]: Begin training...
01-18 14:13:06 INFO     [logging.py:61]: Loss 'loss' on epoch 54: 5.462146282196045
01-18 14:13:06 INFO     [logging.py:61]: Loss 'ar_loss' on epoch 54: 1.2472673654556274
01-18 14:13:06 INFO     [logging.py:61]: Loss 'nar_loss' on epoch 54: 4.214879512786865
01-18 14:13:06 INFO     [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 54: 0.9099826216697693
01-18 14:13:06 INFO     [logging.py:61]: Loss 'nar_acc_metric' on epoch 54: 0.4112021327018738
01-18 14:13:06 INFO     [logging.py:61]: ========= Epoch 55 out of 1000 =========
01-18 14:13:06 INFO     [logging.py:61]: Begin training...