01-18 00:21:17 INFO [logger.py:80]: Initialized logger with log file in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-2_AR-NAR.
01-18 00:22:18 INFO [logging.py:61]: Configuration file is saved to /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-2_AR-NAR/config__2024_01_18--00_22_05.toml.
01-18 00:22:18 INFO [logging.py:61]: Environment information:
- `Accelerate` version: 0.26.1
- Platform: Linux-5.14.0-362.13.1.el9_3.x86_64-x86_64-with-glibc2.34
- Python version: 3.10.13
- Numpy version: 1.26.3
- PyTorch version (GPU?): 2.1.2 (True)
- System RAM: 503.48 GB
- GPU Available: True
- GPU IDs: 4
- GPU type: NVIDIA A100-SXM4-80GB
01-18 00:22:18 INFO [logging.py:61]:
===============================================================================================
Layer (type:depth-idx)                                                      Param #
===============================================================================================
DistributedDataParallel                                                     --
├─Model: 1-1                                                                --
│    └─EncodecModel: 2-1                                                    --
│    │    └─EncodecEncoder: 3-1                                             (7,425,792)
│    │    └─EncodecDecoder: 3-2                                             (7,426,018)
│    │    └─EncodecResidualVectorQuantizer: 3-3                             --
│    └─TokenEmbedding: 2-2                                                  --
│    │    └─Dropout: 3-4                                                    --
│    │    └─Embedding: 3-5                                                  1,049,600
│    └─Identity: 2-3                                                        --
│    └─SinePositionalEmbedding: 2-4                                         1
│    │    └─Dropout: 3-6                                                    --
│    └─TransformerEncoder: 2-5                                              --
│    │    └─ModuleList: 3-7                                                 151,154,688
│    │    └─LayerNorm: 3-8                                                  2,048
│    └─Linear: 2-6                                                          1,049,600
│    └─MulticlassAccuracy: 2-7                                              --
│    └─TokenEmbedding: 2-8                                                  --
│    │    └─Dropout: 3-9                                                    --
│    │    └─Embedding: 3-10                                                 1,048,576
│    └─ModuleList: 2-9                                                      --
│    │    └─TokenEmbedding: 3-11                                            1,049,600
│    │    └─TokenEmbedding: 3-12                                            1,048,576
│    │    └─TokenEmbedding: 3-13                                            1,048,576
│    │    └─TokenEmbedding: 3-14                                            1,048,576
│    │    └─TokenEmbedding: 3-15                                            1,048,576
│    │    └─TokenEmbedding: 3-16                                            1,048,576
│    │    └─TokenEmbedding: 3-17                                            1,048,576
│    │    └─TokenEmbedding: 3-18                                            1,048,576
│    └─Identity: 2-10                                                       --
│    └─SinePositionalEmbedding: 2-11                                        1
│    │    └─Dropout: 3-19                                                   --
│    └─TransformerEncoder: 2-12                                             --
│    │    └─ModuleList: 3-20                                                201,535,488
│    │    └─AdaptiveLayerNorm: 3-21                                         2,101,248
│    └─ModuleList: 2-13                                                     --
│    │    └─Linear: 3-22                                                    1,048,576
│    │    └─Linear: 3-23                                                    1,048,576
│    │    └─Linear: 3-24                                                    1,048,576
│    │    └─Linear: 3-25                                                    1,048,576
│    │    └─Linear: 3-26                                                    1,048,576
│    │    └─Linear: 3-27                                                    1,048,576
│    │    └─Linear: 3-28                                                    1,048,576
│    └─ModuleList: 2-14                                                     --
│    │    └─TokenEmbedding: 3-29                                            1,024
│    │    └─TokenEmbedding: 3-30                                            1,024
│    │    └─TokenEmbedding: 3-31                                            1,024
│    │    └─TokenEmbedding: 3-32                                            1,024
│    │    └─TokenEmbedding: 3-33                                            1,024
│    │    └─TokenEmbedding: 3-34                                            1,024
│    │    └─TokenEmbedding: 3-35                                            1,024
│    └─MulticlassAccuracy: 2-15                                             --
===============================================================================================
Total params: 388,529,892
Trainable params: 373,678,081
Non-trainable params: 14,851,811
===============================================================================================
01-18 00:22:18 INFO [logging.py:61]: Training control variables:
01-18 00:22:18 INFO [logging.py:61]: `steps_per_epoch`: 500
01-18 00:22:18 INFO [logging.py:61]: Gradient accumulation steps: 1
01-18 00:22:18 INFO [logging.py:61]: `update_steps_per_epoch`: 500
01-18 00:22:18 INFO [logging.py:61]: `max_steps`: 500000
01-18 00:22:18 INFO [logging.py:61]: `max_epochs`: 1000
01-18 00:22:18 INFO [logging.py:61]: warmup_steps=1000. warmup_ratio will be ignored.
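The control variables above are related by simple arithmetic, and the last line documents a precedence rule: an explicit warmup_steps wins over warmup_ratio. A minimal sketch of those relationships (the helper below is hypothetical, for illustration only, not the recipe's actual code):

```python
def training_schedule(steps_per_epoch, grad_accum_steps, max_epochs,
                      warmup_steps=None, warmup_ratio=None):
    """Derive the scheduling quantities printed in the log above.

    Hypothetical helper: illustrates the arithmetic, not AudioZen's code.
    """
    # Optimizer updates per epoch shrink by the accumulation factor.
    update_steps_per_epoch = steps_per_epoch // grad_accum_steps
    # Total optimizer updates over the whole run.
    max_steps = update_steps_per_epoch * max_epochs
    # An explicit warmup_steps takes precedence; warmup_ratio is ignored,
    # mirroring the "warmup_ratio will be ignored" message in the log.
    if warmup_steps is not None:
        warmup = warmup_steps
    elif warmup_ratio is not None:
        warmup = int(max_steps * warmup_ratio)
    else:
        warmup = 0
    return update_steps_per_epoch, max_steps, warmup

# Values from the log: 500 steps/epoch, accumulation 1, 1000 epochs.
print(training_schedule(500, 1, 1000, warmup_steps=1000, warmup_ratio=0.1))
# → (500, 500000, 1000)
```

With gradient accumulation of 1, update_steps_per_epoch equals steps_per_epoch (500), and max_steps = 500 × 1000 = 500000, matching the logged values.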
01-18 00:22:18 INFO [logging.py:61]: ========= Epoch 1 out of 1000 =========
01-18 00:22:18 INFO [logging.py:61]: Begin training...
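As a sanity check on the model summary printed earlier: the trainable and non-trainable counts sum to the total, and the parenthesized entries (which this summary appears to use for frozen parameters) belong to the pretrained Encodec encoder/decoder, which account for essentially all of the non-trainable total. A quick verification using only the numbers from the log:

```python
# Parameter totals as reported in the summary table above.
total = 388_529_892
trainable = 373_678_081
non_trainable = 14_851_811

# Trainable + non-trainable must equal the total.
assert trainable + non_trainable == total

# The parenthesized (frozen) EncodecEncoder and EncodecDecoder counts
# dominate the non-trainable total, leaving only a single parameter over.
frozen_encodec = 7_425_792 + 7_426_018
print(non_trainable - frozen_encodec)
# → 1
```

The residual single parameter matches one of the two SinePositionalEmbedding entries, each of which reports exactly 1 parameter.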