01-18 13:14:39 INFO [logger.py:80]: Initialized logger with log file in /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-3_AR-NAR.
01-18 13:14:45 INFO [logging.py:61]: Configuration file is saved to /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-3_AR-NAR/config__2024_01_18--13_14_43.toml.
01-18 13:14:45 INFO [logging.py:61]: Environment information:
- `Accelerate` version: 0.26.1
- Platform: Linux-5.14.0-362.13.1.el9_3.x86_64-x86_64-with-glibc2.34
- Python version: 3.10.13
- Numpy version: 1.26.3
- PyTorch version (GPU?): 2.1.2 (True)
- System RAM: 503.48 GB
- GPU Available: True
- GPU IDs: 4
- GPU type: NVIDIA A100-SXM4-80GB
01-18 13:14:45 INFO [logging.py:61]:
===============================================================================================
Layer (type:depth-idx)                                                  Param #
===============================================================================================
DistributedDataParallel                                                 --
├─Model: 1-1                                                            --
│    └─EncodecModel: 2-1                                                --
│    │    └─EncodecEncoder: 3-1                                         (7,425,792)
│    │    └─EncodecDecoder: 3-2                                         (7,426,018)
│    │    └─EncodecResidualVectorQuantizer: 3-3                         --
│    └─TokenEmbedding: 2-2                                              --
│    │    └─Dropout: 3-4                                                --
│    │    └─Embedding: 3-5                                              524,800
│    └─Identity: 2-3                                                    --
│    └─SinePositionalEmbedding: 2-4                                     1
│    │    └─Dropout: 3-6                                                --
│    └─TransformerEncoder: 2-5                                          --
│    │    └─ModuleList: 3-7                                             37,828,608
│    │    └─LayerNorm: 3-8                                              1,024
│    └─Linear: 2-6                                                      524,800
│    └─MulticlassAccuracy: 2-7                                          --
│    └─TokenEmbedding: 2-8                                              --
│    │    └─Dropout: 3-9                                                --
│    │    └─Embedding: 3-10                                             524,288
│    └─ModuleList: 2-9                                                  --
│    │    └─TokenEmbedding: 3-11                                        524,800
│    │    └─TokenEmbedding: 3-12                                        524,288
│    │    └─TokenEmbedding: 3-13                                        524,288
│    │    └─TokenEmbedding: 3-14                                        524,288
│    │    └─TokenEmbedding: 3-15                                        524,288
│    │    └─TokenEmbedding: 3-16                                        524,288
│    │    └─TokenEmbedding: 3-17                                        524,288
│    │    └─TokenEmbedding: 3-18                                        524,288
│    └─Identity: 2-10                                                   --
│    └─SinePositionalEmbedding: 2-11                                    1
│    │    └─Dropout: 3-19                                               --
│    └─TransformerEncoder: 2-12                                         --
│    │    └─ModuleList: 3-20                                            50,436,096
│    │    └─AdaptiveLayerNorm: 3-21                                     526,336
│    └─ModuleList: 2-13                                                 --
│    │    └─Linear: 3-22                                                524,288
│    │    └─Linear: 3-23                                                524,288
│    │    └─Linear: 3-24                                                524,288
│    │    └─Linear: 3-25                                                524,288
│    │    └─Linear: 3-26                                                524,288
│    │    └─Linear: 3-27                                                524,288
│    │    └─Linear: 3-28                                                524,288
│    └─ModuleList: 2-14                                                 --
│    │    └─TokenEmbedding: 3-29                                        512
│    │    └─TokenEmbedding: 3-30                                        512
│    │    └─TokenEmbedding: 3-31                                        512
│    │    └─TokenEmbedding: 3-32                                        512
│    │    └─TokenEmbedding: 3-33                                        512
│    │    └─TokenEmbedding: 3-34                                        512
│    │    └─TokenEmbedding: 3-35                                        512
│    └─MulticlassAccuracy: 2-15                                         --
===============================================================================================
Total params: 113,086,180
Trainable params: 98,234,369
Non-trainable params: 14,851,811
===============================================================================================
01-18 13:14:45 INFO [logging.py:61]: Training control variables:
01-18 13:14:45 INFO [logging.py:61]: `steps_per_epoch`: 500
01-18 13:14:45 INFO [logging.py:61]: Gradient accumulation steps: 1
01-18 13:14:45 INFO [logging.py:61]: `update_steps_per_epoch`: 500
01-18 13:14:45 INFO [logging.py:61]: `max_steps`: 500000
01-18 13:14:45 INFO [logging.py:61]: `max_epochs`: 1000
01-18 13:14:45 INFO [logging.py:61]: warmup_steps=1000. warmup_ratio will be ignored.
01-18 13:14:45 INFO [logging.py:61]: Loading states from /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-3_AR-NAR/checkpoints/epoch_0050
01-18 13:14:45 INFO [logging.py:61]: All model weights loaded successfully
01-18 13:14:46 INFO [logging.py:61]: All optimizer states loaded successfully
01-18 13:14:46 INFO [logging.py:61]: All scheduler states loaded successfully
01-18 13:14:46 INFO [logging.py:61]: All dataloader sampler states loaded successfully
01-18 13:14:46 INFO [logging.py:61]: All random states loaded successfully
01-18 13:14:46 INFO [logging.py:61]: Loading in 1 custom states
01-18 13:14:46 INFO [logging.py:61]: Loading the state of TrainerState from /fred/oz325/xhao/proj/audiozen/recipes/librimix_sot/tokenizer_separation/exp/swin_default_LR1e-3_AR-NAR/checkpoints/epoch_0050/custom_checkpoint_0.pkl
01-18 13:14:46 INFO [logging.py:61]: Checkpoint on epoch 50 is loaded.
01-18 13:14:46 INFO [logging.py:61]: ========= Epoch 51 out of 1000 =========
01-18 13:14:46 INFO [logging.py:61]: Begin training...
01-18 13:29:20 INFO [logging.py:61]: Loss 'loss' on epoch 51: 5.497116565704346
01-18 13:29:20 INFO [logging.py:61]: Loss 'ar_loss' on epoch 51: 1.253153920173645
01-18 13:29:20 INFO [logging.py:61]: Loss 'nar_loss' on epoch 51: 4.243962287902832
01-18 13:29:20 INFO [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 51: 0.9091926217079163
01-18 13:29:20 INFO [logging.py:61]: Loss 'nar_acc_metric' on epoch 51: 0.40614935755729675
01-18 13:29:20 INFO [logging.py:61]: ========= Epoch 52 out of 1000 =========
01-18 13:29:20 INFO [logging.py:61]: Begin training...
01-18 13:43:53 INFO [logging.py:61]: Loss 'loss' on epoch 52: 5.452788829803467
01-18 13:43:53 INFO [logging.py:61]: Loss 'ar_loss' on epoch 52: 1.245349645614624
01-18 13:43:53 INFO [logging.py:61]: Loss 'nar_loss' on epoch 52: 4.207438945770264
01-18 13:43:53 INFO [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 52: 0.9099061489105225
01-18 13:43:53 INFO [logging.py:61]: Loss 'nar_acc_metric' on epoch 52: 0.41280823945999146
01-18 13:43:53 INFO [logging.py:61]: ========= Epoch 53 out of 1000 =========
01-18 13:43:53 INFO [logging.py:61]: Begin training...
01-18 13:58:30 INFO [logging.py:61]: Loss 'loss' on epoch 53: 5.5037336349487305
01-18 13:58:30 INFO [logging.py:61]: Loss 'ar_loss' on epoch 53: 1.2453527450561523
01-18 13:58:30 INFO [logging.py:61]: Loss 'nar_loss' on epoch 53: 4.25838041305542
01-18 13:58:30 INFO [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 53: 0.9102101922035217
01-18 13:58:30 INFO [logging.py:61]: Loss 'nar_acc_metric' on epoch 53: 0.40620309114456177
01-18 13:58:30 INFO [logging.py:61]: ========= Epoch 54 out of 1000 =========
01-18 13:58:30 INFO [logging.py:61]: Begin training...
01-18 14:13:06 INFO [logging.py:61]: Loss 'loss' on epoch 54: 5.462146282196045
01-18 14:13:06 INFO [logging.py:61]: Loss 'ar_loss' on epoch 54: 1.2472673654556274
01-18 14:13:06 INFO [logging.py:61]: Loss 'nar_loss' on epoch 54: 4.214879512786865
01-18 14:13:06 INFO [logging.py:61]: Loss 'ar_accuracy_metric' on epoch 54: 0.9099826216697693
01-18 14:13:06 INFO [logging.py:61]: Loss 'nar_acc_metric' on epoch 54: 0.4112021327018738
01-18 14:13:06 INFO [logging.py:61]: ========= Epoch 55 out of 1000 =========
01-18 14:13:06 INFO [logging.py:61]: Begin training...