diff --git "a/nohup.out" "b/nohup.out"
--- "a/nohup.out"
+++ "b/nohup.out"
@@ -66167,3 +66167,1606 @@ huggingface/tokenizers: The current process just got forked, after parallelism h
 To disable this warning, you can either:
 	- Avoid using `tokenizers` before the fork if possible
 	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+	- Avoid using `tokenizers` before the fork if possible
+	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+	- Avoid using `tokenizers` before the fork if possible
+	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+	- Avoid using `tokenizers` before the fork if possible
+	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+	- Avoid using `tokenizers` before the fork if possible
+	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+	- Avoid using `tokenizers` before the fork if possible
+	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+	- Avoid using `tokenizers` before the fork if possible
+	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+Step... (48325 | Loss: 0.006273908074945211, Learning Rate: 3.385859599802643e-06, Gradient Norm: 0.1460614949464798)
+Step... (48350 | Loss: 0.0028254184871912003, Learning Rate: 3.335356723255245e-06, Gradient Norm: 0.10338709503412247)
+Step... (48375 | Loss: 0.004019684158265591, Learning Rate: 3.28484770761861e-06, Gradient Norm: 0.1249128133058548)
+Step... (48400 | Loss: 0.004758585710078478, Learning Rate: 3.2343446036975365e-06, Gradient Norm: 0.15959139168262482)
+Step... (48425 | Loss: 0.0057290843687951565, Learning Rate: 3.1838417271501385e-06, Gradient Norm: 0.3134813904762268)
+Step... (48450 | Loss: 0.0017834462923929095, Learning Rate: 3.1333327115135035e-06, Gradient Norm: 0.07752764225006104)
+Step... (48475 | Loss: 0.00752327311784029, Learning Rate: 3.08282960759243e-06, Gradient Norm: 0.1572938710451126)
+Step... (48500 | Loss: 0.0028460498433560133, Learning Rate: 3.032326731045032e-06, Gradient Norm: 0.14785446226596832)
+Step... (48525 | Loss: 0.003165456233546138, Learning Rate: 2.981817715408397e-06, Gradient Norm: 0.14088474214076996)
+Step... (48550 | Loss: 0.003088166005909443, Learning Rate: 2.9313146114873234e-06, Gradient Norm: 0.14618875086307526)
+Step... (48575 | Loss: 0.005377086345106363, Learning Rate: 2.8808055958506884e-06, Gradient Norm: 0.1480293571949005)
+Step... (48600 | Loss: 0.008808658458292484, Learning Rate: 2.8303027193032904e-06, Gradient Norm: 0.29225432872772217)
+Step... (48625 | Loss: 0.005420982837677002, Learning Rate: 2.779799615382217e-06, Gradient Norm: 0.12026731669902802)
+Step... (48650 | Loss: 0.0027397857047617435, Learning Rate: 2.729290599745582e-06, Gradient Norm: 0.15806540846824646)
+Step... (48675 | Loss: 0.00710515258833766, Learning Rate: 2.678787723198184e-06, Gradient Norm: 0.22162267565727234)
+Step... (48700 | Loss: 0.005460535641759634, Learning Rate: 2.6282846192771103e-06, Gradient Norm: 0.15932734310626984)
+Step... (48725 | Loss: 0.0032200149726122618, Learning Rate: 2.5777756036404753e-06, Gradient Norm: 0.12232287973165512)
+Step... (48750 | Loss: 0.003293083980679512, Learning Rate: 2.5272727270930773e-06, Gradient Norm: 0.1615389883518219)
+Step... (48775 | Loss: 0.006437454838305712, Learning Rate: 2.4767696231720038e-06, Gradient Norm: 0.17736835777759552)
+Step... (48800 | Loss: 0.003951036371290684, Learning Rate: 2.4262606075353688e-06, Gradient Norm: 0.14531250298023224)
+Step... (48825 | Loss: 0.005099722649902105, Learning Rate: 2.3757577309879707e-06, Gradient Norm: 0.160662442445755)
+Step... (48850 | Loss: 0.0032283207401633263, Learning Rate: 2.3252546270668972e-06, Gradient Norm: 0.13414400815963745)
+Step... (48875 | Loss: 0.0015734140761196613, Learning Rate: 2.2747456114302622e-06, Gradient Norm: 0.07306690514087677)
+Step... (48900 | Loss: 0.004713626578450203, Learning Rate: 2.224242734882864e-06, Gradient Norm: 0.37370821833610535)
+Step... (48925 | Loss: 0.00445198779925704, Learning Rate: 2.1737396309617907e-06, Gradient Norm: 0.11672279238700867)
+Step... (48950 | Loss: 0.005622401367872953, Learning Rate: 2.1232306153251557e-06, Gradient Norm: 0.36664897203445435)
+Step... (48975 | Loss: 0.004830599296838045, Learning Rate: 2.0727277387777576e-06, Gradient Norm: 0.1332392543554306)
+Step... (49000 | Loss: 0.001942885690368712, Learning Rate: 2.022224634856684e-06, Gradient Norm: 0.08187990635633469)
+Step... (49025 | Loss: 0.002983431564643979, Learning Rate: 1.971715619220049e-06, Gradient Norm: 0.12874305248260498)
+Step... (49050 | Loss: 0.003948990721255541, Learning Rate: 1.9212125152989756e-06, Gradient Norm: 0.1727469116449356)
+Step... (49075 | Loss: 0.0030416282825171947, Learning Rate: 1.8707096387515776e-06, Gradient Norm: 0.11190405488014221)
+Step... (49100 | Loss: 0.002342380816116929, Learning Rate: 1.8202006231149426e-06, Gradient Norm: 0.186001718044281)
+Step... (49125 | Loss: 0.005597165320068598, Learning Rate: 1.7696976328807068e-06, Gradient Norm: 0.1563747525215149)
+Step... (49150 | Loss: 0.006304830778390169, Learning Rate: 1.719194642646471e-06, Gradient Norm: 0.3608132302761078)
+Step
+
+Evaluating ...:   0% 0/85 [00:00<?, ?it/s]
+Traceback (most recent call last):
+    main()
+  File "run_flax_speech_recognition_seq2seq.py", line 1549, in main
+    error_rate_metric, pred_str, label_str = compute_metrics(pred_generations, pred_labels)
+  File "run_flax_speech_recognition_seq2seq.py", line 1064, in compute_metrics
+    label_str = tokenizer.batch_decode(padded_ids, skip_special_tokens=True)
+  File "/home/sanchitgandhi/transformers/src/transformers/tokenization_utils_base.py", line 3328, in batch_decode
+    return [
+  File "/home/sanchitgandhi/transformers/src/transformers/tokenization_utils_base.py", line 3329, in <listcomp>
+    self.decode(
+  File "/home/sanchitgandhi/transformers/src/transformers/tokenization_utils_base.py", line 3367, in decode
+    return self._decode(
+  File "/home/sanchitgandhi/transformers/src/transformers/tokenization_utils_fast.py", line 548, in _decode
+    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
+OverflowError: out of range integral type conversion attempted
+wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
+wandb: 15.534 MB of 15.534 MB uploaded (0.000 MB deduped)
+wandb:
+wandb: Run history:
+wandb: eval/cer █▆▅▁▃
+wandb: eval/loss ▁▇▇▆█
+wandb: eval/wer █▅▄▁▂
+wandb: test.clean/cer ▁
+wandb: test.clean/loss ▁
+wandb: test.clean/wer ▁
+wandb: train/decoder_grad_norm █▅▄▄▂▂▂▂▁▁▁▂▂▁▂▇▁▁▁▁▁▁▁▂▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁
+wandb: train/decoder_param_norm ▂▃▁▁▁▂▂▃▃▃▄▄▅▅▅▆▆▆▆▇▇▇▇▇▇███████████████
+wandb: train/encoder_grad_norm ▃█▄▄▂▁▂▂▁▁▂▂▁▁▂▆▁▁▁▁▁▁▁▃▁▂▂▂▁▁▁▁▁▂▁▁▁▁▁▁
+wandb: train/encoder_param_norm ▁▂▂▃▃▄▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇████████████████
+wandb: train/grad_norm █▇▅▄▂▂▂▂▂▁▁▂▂▁▂█▁▁▁▁▁▁▁▂▁▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁
+wandb: train/learning_rate ▇███▇▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▁▁▁
+wandb: train/loss █▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+wandb: train/param_norm ▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▆▇▇▇▇▇▇▇████████████████
+wandb: validation.other/cer ▁
+wandb: validation.other/loss ▁
+wandb: validation.other/wer ▁
+wandb:
+wandb: Run summary:
+wandb: eval/cer 0.03118
+wandb: eval/loss 1.06093
+wandb: eval/wer 0.04351
+wandb: test.clean/cer 0.03252
+wandb: test.clean/loss 0.46455
+wandb: test.clean/wer 0.0459
+wandb: train/decoder_grad_norm 0.13864
+wandb: train/decoder_param_norm 1063.15796
+wandb: train/encoder_grad_norm 0.12768
+wandb: train/encoder_param_norm 2323.48657
+wandb: train/grad_norm 0.18847
+wandb: train/learning_rate 0.0
+wandb: train/loss 0.00471
+wandb: train/param_norm 2555.17017
+wandb: validation.other/cer 0.04917
+wandb: validation.other/loss 1.31555
+wandb: validation.other/wer 0.07555
+wandb:
+wandb: Synced flax-wav2vec2-2-bart-large-ls-960h-black-box: https://wandb.ai/sanchit-gandhi/librispeech_960h/runs/2hx8pk65
+wandb: Synced 5 W&B file(s), 13 media file(s), 13 artifact file(s) and 0 other file(s)
+wandb: Find logs at: ./wandb/run-20220828_085247-2hx8pk65/logs
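
Note on the crash above: the fast (Rust) tokenizer raises "OverflowError: out of range integral type conversion attempted" when batch_decode is handed a token id that cannot be cast to an unsigned integer, most commonly the -100 sentinel used to mask padded label positions out of the loss. That is the likely, though not confirmed, cause here. Below is a minimal sketch of the usual workaround in Python; the helper name and checkpoint are illustrative, not taken from the training script:

    import numpy as np
    from transformers import AutoTokenizer

    # Illustrative checkpoint; the run above uses a wav2vec2-to-BART seq2seq model.
    tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")

    def safe_batch_decode(label_ids: np.ndarray) -> list:
        """Decode label ids after restoring the pad token in loss-masked positions."""
        # -100 marks positions ignored by the loss; the Rust decoder cannot
        # represent it as a token id, so swap it back to pad_token_id first.
        label_ids = np.where(label_ids == -100, tokenizer.pad_token_id, label_ids)
        return tokenizer.batch_decode(label_ids, skip_special_tokens=True)

Applying the same replacement to padded_ids before the batch_decode call at line 1064 of run_flax_speech_recognition_seq2seq.py should avoid this particular failure, assuming -100 label padding is indeed the culprit.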
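
Separately, the repeated "huggingface/tokenizers: The current process just got forked..." warnings at the top of the hunk are harmless but clutter the log, one copy per forked worker process. As the message itself suggests, they can be silenced by setting TOKENIZERS_PARALLELISM before any fast tokenizer is used; a sketch, assuming it is placed at the very top of the training script:

    import os

    # Must run before any fast tokenizer is constructed, otherwise forked
    # dataloader/host processes still emit the parallelism warning.
    os.environ["TOKENIZERS_PARALLELISM"] = "false"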