dat
Saving weights and logs at step 5
d725b93
[21:24:05] - INFO - absl - A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
/home/dat/pino/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py:3132: UserWarning: Explicitly requested dtype <class 'jax._src.numpy.lax_numpy.int64'> requested in zeros is not available, and will be truncated to dtype int32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable. See https://github.com/google/jax#current-gotchas for more.
  lax._check_user_dtype_supported(dtype, "zeros")
/home/dat/pino/lib/python3.8/site-packages/jax/lib/xla_bridge.py:386: UserWarning: jax.host_count has been renamed to jax.process_count. This alias will eventually be removed; please update your code.
  warnings.warn(
/home/dat/pino/lib/python3.8/site-packages/jax/lib/xla_bridge.py:373: UserWarning: jax.host_id has been renamed to jax.process_index. This alias will eventually be removed; please update your code.
  warnings.warn(
Epoch ... (1/5): 0%| | 0/5 [00:00<?, ?it/s]
[21:24:06] - INFO - __main__ - Skipping to epoch 0 step 0
Training...: 20%|█████████████████████████▌ | 253/1250 [04:26<1:03:55, 3.85s/it]
Training...: 40%|██████████████████████████████████████████████████▍ | 500/1250 [07:26<09:01, 1.39it/s]
Evaluating ...: 0%| | 0/31 [00:00<?, ?it/s]
Evaluating ...: 100%|████████████████████████████████████████████████████████████████████████████████| 31/31 [00:21<00:00, 9.98it/s]
[21:32:05] - INFO - huggingface_hub.repository - git version 2.25.1
git-lfs/2.9.2 (GitHub; linux amd64; go 1.13.5)
[21:32:05] - DEBUG - huggingface_hub.repository - [Repository] is a valid git repo
[21:32:35] - INFO - huggingface_hub.repository - Uploading LFS objects: 100% (2/2), 510 MB | 31 MB/s, done.
tcmalloc: large alloc 1354776576 bytes == 0x304b28000 @ 0x7fd74488d680 0x7fd7448adbdd 0x7fd478ac320d 0x7fd478ad1340 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478accbd3 0x7fd478acd1fe 0x504d56 0x56acb6 0x568d9a 0x5f5b33 0x56bc9b 0x5f5956 0x56aadf 0x5f5956 0x56aadf 0x568d9a 0x5f5b33 0x56bc9b 0x568d9a 0x68cdc7 0x67e161
tcmalloc: large alloc 2715181056 bytes == 0x35572c000 @ 0x7fd74488d680 0x7fd7448adbdd 0x7fd478ac320d 0x7fd478ad1340 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478ad0e87 0x7fd478accbd3 0x7fd478acd1fe 0x504d56 0x56acb6 0x568d9a 0x5f5b33 0x56bc9b 0x5f5956 0x56aadf 0x5f5956 0x56aadf 0x568d9a 0x5f5b33 0x56bc9b 0x568d9a 0x68cdc7 0x67e161 0x67e1df
tcmalloc: large alloc 1530273792 bytes == 0x2ae462000 @ 0x7fd74488d680 0x7fd7448ae824 0x5f7b11 0x7fd478accc6f 0x7fd478acd1fe 0x504d56 0x56acb6 0x568d9a 0x5f5b33 0x56bc9b 0x5f5956 0x56aadf 0x5f5956 0x56aadf 0x568d9a 0x5f5b33 0x56bc9b 0x568d9a 0x68cdc7 0x67e161 0x67e1df 0x67e281 0x67e627 0x6b6e62 0x6b71ed 0x7fd7446a20b3 0x5f96de
[21:32:57] - INFO - __main__ - checkpoint saved
Training...: 40%|██████████████████████████████████████████████████▍ | 500/1250 [08:46<13:09, 1.05s/it]
Step... (500 | Loss: 10.108721733093262, Acc: 0.043713752180337906): 0%| | 0/5 [08:51<?, ?it/s]
Traceback (most recent call last):
  File "./run_mlm_flax.py", line 853, in <module>
    rotate_checkpoints(training_args.output_dir, training_args.save_total_limit)
NameError: name 'rotate_checkpoints' is not defined
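
The NameError above means `run_mlm_flax.py` calls `rotate_checkpoints` at line 853 before any function of that name has been defined or imported. The log does not show what the function was meant to do, so the following is only a minimal sketch. It assumes that checkpoints live in subdirectories of the output directory named `ckpt-<step>` (a hypothetical naming scheme not confirmed by this log) and that `save_total_limit` is the number of most recent checkpoints to keep.

```python
import re
import shutil
from pathlib import Path
from typing import List, Optional


def rotate_checkpoints(ckpt_dir: str, save_total_limit: Optional[int]) -> List[str]:
    """Delete the oldest checkpoints so that at most `save_total_limit` remain.

    ASSUMPTION: checkpoints are directories named "ckpt-<step>" inside
    `ckpt_dir`. This layout is hypothetical; adapt the pattern to the
    naming the training script actually uses.
    Returns the names of the directories that were removed.
    """
    # No limit (None or non-positive) means keep everything.
    if save_total_limit is None or save_total_limit <= 0:
        return []

    pattern = re.compile(r"^ckpt-(\d+)$")
    found = []
    for path in Path(ckpt_dir).iterdir():
        match = pattern.match(path.name)
        if match and path.is_dir():
            found.append((int(match.group(1)), path))

    # Sort by step number so the oldest checkpoints come first.
    found.sort(key=lambda item: item[0])

    # Everything except the last `save_total_limit` entries is stale.
    stale = found[:-save_total_limit]

    deleted = []
    for _, path in stale:
        shutil.rmtree(path)
        deleted.append(path.name)
    return deleted
```

For the call in the traceback to resolve, a definition like this would need to sit at module level in `run_mlm_flax.py`, or be imported there, before line 853 runs.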