2021-07-18 11:13:56.299414: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
[11:13:57] - WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
[11:13:57] - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.98,
adam_epsilon=1e-06,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_steps=1000,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0006,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=./outputs/runs/Jul18_11-13-57_tablespoon,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
output_dir=./outputs,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=48,
per_device_train_batch_size=48,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=outputs,
push_to_hub_organization=None,
push_to_hub_token=None,
remove_unused_columns=True,
report_to=['tensorboard', 'wandb'],
resume_from_checkpoint=None,
run_name=./outputs,
save_on_each_node=False,
save_steps=1000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=500,
weight_decay=0.01,
)
Downloading: 0%| | 0.00/4.30k [00:00

Traceback (most recent call last):
    samples = advance_iter_and_group_samples(training_iter, train_batch_size, max_seq_length)
  File "./run_mlm_flax_stream.py", line 307, in advance_iter_and_group_samples
    tokenized_samples = next(train_iterator)
  File "/var/hf/datasets/src/datasets/iterable_dataset.py", line 338, in __iter__
    for key, example in self._iter():
  File "/var/hf/datasets/src/datasets/iterable_dataset.py", line 335, in _iter
    yield from ex_iterable
  File "/var/hf/datasets/src/datasets/iterable_dataset.py", line 222, in __iter__
    for x in self.ex_iterable:
  File "/var/hf/datasets/src/datasets/iterable_dataset.py", line 176, in __iter__
    for key, example in iterator:
  File "/var/hf/datasets/src/datasets/iterable_dataset.py", line 99, in __iter__
    for key, example in self.generate_examples_fn(**kwargs_with_shuffled_shards):
  File "/home/versae/.cache/huggingface/modules/datasets_modules/datasets/mc4-es-sampled/d1c0a78c0461592510b4c54a52e6b8c6a8c4f08a3533821817af4ac2391c1a5f/mc4-es-sampled.py", line 123, in _generate_examples
    for line in f:
  File "/usr/lib/python3.8/gzip.py", line 305, in read1
    return self._buffer.read1(size)
  File "/usr/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.8/gzip.py", line 485, in read
    buf = self._fp.read(io.DEFAULT_BUFFER_SIZE)
  File "/usr/lib/python3.8/gzip.py", line 96, in read
    self.file.read(size-self._length+read)
  File "/var/hf/datasets/src/datasets/utils/streaming_download_manager.py", line 62, in read_with_retries
    raise ConnectionError("Server Disconnected")
ConnectionError: Server Disconnected
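The failure happens inside `read_with_retries` in `datasets`' streaming download manager: the HTTP connection backing the gzipped mC4 shard drops, and once the retry budget is exhausted the streaming iterator raises `ConnectionError: Server Disconnected`, killing the run. If the installed `datasets` version exposes the streaming retry settings in `datasets.config` (an assumption; check your version), raising them before building the streamed dataset is one low-cost mitigation. A minimal sketch:

```python
# Sketch of a mitigation, ASSUMING this datasets version exposes the
# streaming retry knobs consumed by read_with_retries (check datasets.config;
# the attribute names below exist in recent releases but may differ here).
import datasets

datasets.config.STREAMING_READ_MAX_RETRIES = 20    # retry more before giving up
datasets.config.STREAMING_READ_RETRY_INTERVAL = 10  # seconds between retries

# Dataset name/split shown schematically, matching the script in the traceback.
dataset = datasets.load_dataset(
    "mc4-es-sampled",
    split="train",
    streaming=True,
)
```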
wandb: Waiting for W&B process to finish, PID 3624382
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb: 3.53MB of 3.53MB uploaded (0.00MB deduped)
wandb:
wandb: Find user logs for this run at: /var/hf/experiment-base-exp-512seq-gaussian/wandb/run-20210718_111405-332f6sie/logs/debug.log
wandb: Find internal logs for this run at: /var/hf/experiment-base-exp-512seq-gaussian/wandb/run-20210718_111405-332f6sie/logs/debug-internal.log
wandb: Run summary:
wandb:   global_step 24500
wandb:   _timestamp 1626647982.91069
wandb:   train_time 1190513.75
wandb:   train_learning_rate 0.00031
wandb:   _step 48854
wandb:   train_loss 1.63295
wandb:   eval_accuracy 0.67292
wandb:   eval_loss 1.61718
wandb: Run history:
wandb:   global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
wandb:   _timestamp ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
wandb:   train_time ▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▇▇▇██
wandb:   train_learning_rate ▁███▇▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁
wandb:   _step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:   train_loss █▁▁▁▂▁▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:   eval_accuracy ▂▁▁▁▁▂▂▂▃▃▃▄▄▅▅▅▆▆▆▇▇███
wandb:   eval_loss ▇████▇▇▇▆▆▆▅▅▄▄▄▃▃▃▂▂▁▂▁
wandb:
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 2 other file(s)
wandb:
wandb: Synced dauntless-moon-39: https://wandb.ai/wandb/hf-flax-bertin-roberta-es/runs/332f6sie
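Since the crash throws away a run that had already reached global step 24500, a coarser workaround before relaunching is to catch the `ConnectionError` at the training-loop level and rebuild the streaming iterator, at the cost of possibly repeating or skipping a few examples. A minimal sketch, assuming the names from the traceback (`advance_iter_and_group_samples` from `run_mlm_flax_stream.py`, passed in as `advance_fn`) and a hypothetical `make_training_iter` helper that recreates the shuffled streaming iterator:

```python
import time

def next_samples_with_retry(advance_fn, make_training_iter, training_iter,
                            train_batch_size, max_seq_length,
                            max_restarts=5, backoff_s=30):
    """Fetch the next grouped batch, rebuilding the iterator on disconnect.

    advance_fn: e.g. advance_iter_and_group_samples from run_mlm_flax_stream.py.
    make_training_iter: hypothetical helper that returns a fresh streaming iterator.
    """
    for attempt in range(max_restarts + 1):
        try:
            samples = advance_fn(training_iter, train_batch_size, max_seq_length)
            return samples, training_iter
        except ConnectionError as err:  # "Server Disconnected" from streaming reads
            if attempt == max_restarts:
                raise
            print(f"Streaming read failed ({err}); rebuilding iterator in {backoff_s}s "
                  f"(attempt {attempt + 1}/{max_restarts})")
            time.sleep(backoff_s)
            training_iter = make_training_iter()  # fresh iterator, position is lost
```

Rebuilding the iterator loses the stream position, so some mC4 text will be revisited (or missed, if shards are reshuffled); for a single pass over a streamed corpus this is usually an acceptable trade against losing the whole run.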