2021-07-17 17:19:53.999791: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
[17:19:55] - WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0distributed training: False, 16-bits training: False
[17:19:55] - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.98,
adam_epsilon=1e-06,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_steps=1000,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0006,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=./outputs/runs/Jul17_17-19-55_tablespoon,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
output_dir=./outputs,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=48,
per_device_train_batch_size=48,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=outputs,
push_to_hub_organization=None,
push_to_hub_token=None,
remove_unused_columns=True,
report_to=['tensorboard', 'wandb'],
resume_from_checkpoint=None,
run_name=./outputs,
save_on_each_node=False,
save_steps=1000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=24000,
weight_decay=0.01,
)
[17:19:55] - INFO - absl - Starting the local TPU driver.
[17:19:55] - INFO - absl - Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
[17:19:55] - INFO - absl - Unable to initialize backend 'gpu': Not found: Could not find registered platform with name: "cuda". Available platform names are: Host Interpreter TPU
wandb: Currently logged in as: versae (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.11.0 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
2021-07-17 17:20:01.466005: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
wandb: Tracking run with wandb version 0.10.33
wandb: Syncing run happy-sunset-38
wandb: View project at https://wandb.ai/wandb/hf-flax-bertin-roberta-es
wandb: View run at https://wandb.ai/wandb/hf-flax-bertin-roberta-es/runs/372cgkt6
wandb: Run data is saved locally in /var/hf/experiment-base-exp-512seq-random/wandb/run-20210717_172000-372cgkt6
wandb: Run `wandb offline` to turn off syncing.
2021-07-17 17:20:02.551224: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-07-17 17:20:02.551257: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
[17:20:08] - INFO - __main__ - Restoring checkpoint from ./outputs/checkpoints/checkpoint-230001/
/var/hf/venv/lib/python3.8/site-packages/jax/lib/xla_bridge.py:386: UserWarning: jax.host_count has been renamed to jax.process_count. This alias will eventually be removed; please update your code.
warnings.warn(
/var/hf/venv/lib/python3.8/site-packages/jax/lib/xla_bridge.py:373: UserWarning: jax.host_id has been renamed to jax.process_index. This alias will eventually be removed; please update your code.
warnings.warn(
[17:20:09] - INFO - datasets_modules.datasets.mc4.a87e65ba98565cd4c0dfa086d00f008f69e6581a8ebefe16caa66f7ac364637d.mc4 - generating examples from = ../mc4-es-train-50M-random.jsonl
Training...: 0%| | 0/250000 [00:00