The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `2`
                More than one GPU was found, enabling multi-GPU training.
                If this was unintended please pass in `--num_processes=1`.
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2023-12-19 16:35:12,389] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-19 16:35:12,395] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
12/19/2023 16:35:17 - WARNING - llmtuner.model.parser - We recommend enable `upcast_layernorm` in quantized training.
12/19/2023 16:35:17 - WARNING - llmtuner.model.parser - We recommend enable mixed precision training.
12/19/2023 16:35:17 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
12/19/2023 16:35:17 - INFO - llmtuner.model.parser - Process rank: 1, device: cuda:1, n_gpu: 1
  distributed training: True, compute dtype: None
12/19/2023 16:35:17 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(
  _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False,
  bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0,
  dataloader_persistent_workers=False, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None,
  ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=None,
  disable_tqdm=False, dispatch_batches=None, do_eval=True, do_predict=False, do_train=True,
  eval_accumulation_steps=None, eval_delay=0, eval_steps=200, evaluation_strategy=steps,
  fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1,
  fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0,
  fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None,
  generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=8,
  gradient_checkpointing=False, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False,
  half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False,
  hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False,
  include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False,
  label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length,
  load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True,
  logging_dir=./models/sft/phi-2-sft-alpaca_gpt4_en-1/runs/Dec19_16-35-17_autodl-container-f11a41911a-e496153c,
  logging_first_step=False, logging_nan_inf_filter=True, logging_steps=200, logging_strategy=steps,
  lr_scheduler_kwargs={}, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None,
  mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=1.0, optim=adamw_torch, optim_args=None,
  output_dir=./models/sft/phi-2-sft-alpaca_gpt4_en-1, overwrite_output_dir=True, past_index=-1,
  per_device_eval_batch_size=1, per_device_train_batch_size=1, predict_with_generate=False, prediction_loss_only=True,
  push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=,
  ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None,
  run_name=./models/sft/phi-2-sft-alpaca_gpt4_en-1, save_on_each_node=False, save_only_model=False,
  save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=None, seed=42,
  skip_memory_metrics=True, sortish_sampler=False, split_batches=False, tf32=None,
  torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None,
  tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False,
  use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0,
)
12/19/2023 16:35:17 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_en.json...
12/19/2023 16:35:17 - WARNING - llmtuner.model.parser - We recommend enable `upcast_layernorm` in quantized training.
12/19/2023 16:35:17 - WARNING - llmtuner.model.parser - We recommend enable mixed precision training.
12/19/2023 16:35:17 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
12/19/2023 16:35:17 - INFO - llmtuner.model.parser - Process rank: 0, device: cuda:0, n_gpu: 1
  distributed training: True, compute dtype: None
12/19/2023 16:35:17 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(...)
  (parameter dump identical to the rank-1 dump above, except local_rank=0)
12/19/2023 16:35:17 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_en.json...
[WARNING|logging.py:314] 2023-12-19 16:35:18,566 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
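For readability, the effective hyperparameters buried in the dump above come down to a handful of values. A minimal sketch of the same configuration expressed directly as a `transformers.Seq2SeqTrainingArguments` object follows; it is illustrative only (the run was driven through llmtuner, so this is not the command that produced the log), with every value copied from the dump.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the effective settings printed above (values taken from the log dump).
training_args = Seq2SeqTrainingArguments(
    output_dir="./models/sft/phi-2-sft-alpaca_gpt4_en-1",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=1.0,
    evaluation_strategy="steps",
    eval_steps=200,
    logging_steps=200,
    save_strategy="steps",
    save_steps=1000,
    ddp_find_unused_parameters=False,  # required for LoRA under DDP, per the warning above
    report_to=["tensorboard"],
    seed=42,
    # fp16/bf16 are left off here, matching the log; the llmtuner warning above
    # recommends enabling mixed precision for quantized training.
)
```

Passing `--num_processes`, `--mixed_precision`, and `--dynamo_backend` explicitly to `accelerate launch`, or running `accelerate config` once, would also silence the defaults warning at the top of the log.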
12/19/2023 16:35:18 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
12/19/2023 16:35:18 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
Loading checkpoint shards: 0%| | 0/2 [00:00
12/19/2023 16:35:20 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
Running tokenizer on dataset: 0%| | 0/52002 [00:00
(intermediate tokenization progress omitted; throughput held at roughly 1,800-1,900 examples/s)
Running tokenizer on dataset: 100%|██████████| 52002/52002 [00:27<00:00, 1865.52 examples/s]
input_ids: [50256, 32, 8537, 1022, 257, 11040, 2836, 290, 281, 11666, 4430, 8796, 13, 383, 8796, 3607, 7613, 11, 6496, 11, 290, 23507, 7429, 284, 262, 2836, 338, 2683, 13, 198, 20490, 25, 13786, 1115, 9040, 329, 10589, 5448, 13, 198, 48902, 25, 16, 13, 27574, 257, 12974, 290, 48102, 5496, 25, 6889, 1654, 534, 13840, 389, 19889, 286, 257, 4996, 286, 15921, 290, 13701, 11, 10904, 7532, 11, 2187, 21824, 11, 290, 5448, 27997, 13, 770, 5419, 284, 2148, 534, 1767, 351, 262, 6393, 20901, 284, 2163, 379, 663, 1266, 290, 460, 1037, 2948, 10726, 10040, 13, 198, 198, 17, 13, 1985, 496, 287, 3218, 3518, 3842, 25, 32900, 318, 8780, 329, 10941, 1913, 11945, 11, 12749, 11, 290, 21134, 1535, 13, 36223, 329, 379, 1551, 6640, 2431, 286, 10768, 43294, 5517, 393, 5441, 2431, 286, 31543, 5517, 1123, 1285, 13, 198, 198, 18, 13, 3497, 1576, 3993, 25, 18067, 1576, 3081, 3993, 318, 8780, 329, 3518, 290, 5110, 880, 12, 11873, 13, 632, 5419, 284, 16697, 10038, 11, 2987, 10870, 2163, 11, 290, 6971, 5448, 3349, 290, 10900, 2163, 13, 36223, 329, 767, 12, 24, 2250, 286, 3993, 1123, 1755, 13, 50256]
inputs: <|endoftext|>A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: Give three tips for staying healthy.
Assistant:1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.<|endoftext|>
label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 16, 13, 27574, 257, 12974, 290, 48102, 5496, 25, 6889, 1654, 534, 13840, 389, 19889, 286, 257, 4996, 286, 15921, 290, 13701, 11, 10904, 7532, 11, 2187, 21824, 11, 290, 5448, 27997, 13, 770, 5419, 284, 2148, 534, 1767, 351, 262, 6393, 20901, 284, 2163, 379, 663, 1266, 290, 460, 1037, 2948, 10726, 10040, 13, 198, 198, 17, 13, 1985, 496, 287, 3218, 3518, 3842, 25, 32900, 318, 8780, 329, 10941, 1913, 11945, 11, 12749, 11, 290, 21134, 1535, 13, 36223, 329, 379, 1551, 6640, 2431, 286, 10768, 43294, 5517, 393, 5441, 2431, 286, 31543, 5517, 1123, 1285, 13, 198, 198, 18, 13, 3497, 1576, 3993, 25, 18067, 1576, 3081, 3993, 318, 8780, 329, 3518, 290, 5110, 880, 12, 11873, 13, 632, 5419, 284, 16697, 10038, 11, 2987, 10870, 2163, 11, 290, 6971, 5448, 3349, 290, 10900, 2163, 13, 36223, 329, 767, 12, 24, 2250, 286, 3993, 1123, 1755, 13, 50256]
labels: 1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.<|endoftext|>
Running tokenizer on dataset: 0%| | 0/52002 [00:00
You're using a CodeGenTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
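The "Quantizing model to 4 bit" and "Fine-tuning method: LoRA" lines indicate a QLoRA-style setup: the frozen base model is loaded in 4-bit precision and small LoRA adapters are trained on top of it. Below is a minimal sketch of how such a model is typically assembled with `bitsandbytes` and `peft`; the base checkpoint, LoRA rank, and target modules are illustrative assumptions, since the log does not print them, and llmtuner does the equivalent wiring internally.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization of the frozen base weights (QLoRA-style NF4 config).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",              # assumed base model; the log only names the output dir
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Hypothetical LoRA settings for illustration; rank and target modules are not in the log.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],  # depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```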
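The printed example also shows the usual supervised fine-tuning loss masking: `label_ids` mirrors `input_ids`, except that every prompt token is replaced by -100, the index that `torch.nn.CrossEntropyLoss` ignores by default, so the loss is computed only on the assistant response and the closing `<|endoftext|>` token. A small sketch of that idea, assuming a generic prompt/response split rather than llmtuner's actual preprocessing code, and assuming phi-2's tokenizer:

```python
from transformers import AutoTokenizer

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

# Assumed base tokenizer; the log only tells us it is a CodeGenTokenizerFast.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
    "Human: Give three tips for staying healthy.\nAssistant:"
)
response = "1. Eat a balanced and nutritious diet: ..."  # shortened for the sketch

prompt_ids = [tokenizer.eos_token_id] + tokenizer.encode(prompt)       # 50256 prefix, as in the log
response_ids = tokenizer.encode(response) + [tokenizer.eos_token_id]   # trailing 50256

input_ids = prompt_ids + response_ids
labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids               # loss only on the response

# Decoding only the unmasked positions reproduces the "labels:" view above.
print(tokenizer.decode([t for t in labels if t != IGNORE_INDEX]))
```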
0%| | 1/2925 [00:01<59:58, 1.23s/it]
0%| | 2/2925 [00:02<50:18, 1.03s/it]
0%| | 3/2925 [00:02<45:50, 1.06it/s]
(steps 4-199 omitted; throughput held at roughly 1.1-1.2 it/s throughout)
7%|▋ | 200/2925 [02:50<38:37, 1.18it/s]
{'loss': 1.0325, 'learning_rate': 4.942542412504543e-05, 'epoch': 0.07}
7%|▋ | 200/2925 [02:50<38:37, 1.18it/s]
0%| | 0/2601 [00:00
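These step counts line up with the configuration shown earlier: with a per-device batch size of 1, gradient accumulation of 8, and 2 processes, one optimizer step consumes 16 examples, so the 2,925 steps per epoch account for 46,800 training examples, and the 2,601-step evaluation loop accounts for the remaining 5,202 of the 52,002 examples (about 10%, presumably a held-out validation split). A quick arithmetic check; the 90/10 split is inferred, not printed in the log:

```python
# Sanity-check the progress-bar totals against the logged hyperparameters.
total_examples = 52_002        # "Running tokenizer on dataset" total
per_device_batch = 1           # per_device_train/eval_batch_size
grad_accum = 8                 # gradient_accumulation_steps
num_gpus = 2                   # accelerate defaulted to --num_processes 2

examples_per_step = per_device_batch * grad_accum * num_gpus   # 16
train_examples = 2_925 * examples_per_step                     # 46,800
eval_examples = 2_601 * per_device_batch * num_gpus            # 5,202

assert train_examples + eval_examples == total_examples
print(examples_per_step, train_examples, eval_examples)        # 16 46800 5202
```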