07/12/2022 23:33:48 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
07/12/2022 23:33:48 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=3e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned/runs/Jul12_23-33-44_gpu5,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=2.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned,
save_on_each_node=False,
save_steps=100000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
07/12/2022 23:33:56 - INFO - datasets.builder - Overwrite dataset info from restored data version.
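The argument dump above is simply the repr of the TrainingArguments object; every field not set on the command line keeps its default. A minimal reconstruction in Python, listing only the values that differ from the defaults (all copied from the log):

from transformers import TrainingArguments

# Values taken verbatim from the dump above; everything omitted is a default.
training_args = TrainingArguments(
    output_dir="../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    learning_rate=3e-5,
    num_train_epochs=2.0,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    save_steps=100000,  # with 5066 total steps, this effectively saves only at the end
    seed=42,
)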
07/12/2022 23:33:56 - INFO - datasets.info - Loading Dataset info from /home/thang/.cache/huggingface/datasets/PiC___phrase_retrieval/PR-pass/1.0.0/df969d90a784d9e54828c7c7d2ce5ad117c6a955ed833539b969e1c00e1d41f4
07/12/2022 23:33:56 - WARNING - datasets.builder - Reusing dataset phrase_retrieval (/home/thang/.cache/huggingface/datasets/PiC___phrase_retrieval/PR-pass/1.0.0/df969d90a784d9e54828c7c7d2ce5ad117c6a955ed833539b969e1c00e1d41f4)
07/12/2022 23:33:56 - INFO - datasets.info - Loading Dataset info from /home/thang/.cache/huggingface/datasets/PiC___phrase_retrieval/PR-pass/1.0.0/df969d90a784d9e54828c7c7d2ce5ad117c6a955ed833539b969e1c00e1d41f4
  0%|          | 0/3 [00:00<?, ?it/s]
>> loading configuration file https://huggingface.co/whaleloops/phrase-bert/resolve/main/config.json from cache at /home/thang/.cache/huggingface/transformers/62cfb51a093ad89e817a23b38170cd7e448af4d81389373dfbc2071e3edfb769.2d3e2aee7a39d8283b1bf9892ebad74482e62bcf897413b4a246c5c312e59666
[INFO|configuration_utils.py:708] 2022-07-12 23:33:59,162 >> Model config BertConfig {
  "_name_or_path": "whaleloops/phrase-bert",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.20.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
[INFO|tokenization_utils_base.py:1781] 2022-07-12 23:34:05,413 >> loading file https://huggingface.co/whaleloops/phrase-bert/resolve/main/vocab.txt from cache at /home/thang/.cache/huggingface/transformers/31850d8b282f8512ee92b6a420af4c958ae48ddfb2faf24b049bfff73c015a76.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
[INFO|tokenization_utils_base.py:1781] 2022-07-12 23:34:05,414 >> loading file https://huggingface.co/whaleloops/phrase-bert/resolve/main/tokenizer.json from cache at /home/thang/.cache/huggingface/transformers/b578ae2e104171ec2511751fa00552466dc7c909b7e62c61f945fcebd175a381.d2b4c50f542e11b76f117bcbb7ea83eaa1a63f2bc645fe95913ba1101c7e0cf6
[INFO|tokenization_utils_base.py:1781] 2022-07-12 23:34:05,414 >> loading file https://huggingface.co/whaleloops/phrase-bert/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1781] 2022-07-12 23:34:05,414 >> loading file https://huggingface.co/whaleloops/phrase-bert/resolve/main/special_tokens_map.json from cache at /home/thang/.cache/huggingface/transformers/892f54e8e43352eb5492f8c87717ecf41dc57604a7e8401968e1df056dde72e1.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
[INFO|tokenization_utils_base.py:1781] 2022-07-12 23:34:05,414 >> loading file https://huggingface.co/whaleloops/phrase-bert/resolve/main/tokenizer_config.json from cache at /home/thang/.cache/huggingface/transformers/b45851b086a84c19fc71957f11f22804c30d590005e6f5461c29b284cd98290d.84411b762161d243125cbc2aa86025bca9ac24bf1dc12f00c1587a5f069e8b4f
[INFO|modeling_utils.py:2107] 2022-07-12 23:34:05,570 >> loading weights file https://huggingface.co/whaleloops/phrase-bert/resolve/main/pytorch_model.bin from cache at /home/thang/.cache/huggingface/transformers/5fc8f3446d4735c324f981040adebd6b7bbfdc72047edaa4fb75fa7979c58f46.dd8b3f2eba57449f29e2d2aa405e4ac12462714e164084e969e847516b09e65c
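The dataset and checkpoint loads above map onto standard datasets/transformers calls. A minimal sketch, with the dataset name inferred from the cache path (PiC___phrase_retrieval/PR-pass, i.e. "PiC/phrase_retrieval" with config "PR-pass"); the warning just below about qa_outputs.* being newly initialized is expected, because whaleloops/phrase-bert ships a plain BertModel checkpoint and from_pretrained has to attach a fresh span-prediction head:

from datasets import load_dataset
from transformers import AutoConfig, AutoTokenizer, AutoModelForQuestionAnswering

raw_datasets = load_dataset("PiC/phrase_retrieval", "PR-pass")

model_name = "whaleloops/phrase-bert"
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# qa_outputs.{weight,bias} are not in the checkpoint, so they are randomly
# initialized here -- hence the WARNING that follows in the log.
model = AutoModelForQuestionAnswering.from_pretrained(model_name, config=config)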
[INFO|modeling_utils.py:2483] 2022-07-12 23:34:06,723 >> All model checkpoint weights were used when initializing BertForQuestionAnswering.
[WARNING|modeling_utils.py:2485] 2022-07-12 23:34:06,723 >> Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at whaleloops/phrase-bert and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
07/12/2022 23:34:06 - WARNING - datasets.fingerprint - Parameter 'function'=<function main.<locals>.prepare_train_features at 0x7f9d1d2715e0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Running tokenizer on train dataset:   0%|          | 0/21 [00:00<?, ?ba/s]
WARNING - datasets.fingerprint - Parameter 'function'=<function main.<locals>.prepare_validation_features at 0x7f9d1cf363a0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead.
Running tokenizer on validation dataset:   0%|          | 0/3 [00:00<?, ?ba/s]
***** Running training *****
[INFO|trainer.py:1517] 2022-07-12 23:34:39,311 >> Num examples = 20261
[INFO|trainer.py:1518] 2022-07-12 23:34:39,311 >> Num Epochs = 2
[INFO|trainer.py:1519] 2022-07-12 23:34:39,311 >> Instantaneous batch size per device = 8
[INFO|trainer.py:1520] 2022-07-12 23:34:39,311 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1521] 2022-07-12 23:34:39,311 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1522] 2022-07-12 23:34:39,311 >> Total optimization steps = 5066
  0%|          | 0/5066 [00:00<?, ?it/s]
>> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 460.898, 'train_samples_per_second': 87.92, 'train_steps_per_second': 10.992, 'train_loss': 0.42857344546809384, 'epoch': 2.0}
100%|██████████| 5066/5066 [07:40<00:00, 11.45it/s]
100%|██████████| 5066/5066 [07:40<00:00, 10.99it/s]
[INFO|trainer.py:2503] 2022-07-12 23:42:20,210 >> Saving model checkpoint to ../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned
[INFO|configuration_utils.py:446] 2022-07-12 23:42:20,211 >> Configuration saved in ../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned/config.json
[INFO|modeling_utils.py:1660] 2022-07-12 23:42:20,747 >> Model weights saved in ../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2022-07-12 23:42:20,747 >> tokenizer config file saved in ../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2022-07-12 23:42:20,747 >> Special tokens file saved in ../results/phrase_retrieval/PR-pass/qa/whaleloops/phrase-bert/finetuned/special_tokens_map.json
***** train metrics *****
  epoch                    =        2.0
  train_loss               =     0.4286
  train_runtime            = 0:07:40.89
  train_samples            =      20261
  train_samples_per_second =      87.92
  train_steps_per_second   =     10.992
07/12/2022 23:42:20 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:661] 2022-07-12 23:42:20,779 >> The following columns in the evaluation set don't have a corresponding argument in `BertForQuestionAnswering.forward` and have been ignored: offset_mapping, example_id. If offset_mapping, example_id are not expected by `BertForQuestionAnswering.forward`, you can safely ignore this message.
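The two datasets.fingerprint warnings above fire because prepare_train_features and prepare_validation_features are closures defined inside main(), which the caching layer cannot hash deterministically; defining them at module level (so they pickle cleanly) would restore caching. The training and saving steps then boil down to roughly the sketch below. This is hedged: the log is consistent with HF's run_qa.py example, whose QuestionAnsweringTrainer subclass also wires a QA post-processing function omitted here, and train_dataset / eval_dataset stand for the tokenized splits:

from transformers import Trainer, default_data_collator

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # 20261 tokenized features per the log
    eval_dataset=eval_dataset,    # 3013 tokenized features per the log
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)

train_result = trainer.train()
trainer.save_model()  # writes config.json, pytorch_model.bin and tokenizer files, as logged
trainer.log_metrics("train", train_result.metrics)
trainer.save_metrics("train", train_result.metrics)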
[INFO|trainer.py:2753] 2022-07-12 23:42:20,781 >> ***** Running Evaluation *****
[INFO|trainer.py:2755] 2022-07-12 23:42:20,782 >> Num examples = 3013
[INFO|trainer.py:2758] 2022-07-12 23:42:20,782 >> Batch size = 8
  0%|          | 0/377 [00:00<?, ?it/s]
>> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Question Answering', 'type': 'question-answering'}, 'dataset': {'name': 'PiC/phrase_retrieval PR-pass', 'type': 'PiC/phrase_retrieval', 'args': 'PR-pass'}}
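The evaluation pass (3013 features at batch size 8, hence the 377-batch progress bar) corresponds to trainer.evaluate(); in the QA script this step additionally post-processes start/end logits back into text spans before computing metrics, which the plain sketch below omits:

metrics = trainer.evaluate()
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)

The final "Dropping the following result" message appears to come from the auto-generated model card: a result entry is kept only if it carries task, dataset, and metric fields, and this one has no metrics attached, so it is skipped. The saved model itself is unaffected.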