07/22/2022 12:31:56 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: True
07/22/2022 12:31:56 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=runs/ebmnlp_hf/BioLinkBERT-base/runs/Jul22_12-31-56_spartan-gpgpu080.hpc.unimelb.edu.au,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=runs/ebmnlp_hf/BioLinkBERT-base,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=32,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=runs/ebmnlp_hf/BioLinkBERT-base,
save_on_each_node=False,
save_steps=500,
save_strategy=IntervalStrategy.NO,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
07/22/2022 12:31:57 - WARNING - datasets.builder - Using custom data configuration default-2d9cec4b8a27d237
07/22/2022 12:31:57 - INFO - datasets.builder - Overwrite dataset info from restored data version.
07/22/2022 12:31:57 - INFO - datasets.info - Loading Dataset info from /home/hungthinht/.cache/huggingface/datasets/json/default-2d9cec4b8a27d237/0.0.0/da492aad5680612e4028e7f6ddc04b1dfcec4b64db470ed7cc5f2bb265b9b6b5
07/22/2022 12:31:57 - WARNING - datasets.builder - Reusing dataset json (/home/hungthinht/.cache/huggingface/datasets/json/default-2d9cec4b8a27d237/0.0.0/da492aad5680612e4028e7f6ddc04b1dfcec4b64db470ed7cc5f2bb265b9b6b5)
07/22/2022 12:31:57 - INFO - datasets.info - Loading Dataset info from /home/hungthinht/.cache/huggingface/datasets/json/default-2d9cec4b8a27d237/0.0.0/da492aad5680612e4028e7f6ddc04b1dfcec4b64db470ed7cc5f2bb265b9b6b5
  0%|          | 0/3 [00:00<?, ?it/s]100%|██████████| 3/3 [00:00<00:00, 491.92it/s]
[INFO|configuration_utils.py:659] 2022-07-22 12:31:59,048 >> loading configuration file https://huggingface.co/michiyasunaga/BioLinkBERT-base/resolve/main/config.json from cache at /home/hungthinht/.cache/huggingface/transformers/ad032c76cac1f75bba037ba006dcccc1c62ab157749b194df023bfa55e5f4fbf.22ae3f7c73ebda8488a8505a67c1b929a707ae7db67a129f60b7c28acfc38436
[INFO|configuration_utils.py:708] 2022-07-22 12:31:59,083 >> Model config BertConfig {
  "_name_or_path": "michiyasunaga/BioLinkBERT-base",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "finetuning_task": "ner",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "B-INT",
    "1": "B-OUT",
    "2": "B-PAR",
    "3": "O"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "B-INT": 0,
    "B-OUT": 1,
    "B-PAR": 2,
    "O": 3
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.20.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28895
}

[INFO|tokenization_utils_base.py:1781] 2022-07-22 12:32:05,294 >> loading file https://huggingface.co/michiyasunaga/BioLinkBERT-base/resolve/main/vocab.txt from cache at /home/hungthinht/.cache/huggingface/transformers/9eb712b5fcba51331b49cb69f18de1577371a2582055a298e2546c0c97d3b924.73b5c069d3e40205dd2df2379051c9f47d13c3bad0bcb3cee659c69e3a185a86
[INFO|tokenization_utils_base.py:1781] 2022-07-22 12:32:05,294 >> loading file https://huggingface.co/michiyasunaga/BioLinkBERT-base/resolve/main/tokenizer.json from cache at /home/hungthinht/.cache/huggingface/transformers/3c720cf86b025f815b1d833b6b39db05e8e7493b6f6a87788c485a946848b4d8.a25e24b89fd9bfd32e3c8d2dbb39879c62152e7f069ab24c97198c004cad94c9
[INFO|tokenization_utils_base.py:1781] 2022-07-22 12:32:05,294 >> loading file https://huggingface.co/michiyasunaga/BioLinkBERT-base/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1781] 2022-07-22 12:32:05,294 >> loading file https://huggingface.co/michiyasunaga/BioLinkBERT-base/resolve/main/special_tokens_map.json from cache at /home/hungthinht/.cache/huggingface/transformers/0598867425495ec6baf3617ab3789f3d8b84ebf869f7b43aa4a2930195a74dbe.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
[INFO|tokenization_utils_base.py:1781] 2022-07-22 12:32:05,294 >> loading file https://huggingface.co/michiyasunaga/BioLinkBERT-base/resolve/main/tokenizer_config.json from cache at /home/hungthinht/.cache/huggingface/transformers/30e2841862fd496cf36bc8647c9633a1dc319fbf6cc88a80438ca3f89e28339b.fab032bd2aab224bad4dcfc35e3bd6122976da1fa23e4feeb97d8fa65491aded
[INFO|modeling_utils.py:2107] 2022-07-22 12:32:06,276 >> loading weights file https://huggingface.co/michiyasunaga/BioLinkBERT-base/resolve/main/pytorch_model.bin from cache at /home/hungthinht/.cache/huggingface/transformers/76a88449a3eb7019bbc0d164cc39a6a231c8bbe3b9678b8d40977424f0ad934d.f8b95ad9e1dea734685fba5a5b6142b539678b7fc2311981cc14ae61b19f709d
[INFO|modeling_utils.py:2483] 2022-07-22 12:32:07,350 >> All model checkpoint weights were used when initializing BertForTokenClassification.

[WARNING|modeling_utils.py:2485] 2022-07-22 12:32:07,350 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at michiyasunaga/BioLinkBERT-base and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
07/22/2022 12:32:07 - WARNING - datasets.fingerprint - Parameter 'function'=<function main.<locals>.tokenize_and_align_labels at 0x2ac6e9964940> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
07/22/2022 12:32:07 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/hungthinht/.cache/huggingface/datasets/json/default-2d9cec4b8a27d237/0.0.0/da492aad5680612e4028e7f6ddc04b1dfcec4b64db470ed7cc5f2bb265b9b6b5/cache-1c80317fa3b1799d.arrow
07/22/2022 12:32:07 - INFO - datasets.fingerprint - Parameter 'function'=<function main.<locals>.tokenize_and_align_labels at 0x2ac6e99b3d30> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead.
07/22/2022 12:32:07 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/hungthinht/.cache/huggingface/datasets/json/default-2d9cec4b8a27d237/0.0.0/da492aad5680612e4028e7f6ddc04b1dfcec4b64db470ed7cc5f2bb265b9b6b5/cache-bdd640fb06671ad1.arrow
07/22/2022 12:32:07 - INFO - datasets.fingerprint - Parameter 'function'=<function main.<locals>.tokenize_and_align_labels at 0x2ac6e9964940> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead.
07/22/2022 12:32:07 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/hungthinht/.cache/huggingface/datasets/json/default-2d9cec4b8a27d237/0.0.0/da492aad5680612e4028e7f6ddc04b1dfcec4b64db470ed7cc5f2bb265b9b6b5/cache-3eb13b9046685257.arrow
[INFO|trainer.py:533] 2022-07-22 12:32:09,812 >> Using cuda_amp half precision backend
[INFO|trainer.py:661] 2022-07-22 12:32:09,812 >> The following columns in the training set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: id, ner_tags, word_ids, tokens. If id, ner_tags, word_ids, tokens are not expected by `BertForTokenClassification.forward`,  you can safely ignore this message.
/home/hungthinht/miniconda3/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[INFO|trainer.py:1516] 2022-07-22 12:32:09,838 >> ***** Running training *****
[INFO|trainer.py:1517] 2022-07-22 12:32:09,838 >>   Num examples = 40935
[INFO|trainer.py:1518] 2022-07-22 12:32:09,838 >>   Num Epochs = 1
[INFO|trainer.py:1519] 2022-07-22 12:32:09,838 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:1520] 2022-07-22 12:32:09,838 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1521] 2022-07-22 12:32:09,838 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1522] 2022-07-22 12:32:09,838 >>   Total optimization steps = 1280
  0%|          | 0/1280 [00:00<?, ?it/s]  0%|          | 1/1280 [00:00<13:42,  1.56it/s]  0%|          | 3/1280 [00:00<04:50,  4.39it/s]  0%|          | 5/1280 [00:00<03:11,  6.65it/s]  1%|          | 7/1280 [00:01<02:31,  8.43it/s]  1%|          | 9/1280 [00:01<02:06, 10.01it/s]  1%|          | 11/1280 [00:01<01:59, 10.62it/s]  1%|          | 13/1280 [00:01<01:49, 11.60it/s]  1%|          | 15/1280 [00:01<01:42, 12.29it/s]  1%|▏         | 17/1280 [00:01<01:37, 12.91it/s]  1%|▏         | 19/1280 [00:01<01:33, 13.42it/s]  2%|▏         | 21/1280 [00:02<01:34, 13.33it/s]  2%|▏         | 23/1280 [00:02<01:34, 13.28it/s]  2%|▏         | 25/1280 [00:02<01:35, 13.19it/s]  2%|▏         | 27/1280 [00:02<01:32, 13.51it/s]  2%|▏         | 29/1280 [00:02<01:33, 13.35it/s]  2%|▏         | 31/1280 [00:02<01:37, 12.82it/s]  3%|▎         | 33/1280 [00:03<01:36, 12.92it/s]  3%|▎         | 35/1280 [00:03<01:34, 13.13it/s]  3%|▎         | 37/1280 [00:03<01:40, 12.40it/s]  3%|▎         | 39/1280 [00:03<01:35, 13.03it/s]  3%|▎         | 41/1280 [00:03<01:35, 12.98it/s]  3%|▎         | 43/1280 [00:03<01:33, 13.28it/s]  4%|▎         | 45/1280 [00:03<01:33, 13.23it/s]  4%|▎         | 47/1280 [00:04<01:29, 13.78it/s]  4%|▍         | 49/1280 [00:04<01:26, 14.19it/s]  4%|▍         | 51/1280 [00:04<01:31, 13.42it/s]  4%|▍         | 53/1280 [00:04<01:30, 13.51it/s]  4%|▍         | 55/1280 [00:04<01:34, 12.92it/s]  4%|▍         | 57/1280 [00:04<01:35, 12.77it/s]  5%|▍         | 59/1280 [00:05<01:34, 12.95it/s]  5%|▍         | 61/1280 [00:05<01:31, 13.35it/s]  5%|▍         | 63/1280 [00:05<01:30, 13.39it/s]  5%|▌         | 65/1280 [00:05<01:28, 13.77it/s]  5%|▌         | 67/1280 [00:05<01:28, 13.66it/s]  5%|▌         | 69/1280 [00:05<01:28, 13.61it/s]  6%|▌         | 71/1280 [00:05<01:26, 14.00it/s]  6%|▌         | 73/1280 [00:06<01:34, 12.73it/s]  6%|▌         | 75/1280 [00:06<01:32, 12.96it/s]  6%|▌         | 77/1280 [00:06<01:28, 13.52it/s]  6%|▌         | 79/1280 [00:06<01:27, 13.80it/s]  6%|▋         | 81/1280 [00:06<01:26, 13.85it/s]  6%|▋         | 83/1280 [00:06<01:25, 13.96it/s]  7%|▋         | 85/1280 [00:06<01:23, 14.27it/s]  7%|▋         | 87/1280 [00:07<01:23, 14.30it/s]  7%|▋         | 89/1280 [00:07<01:24, 14.14it/s]  7%|▋         | 91/1280 [00:07<01:23, 14.22it/s]  7%|▋         | 93/1280 [00:07<01:22, 14.45it/s]  7%|▋         | 95/1280 [00:07<01:24, 14.05it/s]  8%|▊         | 97/1280 [00:07<01:26, 13.74it/s]  8%|▊         | 99/1280 [00:07<01:28, 13.27it/s]  8%|▊         | 101/1280 [00:08<01:29, 13.11it/s]  8%|▊         | 103/1280 [00:08<01:29, 13.17it/s]  8%|▊         | 105/1280 [00:08<01:28, 13.34it/s]  8%|▊         | 107/1280 [00:08<01:24, 13.82it/s]  9%|▊         | 109/1280 [00:08<01:28, 13.30it/s]  9%|▊         | 111/1280 [00:08<01:25, 13.65it/s]  9%|▉         | 113/1280 [00:08<01:23, 14.05it/s]  9%|▉         | 115/1280 [00:09<01:32, 12.63it/s]  9%|▉         | 117/1280 [00:09<01:27, 13.26it/s]  9%|▉         | 119/1280 [00:09<01:24, 13.80it/s]  9%|▉         | 121/1280 [00:09<01:23, 13.93it/s] 10%|▉         | 123/1280 [00:09<01:25, 13.57it/s] 10%|▉         | 125/1280 [00:09<01:25, 13.44it/s] 10%|▉         | 127/1280 [00:09<01:24, 13.63it/s] 10%|█         | 129/1280 [00:10<01:23, 13.71it/s] 10%|█         | 131/1280 [00:10<01:21, 14.15it/s] 10%|█         | 133/1280 [00:10<01:22, 13.91it/s] 11%|█         | 135/1280 [00:10<01:22, 13.90it/s] 11%|█         | 137/1280 [00:10<01:23, 13.75it/s] 11%|█         | 139/1280 [00:10<01:21, 13.96it/s] 11%|█         | 141/1280 [00:10<01:19, 14.30it/s] 11%|█         | 143/1280 [00:11<01:18, 14.40it/s] 11%|█▏        | 145/1280 [00:11<01:20, 14.08it/s] 11%|█▏        | 147/1280 [00:11<01:20, 14.06it/s] 12%|█▏        | 149/1280 [00:11<01:18, 14.34it/s] 12%|█▏        | 151/1280 [00:11<01:18, 14.34it/s] 12%|█▏        | 153/1280 [00:11<01:22, 13.70it/s] 12%|█▏        | 155/1280 [00:11<01:22, 13.71it/s] 12%|█▏        | 157/1280 [00:12<01:20, 14.03it/s] 12%|█▏        | 159/1280 [00:12<01:20, 13.98it/s] 13%|█▎        | 161/1280 [00:12<01:23, 13.40it/s] 13%|█▎        | 163/1280 [00:12<01:24, 13.14it/s] 13%|█▎        | 165/1280 [00:12<01:29, 12.51it/s] 13%|█▎        | 167/1280 [00:12<01:24, 13.16it/s] 13%|█▎        | 169/1280 [00:13<01:24, 13.13it/s] 13%|█▎        | 171/1280 [00:13<01:20, 13.70it/s] 14%|█▎        | 173/1280 [00:13<01:17, 14.25it/s] 14%|█▎        | 175/1280 [00:13<01:17, 14.27it/s] 14%|█▍        | 177/1280 [00:13<01:23, 13.16it/s] 14%|█▍        | 179/1280 [00:13<01:20, 13.74it/s] 14%|█▍        | 181/1280 [00:13<01:22, 13.29it/s] 14%|█▍        | 183/1280 [00:14<01:19, 13.74it/s] 14%|█▍        | 185/1280 [00:14<01:17, 14.13it/s] 15%|█▍        | 187/1280 [00:14<01:14, 14.60it/s] 15%|█▍        | 189/1280 [00:14<01:20, 13.53it/s] 15%|█▍        | 191/1280 [00:14<01:20, 13.60it/s] 15%|█▌        | 193/1280 [00:14<01:18, 13.93it/s] 15%|█▌        | 195/1280 [00:14<01:17, 14.06it/s] 15%|█▌        | 197/1280 [00:15<01:16, 14.21it/s] 16%|█▌        | 199/1280 [00:15<01:20, 13.44it/s] 16%|█▌        | 201/1280 [00:15<01:18, 13.80it/s] 16%|█▌        | 203/1280 [00:15<01:16, 14.17it/s] 16%|█▌        | 205/1280 [00:15<01:16, 14.09it/s] 16%|█▌        | 207/1280 [00:15<01:15, 14.28it/s] 16%|█▋        | 209/1280 [00:15<01:17, 13.82it/s] 16%|█▋        | 211/1280 [00:16<01:18, 13.62it/s] 17%|█▋        | 213/1280 [00:16<01:26, 12.31it/s] 17%|█▋        | 215/1280 [00:16<01:26, 12.25it/s] 17%|█▋        | 217/1280 [00:16<01:23, 12.66it/s] 17%|█▋        | 219/1280 [00:16<01:21, 13.10it/s] 17%|█▋        | 221/1280 [00:16<01:18, 13.57it/s] 17%|█▋        | 223/1280 [00:16<01:15, 13.92it/s] 18%|█▊        | 225/1280 [00:17<01:16, 13.76it/s] 18%|█▊        | 227/1280 [00:17<01:17, 13.56it/s] 18%|█▊        | 229/1280 [00:17<01:16, 13.79it/s] 18%|█▊        | 231/1280 [00:17<01:14, 14.05it/s] 18%|█▊        | 233/1280 [00:17<01:12, 14.36it/s] 18%|█▊        | 235/1280 [00:17<01:11, 14.62it/s] 19%|█▊        | 237/1280 [00:17<01:10, 14.76it/s] 19%|█▊        | 239/1280 [00:18<01:11, 14.54it/s] 19%|█▉        | 241/1280 [00:18<01:17, 13.43it/s] 19%|█▉        | 243/1280 [00:18<01:14, 13.94it/s] 19%|█▉        | 245/1280 [00:18<01:12, 14.21it/s] 19%|█▉        | 247/1280 [00:18<01:11, 14.43it/s] 19%|█▉        | 249/1280 [00:18<01:17, 13.35it/s] 20%|█▉        | 251/1280 [00:18<01:15, 13.55it/s] 20%|█▉        | 253/1280 [00:19<01:13, 13.88it/s] 20%|█▉        | 255/1280 [00:19<01:15, 13.57it/s] 20%|██        | 257/1280 [00:19<01:12, 14.04it/s] 20%|██        | 259/1280 [00:19<01:13, 13.85it/s] 20%|██        | 261/1280 [00:19<01:19, 12.82it/s] 21%|██        | 263/1280 [00:19<01:20, 12.68it/s] 21%|██        | 265/1280 [00:20<01:23, 12.09it/s] 21%|██        | 267/1280 [00:20<01:19, 12.68it/s] 21%|██        | 269/1280 [00:20<01:17, 13.06it/s] 21%|██        | 271/1280 [00:20<01:16, 13.27it/s] 21%|██▏       | 273/1280 [00:20<01:14, 13.56it/s] 21%|██▏       | 275/1280 [00:20<01:16, 13.16it/s] 22%|██▏       | 277/1280 [00:20<01:16, 13.14it/s] 22%|██▏       | 279/1280 [00:21<01:16, 13.12it/s] 22%|██▏       | 281/1280 [00:21<01:22, 12.05it/s] 22%|██▏       | 283/1280 [00:21<01:20, 12.31it/s] 22%|██▏       | 285/1280 [00:21<01:19, 12.51it/s] 22%|██▏       | 287/1280 [00:21<01:16, 13.06it/s] 23%|██▎       | 289/1280 [00:21<01:14, 13.34it/s] 23%|██▎       | 291/1280 [00:22<01:12, 13.63it/s] 23%|██▎       | 293/1280 [00:22<01:11, 13.76it/s] 23%|██▎       | 295/1280 [00:22<01:11, 13.85it/s] 23%|██▎       | 297/1280 [00:22<01:18, 12.51it/s] 23%|██▎       | 299/1280 [00:22<01:26, 11.31it/s] 24%|██▎       | 301/1280 [00:22<01:27, 11.14it/s] 24%|██▎       | 303/1280 [00:23<01:26, 11.35it/s] 24%|██▍       | 305/1280 [00:23<01:19, 12.34it/s] 24%|██▍       | 307/1280 [00:23<01:19, 12.30it/s] 24%|██▍       | 309/1280 [00:23<01:15, 12.94it/s] 24%|██▍       | 311/1280 [00:23<01:10, 13.69it/s] 24%|██▍       | 313/1280 [00:23<01:12, 13.35it/s] 25%|██▍       | 315/1280 [00:23<01:12, 13.32it/s] 25%|██▍       | 317/1280 [00:24<01:14, 13.01it/s] 25%|██▍       | 319/1280 [00:24<01:10, 13.55it/s] 25%|██▌       | 321/1280 [00:24<01:08, 14.03it/s] 25%|██▌       | 323/1280 [00:24<01:06, 14.39it/s] 25%|██▌       | 325/1280 [00:24<01:07, 14.20it/s] 26%|██▌       | 327/1280 [00:24<01:06, 14.44it/s] 26%|██▌       | 329/1280 [00:24<01:09, 13.76it/s] 26%|██▌       | 331/1280 [00:25<01:11, 13.22it/s] 26%|██▌       | 333/1280 [00:25<01:13, 12.93it/s] 26%|██▌       | 335/1280 [00:25<01:11, 13.30it/s] 26%|██▋       | 337/1280 [00:25<01:09, 13.62it/s] 26%|██▋       | 339/1280 [00:25<01:07, 13.98it/s] 27%|██▋       | 341/1280 [00:25<01:05, 14.29it/s] 27%|██▋       | 343/1280 [00:25<01:05, 14.20it/s] 27%|██▋       | 345/1280 [00:26<01:05, 14.26it/s] 27%|██▋       | 347/1280 [00:26<01:04, 14.55it/s] 27%|██▋       | 349/1280 [00:26<01:02, 14.80it/s] 27%|██▋       | 351/1280 [00:26<01:03, 14.73it/s] 28%|██▊       | 353/1280 [00:26<01:09, 13.35it/s] 28%|██▊       | 355/1280 [00:26<01:08, 13.55it/s] 28%|██▊       | 357/1280 [00:27<01:09, 13.37it/s] 28%|██▊       | 359/1280 [00:27<01:07, 13.65it/s] 28%|██▊       | 361/1280 [00:27<01:05, 14.12it/s] 28%|██▊       | 363/1280 [00:27<01:08, 13.32it/s] 29%|██▊       | 365/1280 [00:27<01:16, 12.02it/s] 29%|██▊       | 367/1280 [00:27<01:11, 12.68it/s] 29%|██▉       | 369/1280 [00:27<01:11, 12.78it/s] 29%|██▉       | 371/1280 [00:28<01:17, 11.75it/s] 29%|██▉       | 373/1280 [00:28<01:16, 11.91it/s] 29%|██▉       | 375/1280 [00:28<01:13, 12.33it/s] 29%|██▉       | 377/1280 [00:28<01:10, 12.79it/s] 30%|██▉       | 379/1280 [00:28<01:06, 13.54it/s] 30%|██▉       | 381/1280 [00:28<01:04, 13.93it/s] 30%|██▉       | 383/1280 [00:29<01:07, 13.26it/s] 30%|███       | 385/1280 [00:29<01:05, 13.67it/s] 30%|███       | 387/1280 [00:29<01:04, 13.75it/s] 30%|███       | 389/1280 [00:29<01:03, 14.12it/s] 31%|███       | 391/1280 [00:29<01:07, 13.17it/s] 31%|███       | 393/1280 [00:29<01:06, 13.34it/s] 31%|███       | 395/1280 [00:29<01:04, 13.81it/s] 31%|███       | 397/1280 [00:30<01:04, 13.63it/s] 31%|███       | 399/1280 [00:30<01:06, 13.24it/s] 31%|███▏      | 401/1280 [00:30<01:08, 12.91it/s] 31%|███▏      | 403/1280 [00:30<01:04, 13.59it/s] 32%|███▏      | 405/1280 [00:30<01:08, 12.82it/s] 32%|███▏      | 407/1280 [00:30<01:06, 13.16it/s] 32%|███▏      | 409/1280 [00:30<01:04, 13.42it/s] 32%|███▏      | 411/1280 [00:31<01:04, 13.50it/s] 32%|███▏      | 413/1280 [00:31<01:02, 13.85it/s] 32%|███▏      | 415/1280 [00:31<01:06, 13.03it/s] 33%|███▎      | 417/1280 [00:31<01:03, 13.49it/s] 33%|███▎      | 419/1280 [00:31<01:04, 13.29it/s] 33%|███▎      | 421/1280 [00:31<01:02, 13.84it/s] 33%|███▎      | 423/1280 [00:31<01:02, 13.62it/s] 33%|███▎      | 425/1280 [00:32<01:04, 13.20it/s] 33%|███▎      | 427/1280 [00:32<01:07, 12.59it/s] 34%|███▎      | 429/1280 [00:32<01:06, 12.88it/s] 34%|███▎      | 431/1280 [00:32<01:03, 13.38it/s] 34%|███▍      | 433/1280 [00:32<01:01, 13.77it/s] 34%|███▍      | 435/1280 [00:32<00:59, 14.27it/s] 34%|███▍      | 437/1280 [00:33<00:59, 14.20it/s] 34%|███▍      | 439/1280 [00:33<00:58, 14.27it/s] 34%|███▍      | 441/1280 [00:33<01:01, 13.53it/s] 35%|███▍      | 443/1280 [00:33<01:05, 12.86it/s] 35%|███▍      | 445/1280 [00:33<01:05, 12.72it/s] 35%|███▍      | 447/1280 [00:33<01:02, 13.22it/s] 35%|███▌      | 449/1280 [00:34<01:15, 11.03it/s] 35%|███▌      | 451/1280 [00:34<01:14, 11.17it/s] 35%|███▌      | 453/1280 [00:34<01:09, 11.91it/s] 36%|███▌      | 455/1280 [00:34<01:04, 12.80it/s] 36%|███▌      | 457/1280 [00:34<01:01, 13.31it/s] 36%|███▌      | 459/1280 [00:34<00:58, 13.92it/s] 36%|███▌      | 461/1280 [00:34<01:00, 13.61it/s] 36%|███▌      | 463/1280 [00:35<00:58, 14.07it/s] 36%|███▋      | 465/1280 [00:35<00:59, 13.59it/s] 36%|███▋      | 467/1280 [00:35<00:58, 13.91it/s] 37%|███▋      | 469/1280 [00:35<00:57, 14.06it/s] 37%|███▋      | 471/1280 [00:35<01:09, 11.62it/s] 37%|███▋      | 473/1280 [00:35<01:05, 12.30it/s] 37%|███▋      | 475/1280 [00:35<01:02, 12.96it/s] 37%|███▋      | 477/1280 [00:36<01:03, 12.58it/s] 37%|███▋      | 479/1280 [00:36<01:04, 12.32it/s] 38%|███▊      | 481/1280 [00:36<01:01, 12.94it/s] 38%|███▊      | 483/1280 [00:36<01:01, 12.91it/s] 38%|███▊      | 485/1280 [00:36<00:59, 13.40it/s] 38%|███▊      | 487/1280 [00:36<00:57, 13.83it/s] 38%|███▊      | 489/1280 [00:37<00:55, 14.33it/s] 38%|███▊      | 491/1280 [00:37<00:56, 13.88it/s] 39%|███▊      | 493/1280 [00:37<00:57, 13.67it/s] 39%|███▊      | 495/1280 [00:37<00:58, 13.33it/s] 39%|███▉      | 497/1280 [00:37<01:00, 13.00it/s] 39%|███▉      | 499/1280 [00:37<01:02, 12.58it/s]                                                  {'loss': 0.5034, 'learning_rate': 3.0546875e-05, 'epoch': 0.39}
 39%|███▉      | 500/1280 [00:37<01:02, 12.58it/s] 39%|███▉      | 501/1280 [00:37<00:59, 13.01it/s] 39%|███▉      | 503/1280 [00:38<00:59, 13.00it/s] 39%|███▉      | 505/1280 [00:38<00:57, 13.56it/s] 40%|███▉      | 507/1280 [00:38<00:56, 13.65it/s] 40%|███▉      | 509/1280 [00:38<00:55, 14.00it/s] 40%|███▉      | 511/1280 [00:38<00:54, 14.15it/s] 40%|████      | 513/1280 [00:38<00:52, 14.51it/s] 40%|████      | 515/1280 [00:38<00:51, 14.82it/s] 40%|████      | 517/1280 [00:39<00:51, 14.91it/s] 41%|████      | 519/1280 [00:39<00:51, 14.73it/s] 41%|████      | 521/1280 [00:39<00:50, 14.90it/s] 41%|████      | 523/1280 [00:39<00:51, 14.74it/s] 41%|████      | 525/1280 [00:39<00:51, 14.75it/s] 41%|████      | 527/1280 [00:39<00:51, 14.61it/s] 41%|████▏     | 529/1280 [00:39<00:53, 14.01it/s] 41%|████▏     | 531/1280 [00:40<00:53, 13.98it/s] 42%|████▏     | 533/1280 [00:40<00:55, 13.37it/s] 42%|████▏     | 535/1280 [00:40<00:53, 13.96it/s] 42%|████▏     | 537/1280 [00:40<00:51, 14.34it/s] 42%|████▏     | 539/1280 [00:40<00:50, 14.62it/s] 42%|████▏     | 541/1280 [00:40<00:51, 14.45it/s] 42%|████▏     | 543/1280 [00:40<00:51, 14.39it/s] 43%|████▎     | 545/1280 [00:41<00:51, 14.38it/s] 43%|████▎     | 547/1280 [00:41<00:53, 13.70it/s] 43%|████▎     | 549/1280 [00:41<00:52, 13.81it/s] 43%|████▎     | 551/1280 [00:41<00:51, 14.17it/s] 43%|████▎     | 553/1280 [00:41<00:49, 14.55it/s] 43%|████▎     | 555/1280 [00:41<00:50, 14.48it/s] 44%|████▎     | 557/1280 [00:41<00:49, 14.50it/s] 44%|████▎     | 559/1280 [00:42<00:52, 13.67it/s] 44%|████▍     | 561/1280 [00:42<00:54, 13.20it/s] 44%|████▍     | 563/1280 [00:42<00:54, 13.13it/s] 44%|████▍     | 565/1280 [00:42<00:52, 13.62it/s] 44%|████▍     | 567/1280 [00:42<00:55, 12.95it/s] 44%|████▍     | 569/1280 [00:42<00:53, 13.33it/s] 45%|████▍     | 571/1280 [00:42<00:51, 13.89it/s] 45%|████▍     | 573/1280 [00:43<00:52, 13.36it/s] 45%|████▍     | 575/1280 [00:43<00:54, 12.91it/s] 45%|████▌     | 577/1280 [00:43<00:51, 13.55it/s] 45%|████▌     | 579/1280 [00:43<00:49, 14.04it/s] 45%|████▌     | 581/1280 [00:43<00:51, 13.70it/s] 46%|████▌     | 583/1280 [00:43<00:52, 13.26it/s] 46%|████▌     | 585/1280 [00:43<00:53, 13.11it/s] 46%|████▌     | 587/1280 [00:44<00:51, 13.55it/s] 46%|████▌     | 589/1280 [00:44<00:50, 13.74it/s] 46%|████▌     | 591/1280 [00:44<00:49, 13.96it/s] 46%|████▋     | 593/1280 [00:44<00:48, 14.28it/s] 46%|████▋     | 595/1280 [00:44<00:49, 13.90it/s] 47%|████▋     | 597/1280 [00:44<00:47, 14.34it/s] 47%|████▋     | 599/1280 [00:44<00:47, 14.34it/s] 47%|████▋     | 601/1280 [00:45<00:46, 14.55it/s] 47%|████▋     | 603/1280 [00:45<00:46, 14.55it/s] 47%|████▋     | 605/1280 [00:45<00:45, 14.70it/s] 47%|████▋     | 607/1280 [00:45<00:48, 13.97it/s] 48%|████▊     | 609/1280 [00:45<00:47, 14.07it/s] 48%|████▊     | 611/1280 [00:45<00:50, 13.35it/s] 48%|████▊     | 613/1280 [00:45<00:50, 13.28it/s] 48%|████▊     | 615/1280 [00:46<00:48, 13.76it/s] 48%|████▊     | 617/1280 [00:46<00:49, 13.31it/s] 48%|████▊     | 619/1280 [00:46<00:52, 12.66it/s] 49%|████▊     | 621/1280 [00:46<00:53, 12.22it/s] 49%|████▊     | 623/1280 [00:46<00:54, 12.09it/s] 49%|████▉     | 625/1280 [00:46<00:51, 12.77it/s] 49%|████▉     | 627/1280 [00:47<00:48, 13.36it/s] 49%|████▉     | 629/1280 [00:47<00:47, 13.73it/s] 49%|████▉     | 631/1280 [00:47<00:47, 13.62it/s] 49%|████▉     | 633/1280 [00:47<00:45, 14.23it/s] 50%|████▉     | 635/1280 [00:47<00:47, 13.64it/s] 50%|████▉     | 637/1280 [00:47<00:46, 13.84it/s] 50%|████▉     | 639/1280 [00:47<00:47, 13.37it/s] 50%|█████     | 641/1280 [00:48<00:47, 13.57it/s] 50%|█████     | 643/1280 [00:48<00:45, 13.94it/s] 50%|█████     | 645/1280 [00:48<00:45, 13.93it/s] 51%|█████     | 647/1280 [00:48<00:45, 14.05it/s] 51%|█████     | 649/1280 [00:48<00:45, 13.75it/s] 51%|█████     | 651/1280 [00:48<00:45, 13.78it/s] 51%|█████     | 653/1280 [00:48<00:45, 13.75it/s] 51%|█████     | 655/1280 [00:49<00:48, 12.96it/s] 51%|█████▏    | 657/1280 [00:49<00:46, 13.39it/s] 51%|█████▏    | 659/1280 [00:49<00:59, 10.40it/s] 52%|█████▏    | 661/1280 [00:49<00:54, 11.31it/s] 52%|█████▏    | 663/1280 [00:49<00:50, 12.15it/s] 52%|█████▏    | 665/1280 [00:49<00:48, 12.66it/s] 52%|█████▏    | 667/1280 [00:50<00:47, 12.87it/s] 52%|█████▏    | 669/1280 [00:50<00:45, 13.51it/s] 52%|█████▏    | 671/1280 [00:50<00:45, 13.30it/s] 53%|█████▎    | 673/1280 [00:50<00:44, 13.66it/s] 53%|█████▎    | 675/1280 [00:50<00:42, 14.17it/s] 53%|█████▎    | 677/1280 [00:50<00:41, 14.55it/s] 53%|█████▎    | 679/1280 [00:50<00:42, 13.98it/s] 53%|█████▎    | 681/1280 [00:51<00:44, 13.57it/s] 53%|█████▎    | 683/1280 [00:51<00:45, 13.16it/s] 54%|█████▎    | 685/1280 [00:51<00:43, 13.80it/s] 54%|█████▎    | 687/1280 [00:51<00:42, 14.09it/s] 54%|█████▍    | 689/1280 [00:51<00:41, 14.30it/s] 54%|█████▍    | 691/1280 [00:51<00:43, 13.66it/s] 54%|█████▍    | 693/1280 [00:51<00:41, 14.12it/s] 54%|█████▍    | 695/1280 [00:52<00:40, 14.39it/s] 54%|█████▍    | 697/1280 [00:52<00:41, 14.00it/s] 55%|█████▍    | 699/1280 [00:52<00:41, 14.01it/s] 55%|█████▍    | 701/1280 [00:52<00:42, 13.76it/s] 55%|█████▍    | 703/1280 [00:52<00:41, 13.84it/s] 55%|█████▌    | 705/1280 [00:52<00:44, 12.89it/s] 55%|█████▌    | 707/1280 [00:53<00:49, 11.66it/s] 55%|█████▌    | 709/1280 [00:53<00:49, 11.53it/s] 56%|█████▌    | 711/1280 [00:53<00:45, 12.41it/s] 56%|█████▌    | 713/1280 [00:53<00:43, 13.18it/s] 56%|█████▌    | 715/1280 [00:53<00:43, 13.07it/s] 56%|█████▌    | 717/1280 [00:53<00:41, 13.52it/s] 56%|█████▌    | 719/1280 [00:53<00:39, 14.09it/s] 56%|█████▋    | 721/1280 [00:54<00:39, 14.08it/s] 56%|█████▋    | 723/1280 [00:54<00:38, 14.30it/s] 57%|█████▋    | 725/1280 [00:54<00:40, 13.81it/s] 57%|█████▋    | 727/1280 [00:54<00:43, 12.79it/s] 57%|█████▋    | 729/1280 [00:54<00:40, 13.52it/s] 57%|█████▋    | 731/1280 [00:54<00:40, 13.48it/s] 57%|█████▋    | 733/1280 [00:54<00:39, 13.75it/s] 57%|█████▋    | 735/1280 [00:55<00:38, 13.98it/s] 58%|█████▊    | 737/1280 [00:55<00:38, 14.14it/s] 58%|█████▊    | 739/1280 [00:55<00:38, 14.14it/s] 58%|█████▊    | 741/1280 [00:55<00:37, 14.21it/s] 58%|█████▊    | 743/1280 [00:55<00:37, 14.26it/s] 58%|█████▊    | 745/1280 [00:55<00:37, 14.41it/s] 58%|█████▊    | 747/1280 [00:55<00:39, 13.50it/s] 59%|█████▊    | 749/1280 [00:56<00:38, 13.89it/s] 59%|█████▊    | 751/1280 [00:56<00:38, 13.58it/s] 59%|█████▉    | 753/1280 [00:56<00:37, 14.00it/s] 59%|█████▉    | 755/1280 [00:56<00:38, 13.68it/s] 59%|█████▉    | 757/1280 [00:56<00:39, 13.40it/s] 59%|█████▉    | 759/1280 [00:56<00:39, 13.28it/s] 59%|█████▉    | 761/1280 [00:56<00:37, 13.68it/s] 60%|█████▉    | 763/1280 [00:57<00:41, 12.40it/s] 60%|█████▉    | 765/1280 [00:57<00:39, 13.07it/s] 60%|█████▉    | 767/1280 [00:57<00:38, 13.46it/s] 60%|██████    | 769/1280 [00:57<00:37, 13.62it/s] 60%|██████    | 771/1280 [00:57<00:38, 13.30it/s] 60%|██████    | 773/1280 [00:57<00:41, 12.28it/s] 61%|██████    | 775/1280 [00:58<00:38, 13.00it/s] 61%|██████    | 777/1280 [00:58<00:41, 12.15it/s] 61%|██████    | 779/1280 [00:58<00:38, 12.88it/s] 61%|██████    | 781/1280 [00:58<00:39, 12.68it/s] 61%|██████    | 783/1280 [00:58<00:38, 12.98it/s] 61%|██████▏   | 785/1280 [00:58<00:36, 13.55it/s] 61%|██████▏   | 787/1280 [00:58<00:35, 13.89it/s] 62%|██████▏   | 789/1280 [00:59<00:37, 13.19it/s] 62%|██████▏   | 791/1280 [00:59<00:38, 12.55it/s] 62%|██████▏   | 793/1280 [00:59<00:37, 12.95it/s] 62%|██████▏   | 795/1280 [00:59<00:36, 13.22it/s] 62%|██████▏   | 797/1280 [00:59<00:35, 13.63it/s] 62%|██████▏   | 799/1280 [00:59<00:35, 13.50it/s] 63%|██████▎   | 801/1280 [01:00<00:34, 13.72it/s] 63%|██████▎   | 803/1280 [01:00<00:35, 13.44it/s] 63%|██████▎   | 805/1280 [01:00<00:34, 13.85it/s] 63%|██████▎   | 807/1280 [01:00<00:35, 13.50it/s] 63%|██████▎   | 809/1280 [01:00<00:34, 13.67it/s] 63%|██████▎   | 811/1280 [01:00<00:33, 14.07it/s] 64%|██████▎   | 813/1280 [01:00<00:33, 13.86it/s] 64%|██████▎   | 815/1280 [01:01<00:33, 13.70it/s] 64%|██████▍   | 817/1280 [01:01<00:32, 14.14it/s] 64%|██████▍   | 819/1280 [01:01<00:33, 13.64it/s] 64%|██████▍   | 821/1280 [01:01<00:32, 13.91it/s] 64%|██████▍   | 823/1280 [01:01<00:32, 14.22it/s] 64%|██████▍   | 825/1280 [01:01<00:31, 14.44it/s] 65%|██████▍   | 827/1280 [01:01<00:31, 14.49it/s] 65%|██████▍   | 829/1280 [01:02<00:31, 14.34it/s] 65%|██████▍   | 831/1280 [01:02<00:32, 13.83it/s] 65%|██████▌   | 833/1280 [01:02<00:32, 13.84it/s] 65%|██████▌   | 835/1280 [01:02<00:38, 11.60it/s] 65%|██████▌   | 837/1280 [01:02<00:40, 10.99it/s] 66%|██████▌   | 839/1280 [01:02<00:40, 10.84it/s] 66%|██████▌   | 841/1280 [01:03<00:39, 11.24it/s] 66%|██████▌   | 843/1280 [01:03<00:37, 11.60it/s] 66%|██████▌   | 845/1280 [01:03<00:36, 11.99it/s] 66%|██████▌   | 847/1280 [01:03<00:34, 12.54it/s] 66%|██████▋   | 849/1280 [01:03<00:33, 13.00it/s] 66%|██████▋   | 851/1280 [01:03<00:32, 13.07it/s] 67%|██████▋   | 853/1280 [01:04<00:34, 12.26it/s] 67%|██████▋   | 855/1280 [01:04<00:32, 13.04it/s] 67%|██████▋   | 857/1280 [01:04<00:31, 13.63it/s] 67%|██████▋   | 859/1280 [01:04<00:31, 13.38it/s] 67%|██████▋   | 861/1280 [01:04<00:30, 13.87it/s] 67%|██████▋   | 863/1280 [01:04<00:32, 12.77it/s] 68%|██████▊   | 865/1280 [01:04<00:30, 13.57it/s] 68%|██████▊   | 867/1280 [01:05<00:30, 13.38it/s] 68%|██████▊   | 869/1280 [01:05<00:30, 13.29it/s] 68%|██████▊   | 871/1280 [01:05<00:30, 13.24it/s] 68%|██████▊   | 873/1280 [01:05<00:31, 13.09it/s] 68%|██████▊   | 875/1280 [01:05<00:30, 13.34it/s] 69%|██████▊   | 877/1280 [01:05<00:29, 13.64it/s] 69%|██████▊   | 879/1280 [01:05<00:28, 13.98it/s] 69%|██████▉   | 881/1280 [01:06<00:28, 13.99it/s] 69%|██████▉   | 883/1280 [01:06<00:29, 13.47it/s] 69%|██████▉   | 885/1280 [01:06<00:28, 13.91it/s] 69%|██████▉   | 887/1280 [01:06<00:28, 13.58it/s] 69%|██████▉   | 889/1280 [01:06<00:29, 13.44it/s] 70%|██████▉   | 891/1280 [01:06<00:30, 12.95it/s] 70%|██████▉   | 893/1280 [01:07<00:31, 12.43it/s] 70%|██████▉   | 895/1280 [01:07<00:31, 12.41it/s] 70%|███████   | 897/1280 [01:07<00:30, 12.37it/s] 70%|███████   | 899/1280 [01:07<00:29, 12.87it/s] 70%|███████   | 901/1280 [01:07<00:29, 12.96it/s] 71%|███████   | 903/1280 [01:07<00:33, 11.24it/s] 71%|███████   | 905/1280 [01:08<00:32, 11.66it/s] 71%|███████   | 907/1280 [01:08<00:30, 12.14it/s] 71%|███████   | 909/1280 [01:08<00:30, 12.32it/s] 71%|███████   | 911/1280 [01:08<00:28, 12.89it/s] 71%|███████▏  | 913/1280 [01:08<00:30, 12.01it/s] 71%|███████▏  | 915/1280 [01:08<00:29, 12.55it/s] 72%|███████▏  | 917/1280 [01:09<00:30, 11.73it/s] 72%|███████▏  | 919/1280 [01:09<00:28, 12.60it/s] 72%|███████▏  | 921/1280 [01:09<00:28, 12.52it/s] 72%|███████▏  | 923/1280 [01:09<00:27, 12.94it/s] 72%|███████▏  | 925/1280 [01:09<00:27, 12.99it/s] 72%|███████▏  | 927/1280 [01:09<00:26, 13.19it/s] 73%|███████▎  | 929/1280 [01:09<00:26, 13.39it/s] 73%|███████▎  | 931/1280 [01:10<00:26, 13.14it/s] 73%|███████▎  | 933/1280 [01:10<00:25, 13.45it/s] 73%|███████▎  | 935/1280 [01:10<00:24, 13.97it/s] 73%|███████▎  | 937/1280 [01:10<00:24, 13.80it/s] 73%|███████▎  | 939/1280 [01:10<00:25, 13.60it/s] 74%|███████▎  | 941/1280 [01:10<00:25, 13.11it/s] 74%|███████▎  | 943/1280 [01:10<00:25, 13.29it/s] 74%|███████▍  | 945/1280 [01:11<00:24, 13.57it/s] 74%|███████▍  | 947/1280 [01:11<00:23, 14.10it/s] 74%|███████▍  | 949/1280 [01:11<00:23, 13.90it/s] 74%|███████▍  | 951/1280 [01:11<00:22, 14.36it/s] 74%|███████▍  | 953/1280 [01:11<00:23, 14.07it/s] 75%|███████▍  | 955/1280 [01:11<00:23, 14.08it/s] 75%|███████▍  | 957/1280 [01:11<00:22, 14.17it/s] 75%|███████▍  | 959/1280 [01:12<00:23, 13.57it/s] 75%|███████▌  | 961/1280 [01:12<00:23, 13.72it/s] 75%|███████▌  | 963/1280 [01:12<00:22, 13.81it/s] 75%|███████▌  | 965/1280 [01:12<00:22, 14.17it/s] 76%|███████▌  | 967/1280 [01:12<00:22, 13.93it/s] 76%|███████▌  | 969/1280 [01:12<00:23, 13.51it/s] 76%|███████▌  | 971/1280 [01:12<00:22, 13.93it/s] 76%|███████▌  | 973/1280 [01:13<00:24, 12.42it/s] 76%|███████▌  | 975/1280 [01:13<00:23, 13.06it/s] 76%|███████▋  | 977/1280 [01:13<00:23, 13.06it/s] 76%|███████▋  | 979/1280 [01:13<00:23, 12.62it/s] 77%|███████▋  | 981/1280 [01:13<00:22, 13.14it/s] 77%|███████▋  | 983/1280 [01:13<00:22, 13.48it/s] 77%|███████▋  | 985/1280 [01:14<00:21, 13.93it/s] 77%|███████▋  | 987/1280 [01:14<00:21, 13.69it/s] 77%|███████▋  | 989/1280 [01:14<00:20, 14.13it/s] 77%|███████▋  | 991/1280 [01:14<00:20, 13.91it/s] 78%|███████▊  | 993/1280 [01:14<00:20, 13.70it/s] 78%|███████▊  | 995/1280 [01:14<00:20, 13.99it/s] 78%|███████▊  | 997/1280 [01:14<00:20, 14.07it/s] 78%|███████▊  | 999/1280 [01:15<00:19, 14.42it/s]                                                  {'loss': 0.4383, 'learning_rate': 1.1015625e-05, 'epoch': 0.78}
 78%|███████▊  | 1000/1280 [01:15<00:19, 14.42it/s] 78%|███████▊  | 1001/1280 [01:15<00:19, 14.55it/s] 78%|███████▊  | 1003/1280 [01:15<00:19, 14.46it/s] 79%|███████▊  | 1005/1280 [01:15<00:18, 14.60it/s] 79%|███████▊  | 1007/1280 [01:15<00:19, 14.23it/s] 79%|███████▉  | 1009/1280 [01:15<00:19, 13.56it/s] 79%|███████▉  | 1011/1280 [01:15<00:22, 11.91it/s] 79%|███████▉  | 1013/1280 [01:16<00:21, 12.57it/s] 79%|███████▉  | 1015/1280 [01:16<00:19, 13.30it/s] 79%|███████▉  | 1017/1280 [01:16<00:20, 13.14it/s] 80%|███████▉  | 1019/1280 [01:16<00:20, 13.03it/s] 80%|███████▉  | 1021/1280 [01:16<00:21, 11.95it/s] 80%|███████▉  | 1023/1280 [01:16<00:20, 12.74it/s] 80%|████████  | 1025/1280 [01:16<00:18, 13.53it/s] 80%|████████  | 1027/1280 [01:17<00:17, 14.14it/s] 80%|████████  | 1029/1280 [01:17<00:19, 12.59it/s] 81%|████████  | 1031/1280 [01:17<00:18, 13.11it/s] 81%|████████  | 1033/1280 [01:17<00:18, 13.03it/s] 81%|████████  | 1035/1280 [01:17<00:18, 13.24it/s] 81%|████████  | 1037/1280 [01:17<00:17, 13.70it/s] 81%|████████  | 1039/1280 [01:18<00:17, 14.14it/s] 81%|████████▏ | 1041/1280 [01:18<00:17, 13.52it/s] 81%|████████▏ | 1043/1280 [01:18<00:17, 13.48it/s] 82%|████████▏ | 1045/1280 [01:18<00:17, 13.28it/s] 82%|████████▏ | 1047/1280 [01:18<00:18, 12.72it/s] 82%|████████▏ | 1049/1280 [01:18<00:19, 12.13it/s] 82%|████████▏ | 1051/1280 [01:18<00:18, 12.49it/s] 82%|████████▏ | 1053/1280 [01:19<00:17, 13.07it/s] 82%|████████▏ | 1055/1280 [01:19<00:17, 12.86it/s] 83%|████████▎ | 1057/1280 [01:19<00:16, 13.29it/s] 83%|████████▎ | 1059/1280 [01:19<00:16, 13.08it/s] 83%|████████▎ | 1061/1280 [01:19<00:17, 12.77it/s] 83%|████████▎ | 1063/1280 [01:19<00:16, 13.14it/s] 83%|████████▎ | 1065/1280 [01:20<00:15, 13.57it/s] 83%|████████▎ | 1067/1280 [01:20<00:16, 13.20it/s] 84%|████████▎ | 1069/1280 [01:20<00:15, 13.51it/s] 84%|████████▎ | 1071/1280 [01:20<00:15, 13.27it/s] 84%|████████▍ | 1073/1280 [01:20<00:15, 13.63it/s] 84%|████████▍ | 1075/1280 [01:20<00:15, 13.33it/s] 84%|████████▍ | 1077/1280 [01:20<00:14, 13.82it/s] 84%|████████▍ | 1079/1280 [01:21<00:14, 13.95it/s] 84%|████████▍ | 1081/1280 [01:21<00:14, 13.72it/s] 85%|████████▍ | 1083/1280 [01:21<00:14, 13.93it/s] 85%|████████▍ | 1085/1280 [01:21<00:14, 13.34it/s] 85%|████████▍ | 1087/1280 [01:21<00:13, 13.83it/s] 85%|████████▌ | 1089/1280 [01:21<00:13, 14.19it/s] 85%|████████▌ | 1091/1280 [01:21<00:13, 14.26it/s] 85%|████████▌ | 1093/1280 [01:22<00:13, 13.90it/s] 86%|████████▌ | 1095/1280 [01:22<00:13, 14.11it/s] 86%|████████▌ | 1097/1280 [01:22<00:14, 12.87it/s] 86%|████████▌ | 1099/1280 [01:22<00:13, 12.96it/s] 86%|████████▌ | 1101/1280 [01:22<00:13, 12.95it/s] 86%|████████▌ | 1103/1280 [01:22<00:13, 13.17it/s] 86%|████████▋ | 1105/1280 [01:22<00:13, 13.15it/s] 86%|████████▋ | 1107/1280 [01:23<00:12, 13.42it/s] 87%|████████▋ | 1109/1280 [01:23<00:13, 12.25it/s] 87%|████████▋ | 1111/1280 [01:23<00:13, 12.68it/s] 87%|████████▋ | 1113/1280 [01:23<00:12, 12.98it/s] 87%|████████▋ | 1115/1280 [01:23<00:12, 13.27it/s] 87%|████████▋ | 1117/1280 [01:23<00:13, 12.31it/s] 87%|████████▋ | 1119/1280 [01:24<00:12, 13.07it/s] 88%|████████▊ | 1121/1280 [01:24<00:12, 13.13it/s] 88%|████████▊ | 1123/1280 [01:24<00:11, 13.31it/s] 88%|████████▊ | 1125/1280 [01:24<00:11, 13.65it/s] 88%|████████▊ | 1127/1280 [01:24<00:11, 13.75it/s] 88%|████████▊ | 1129/1280 [01:24<00:10, 14.13it/s] 88%|████████▊ | 1131/1280 [01:24<00:10, 14.35it/s] 89%|████████▊ | 1133/1280 [01:25<00:11, 13.11it/s] 89%|████████▊ | 1135/1280 [01:25<00:10, 13.60it/s] 89%|████████▉ | 1137/1280 [01:25<00:10, 13.88it/s] 89%|████████▉ | 1139/1280 [01:25<00:10, 13.98it/s] 89%|████████▉ | 1141/1280 [01:25<00:10, 13.36it/s] 89%|████████▉ | 1143/1280 [01:25<00:09, 13.85it/s] 89%|████████▉ | 1145/1280 [01:25<00:09, 14.33it/s] 90%|████████▉ | 1147/1280 [01:26<00:10, 13.08it/s] 90%|████████▉ | 1149/1280 [01:26<00:09, 13.74it/s] 90%|████████▉ | 1151/1280 [01:26<00:09, 13.89it/s] 90%|█████████ | 1153/1280 [01:26<00:10, 12.70it/s] 90%|█████████ | 1155/1280 [01:26<00:09, 13.40it/s] 90%|█████████ | 1157/1280 [01:26<00:09, 13.65it/s] 91%|█████████ | 1159/1280 [01:26<00:08, 14.00it/s] 91%|█████████ | 1161/1280 [01:27<00:08, 14.15it/s] 91%|█████████ | 1163/1280 [01:27<00:08, 13.21it/s] 91%|█████████ | 1165/1280 [01:27<00:09, 12.44it/s] 91%|█████████ | 1167/1280 [01:27<00:08, 13.09it/s] 91%|█████████▏| 1169/1280 [01:27<00:08, 12.61it/s] 91%|█████████▏| 1171/1280 [01:27<00:09, 11.76it/s] 92%|█████████▏| 1173/1280 [01:28<00:09, 11.13it/s] 92%|█████████▏| 1175/1280 [01:28<00:08, 11.70it/s] 92%|█████████▏| 1177/1280 [01:28<00:08, 12.50it/s] 92%|█████████▏| 1179/1280 [01:28<00:08, 12.46it/s] 92%|█████████▏| 1181/1280 [01:28<00:07, 13.15it/s] 92%|█████████▏| 1183/1280 [01:28<00:07, 13.68it/s] 93%|█████████▎| 1185/1280 [01:29<00:06, 13.86it/s] 93%|█████████▎| 1187/1280 [01:29<00:07, 12.31it/s] 93%|█████████▎| 1189/1280 [01:29<00:07, 12.85it/s] 93%|█████████▎| 1191/1280 [01:29<00:06, 13.57it/s] 93%|█████████▎| 1193/1280 [01:29<00:06, 13.60it/s] 93%|█████████▎| 1195/1280 [01:29<00:06, 13.90it/s] 94%|█████████▎| 1197/1280 [01:29<00:05, 14.01it/s] 94%|█████████▎| 1199/1280 [01:30<00:05, 14.44it/s] 94%|█████████▍| 1201/1280 [01:30<00:05, 14.41it/s] 94%|█████████▍| 1203/1280 [01:30<00:05, 14.31it/s] 94%|█████████▍| 1205/1280 [01:30<00:05, 14.43it/s] 94%|█████████▍| 1207/1280 [01:30<00:05, 14.51it/s] 94%|█████████▍| 1209/1280 [01:30<00:05, 13.64it/s] 95%|█████████▍| 1211/1280 [01:30<00:04, 13.84it/s] 95%|█████████▍| 1213/1280 [01:31<00:04, 13.71it/s] 95%|█████████▍| 1215/1280 [01:31<00:04, 13.37it/s] 95%|█████████▌| 1217/1280 [01:31<00:04, 13.88it/s] 95%|█████████▌| 1219/1280 [01:31<00:04, 14.40it/s] 95%|█████████▌| 1221/1280 [01:31<00:04, 13.72it/s] 96%|█████████▌| 1223/1280 [01:31<00:04, 13.38it/s] 96%|█████████▌| 1225/1280 [01:31<00:03, 13.90it/s] 96%|█████████▌| 1227/1280 [01:32<00:03, 13.56it/s] 96%|█████████▌| 1229/1280 [01:32<00:03, 13.89it/s] 96%|█████████▌| 1231/1280 [01:32<00:03, 13.30it/s] 96%|█████████▋| 1233/1280 [01:32<00:03, 13.89it/s] 96%|█████████▋| 1235/1280 [01:32<00:03, 14.07it/s] 97%|█████████▋| 1237/1280 [01:32<00:03, 13.96it/s] 97%|█████████▋| 1239/1280 [01:32<00:02, 13.89it/s] 97%|█████████▋| 1241/1280 [01:33<00:02, 13.93it/s] 97%|█████████▋| 1243/1280 [01:33<00:02, 14.01it/s] 97%|█████████▋| 1245/1280 [01:33<00:02, 14.14it/s] 97%|█████████▋| 1247/1280 [01:33<00:02, 13.94it/s] 98%|█████████▊| 1249/1280 [01:33<00:02, 14.24it/s] 98%|█████████▊| 1251/1280 [01:33<00:02, 14.24it/s] 98%|█████████▊| 1253/1280 [01:33<00:01, 14.54it/s] 98%|█████████▊| 1255/1280 [01:34<00:01, 14.18it/s] 98%|█████████▊| 1257/1280 [01:34<00:01, 13.97it/s] 98%|█████████▊| 1259/1280 [01:34<00:01, 13.75it/s] 99%|█████████▊| 1261/1280 [01:34<00:01, 13.85it/s] 99%|█████████▊| 1263/1280 [01:34<00:01, 14.29it/s] 99%|█████████▉| 1265/1280 [01:34<00:01, 12.84it/s] 99%|█████████▉| 1267/1280 [01:35<00:01, 12.50it/s] 99%|█████████▉| 1269/1280 [01:35<00:00, 13.25it/s] 99%|█████████▉| 1271/1280 [01:35<00:00, 13.50it/s] 99%|█████████▉| 1273/1280 [01:35<00:00, 13.58it/s]100%|█████████▉| 1275/1280 [01:35<00:00, 13.42it/s]100%|█████████▉| 1277/1280 [01:35<00:00, 13.66it/s]100%|█████████▉| 1279/1280 [01:35<00:00, 13.79it/s][INFO|trainer.py:1761] 2022-07-22 12:33:45,822 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


                                                   {'train_runtime': 95.9834, 'train_samples_per_second': 426.48, 'train_steps_per_second': 13.336, 'train_loss': 0.4583479344844818, 'epoch': 1.0}
100%|██████████| 1280/1280 [01:35<00:00, 13.79it/s]100%|██████████| 1280/1280 [01:35<00:00, 13.34it/s]
[INFO|trainer.py:2503] 2022-07-22 12:33:45,829 >> Saving model checkpoint to runs/ebmnlp_hf/BioLinkBERT-base
[INFO|configuration_utils.py:446] 2022-07-22 12:33:45,831 >> Configuration saved in runs/ebmnlp_hf/BioLinkBERT-base/config.json
[INFO|modeling_utils.py:1660] 2022-07-22 12:33:46,435 >> Model weights saved in runs/ebmnlp_hf/BioLinkBERT-base/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2022-07-22 12:33:46,436 >> tokenizer config file saved in runs/ebmnlp_hf/BioLinkBERT-base/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2022-07-22 12:33:46,436 >> Special tokens file saved in runs/ebmnlp_hf/BioLinkBERT-base/special_tokens_map.json
***** train metrics *****
  epoch                    =        1.0
  train_loss               =     0.4583
  train_runtime            = 0:01:35.98
  train_samples            =      40935
  train_samples_per_second =     426.48
  train_steps_per_second   =     13.336
07/22/2022 12:33:46 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:661] 2022-07-22 12:33:46,477 >> The following columns in the evaluation set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: id, ner_tags, word_ids, tokens. If id, ner_tags, word_ids, tokens are not expected by `BertForTokenClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2753] 2022-07-22 12:33:46,479 >> ***** Running Evaluation *****
[INFO|trainer.py:2755] 2022-07-22 12:33:46,479 >>   Num examples = 10386
[INFO|trainer.py:2758] 2022-07-22 12:33:46,479 >>   Batch size = 8
  0%|          | 0/1299 [00:00<?, ?it/s]  1%|          | 8/1299 [00:00<00:17, 72.89it/s]  1%|          | 16/1299 [00:00<00:19, 67.39it/s]  2%|▏         | 23/1299 [00:00<00:19, 65.81it/s]  2%|▏         | 30/1299 [00:00<00:19, 64.92it/s]  3%|▎         | 37/1299 [00:00<00:19, 64.65it/s]  3%|▎         | 44/1299 [00:00<00:19, 64.42it/s]  4%|▍         | 51/1299 [00:00<00:19, 64.17it/s]  4%|▍         | 58/1299 [00:00<00:19, 64.09it/s]  5%|▌         | 65/1299 [00:01<00:19, 64.00it/s]  6%|▌         | 72/1299 [00:01<00:19, 63.66it/s]  6%|▌         | 79/1299 [00:01<00:19, 63.73it/s]  7%|▋         | 86/1299 [00:01<00:18, 63.85it/s]  7%|▋         | 93/1299 [00:01<00:19, 63.44it/s]  8%|▊         | 100/1299 [00:01<00:18, 63.49it/s]  8%|▊         | 107/1299 [00:01<00:18, 63.51it/s]  9%|▉         | 114/1299 [00:01<00:18, 63.49it/s]  9%|▉         | 121/1299 [00:01<00:18, 63.40it/s] 10%|▉         | 128/1299 [00:01<00:18, 63.44it/s] 10%|█         | 135/1299 [00:02<00:18, 63.42it/s] 11%|█         | 142/1299 [00:02<00:18, 63.51it/s] 11%|█▏        | 149/1299 [00:02<00:18, 63.64it/s] 12%|█▏        | 156/1299 [00:02<00:17, 63.63it/s] 13%|█▎        | 163/1299 [00:02<00:17, 63.62it/s] 13%|█▎        | 170/1299 [00:02<00:17, 63.74it/s] 14%|█▎        | 177/1299 [00:02<00:17, 63.73it/s] 14%|█▍        | 184/1299 [00:02<00:17, 63.49it/s] 15%|█▍        | 191/1299 [00:02<00:17, 63.53it/s] 15%|█▌        | 198/1299 [00:03<00:17, 63.77it/s] 16%|█▌        | 205/1299 [00:03<00:17, 63.78it/s] 16%|█▋        | 212/1299 [00:03<00:17, 63.85it/s] 17%|█▋        | 219/1299 [00:03<00:16, 63.83it/s] 17%|█▋        | 226/1299 [00:03<00:16, 63.83it/s] 18%|█▊        | 233/1299 [00:03<00:16, 63.89it/s] 18%|█▊        | 240/1299 [00:03<00:16, 63.93it/s] 19%|█▉        | 247/1299 [00:03<00:16, 63.88it/s] 20%|█▉        | 254/1299 [00:03<00:16, 63.93it/s] 20%|██        | 261/1299 [00:04<00:16, 64.00it/s] 21%|██        | 268/1299 [00:04<00:16, 63.84it/s] 21%|██        | 275/1299 [00:04<00:16, 63.91it/s] 22%|██▏       | 282/1299 [00:04<00:15, 63.85it/s] 22%|██▏       | 289/1299 [00:04<00:15, 63.79it/s] 23%|██▎       | 296/1299 [00:04<00:15, 63.79it/s] 23%|██▎       | 303/1299 [00:04<00:15, 63.75it/s] 24%|██▍       | 310/1299 [00:04<00:15, 63.86it/s] 24%|██▍       | 317/1299 [00:04<00:15, 63.86it/s] 25%|██▍       | 324/1299 [00:05<00:15, 63.81it/s] 25%|██▌       | 331/1299 [00:05<00:15, 63.97it/s] 26%|██▌       | 338/1299 [00:05<00:15, 63.88it/s] 27%|██▋       | 345/1299 [00:05<00:14, 63.83it/s] 27%|██▋       | 352/1299 [00:05<00:14, 63.92it/s] 28%|██▊       | 359/1299 [00:05<00:14, 63.93it/s] 28%|██▊       | 366/1299 [00:05<00:14, 63.96it/s] 29%|██▊       | 373/1299 [00:05<00:14, 63.97it/s] 29%|██▉       | 380/1299 [00:05<00:14, 63.79it/s] 30%|██▉       | 387/1299 [00:06<00:14, 63.70it/s] 30%|███       | 394/1299 [00:06<00:14, 63.63it/s] 31%|███       | 401/1299 [00:06<00:14, 63.61it/s] 31%|███▏      | 408/1299 [00:06<00:13, 63.87it/s] 32%|███▏      | 415/1299 [00:06<00:13, 63.91it/s] 32%|███▏      | 422/1299 [00:06<00:13, 63.84it/s] 33%|███▎      | 429/1299 [00:06<00:13, 63.82it/s] 34%|███▎      | 436/1299 [00:06<00:13, 63.95it/s] 34%|███▍      | 443/1299 [00:06<00:13, 64.01it/s] 35%|███▍      | 450/1299 [00:07<00:13, 64.08it/s] 35%|███▌      | 457/1299 [00:07<00:13, 64.04it/s] 36%|███▌      | 464/1299 [00:07<00:13, 64.04it/s] 36%|███▋      | 471/1299 [00:07<00:12, 63.94it/s] 37%|███▋      | 478/1299 [00:07<00:12, 63.82it/s] 37%|███▋      | 485/1299 [00:07<00:12, 63.94it/s] 38%|███▊      | 492/1299 [00:07<00:12, 63.98it/s] 38%|███▊      | 499/1299 [00:07<00:12, 63.71it/s] 39%|███▉      | 506/1299 [00:07<00:12, 63.79it/s] 39%|███▉      | 513/1299 [00:08<00:12, 63.83it/s] 40%|████      | 520/1299 [00:08<00:12, 63.85it/s] 41%|████      | 527/1299 [00:08<00:12, 63.80it/s] 41%|████      | 534/1299 [00:08<00:11, 63.76it/s] 42%|████▏     | 541/1299 [00:08<00:11, 63.91it/s] 42%|████▏     | 548/1299 [00:08<00:11, 64.07it/s] 43%|████▎     | 555/1299 [00:08<00:11, 64.08it/s] 43%|████▎     | 562/1299 [00:08<00:11, 64.13it/s] 44%|████▍     | 569/1299 [00:08<00:11, 64.11it/s] 44%|████▍     | 576/1299 [00:09<00:11, 63.91it/s] 45%|████▍     | 583/1299 [00:09<00:11, 63.98it/s] 45%|████▌     | 590/1299 [00:09<00:11, 63.98it/s] 46%|████▌     | 597/1299 [00:09<00:10, 63.98it/s] 46%|████▋     | 604/1299 [00:09<00:10, 63.92it/s] 47%|████▋     | 611/1299 [00:09<00:10, 63.88it/s] 48%|████▊     | 618/1299 [00:09<00:10, 63.96it/s] 48%|████▊     | 625/1299 [00:09<00:10, 63.78it/s] 49%|████▊     | 632/1299 [00:09<00:10, 63.83it/s] 49%|████▉     | 639/1299 [00:10<00:10, 63.16it/s] 50%|████▉     | 646/1299 [00:10<00:10, 63.26it/s] 50%|█████     | 653/1299 [00:10<00:10, 63.41it/s] 51%|█████     | 660/1299 [00:10<00:10, 63.57it/s] 51%|█████▏    | 667/1299 [00:10<00:09, 63.45it/s] 52%|█████▏    | 674/1299 [00:10<00:09, 63.64it/s] 52%|█████▏    | 681/1299 [00:10<00:09, 63.78it/s] 53%|█████▎    | 688/1299 [00:10<00:09, 63.51it/s] 54%|█████▎    | 695/1299 [00:10<00:09, 63.66it/s] 54%|█████▍    | 702/1299 [00:10<00:09, 63.93it/s] 55%|█████▍    | 709/1299 [00:11<00:09, 63.86it/s] 55%|█████▌    | 716/1299 [00:11<00:09, 63.83it/s] 56%|█████▌    | 723/1299 [00:11<00:09, 63.78it/s] 56%|█████▌    | 730/1299 [00:11<00:08, 63.93it/s] 57%|█████▋    | 737/1299 [00:11<00:08, 63.81it/s] 57%|█████▋    | 744/1299 [00:11<00:08, 63.86it/s] 58%|█████▊    | 751/1299 [00:11<00:08, 64.02it/s] 58%|█████▊    | 758/1299 [00:11<00:08, 63.98it/s] 59%|█████▉    | 765/1299 [00:11<00:08, 63.95it/s] 59%|█████▉    | 772/1299 [00:12<00:08, 63.90it/s] 60%|█████▉    | 779/1299 [00:12<00:08, 64.10it/s] 61%|██████    | 786/1299 [00:12<00:08, 64.00it/s] 61%|██████    | 793/1299 [00:12<00:07, 64.00it/s] 62%|██████▏   | 800/1299 [00:12<00:07, 64.06it/s] 62%|██████▏   | 807/1299 [00:12<00:07, 64.12it/s] 63%|██████▎   | 814/1299 [00:12<00:07, 64.01it/s] 63%|██████▎   | 821/1299 [00:12<00:07, 63.89it/s] 64%|██████▎   | 828/1299 [00:12<00:07, 63.90it/s] 64%|██████▍   | 835/1299 [00:13<00:07, 63.95it/s] 65%|██████▍   | 842/1299 [00:13<00:07, 63.95it/s] 65%|██████▌   | 849/1299 [00:13<00:07, 64.02it/s] 66%|██████▌   | 856/1299 [00:13<00:06, 63.86it/s] 66%|██████▋   | 863/1299 [00:13<00:06, 63.76it/s] 67%|██████▋   | 870/1299 [00:13<00:06, 63.72it/s] 68%|██████▊   | 877/1299 [00:13<00:06, 63.71it/s] 68%|██████▊   | 884/1299 [00:13<00:06, 63.77it/s] 69%|██████▊   | 891/1299 [00:13<00:06, 63.70it/s] 69%|██████▉   | 898/1299 [00:14<00:06, 63.74it/s] 70%|██████▉   | 905/1299 [00:14<00:06, 63.77it/s] 70%|███████   | 912/1299 [00:14<00:06, 63.83it/s] 71%|███████   | 919/1299 [00:14<00:05, 63.81it/s] 71%|███████▏  | 926/1299 [00:14<00:05, 63.89it/s] 72%|███████▏  | 933/1299 [00:14<00:05, 63.91it/s] 72%|███████▏  | 940/1299 [00:14<00:05, 63.97it/s] 73%|███████▎  | 947/1299 [00:14<00:05, 63.93it/s] 73%|███████▎  | 954/1299 [00:14<00:05, 63.87it/s] 74%|███████▍  | 961/1299 [00:15<00:05, 63.71it/s] 75%|███████▍  | 968/1299 [00:15<00:05, 63.72it/s] 75%|███████▌  | 975/1299 [00:15<00:05, 63.85it/s] 76%|███████▌  | 982/1299 [00:15<00:04, 63.93it/s] 76%|███████▌  | 989/1299 [00:15<00:04, 64.09it/s] 77%|███████▋  | 996/1299 [00:15<00:04, 63.93it/s] 77%|███████▋  | 1003/1299 [00:15<00:04, 63.89it/s] 78%|███████▊  | 1010/1299 [00:15<00:04, 63.70it/s] 78%|███████▊  | 1017/1299 [00:15<00:04, 63.75it/s] 79%|███████▉  | 1024/1299 [00:16<00:04, 63.72it/s] 79%|███████▉  | 1031/1299 [00:16<00:04, 63.68it/s] 80%|███████▉  | 1038/1299 [00:16<00:04, 63.76it/s] 80%|████████  | 1045/1299 [00:16<00:03, 63.74it/s] 81%|████████  | 1052/1299 [00:16<00:03, 63.94it/s] 82%|████████▏ | 1059/1299 [00:16<00:03, 63.78it/s] 82%|████████▏ | 1066/1299 [00:16<00:03, 63.74it/s] 83%|████████▎ | 1073/1299 [00:16<00:03, 63.66it/s] 83%|████████▎ | 1080/1299 [00:16<00:03, 63.61it/s] 84%|████████▎ | 1087/1299 [00:17<00:03, 63.64it/s] 84%|████████▍ | 1094/1299 [00:17<00:03, 63.80it/s] 85%|████████▍ | 1101/1299 [00:17<00:03, 63.68it/s] 85%|████████▌ | 1108/1299 [00:17<00:02, 63.77it/s] 86%|████████▌ | 1115/1299 [00:17<00:02, 63.84it/s] 86%|████████▋ | 1122/1299 [00:17<00:02, 63.74it/s] 87%|████████▋ | 1129/1299 [00:17<00:02, 63.80it/s] 87%|████████▋ | 1136/1299 [00:17<00:02, 63.93it/s] 88%|████████▊ | 1143/1299 [00:17<00:02, 63.94it/s] 89%|████████▊ | 1150/1299 [00:18<00:02, 63.91it/s] 89%|████████▉ | 1157/1299 [00:18<00:02, 63.89it/s] 90%|████████▉ | 1164/1299 [00:18<00:02, 63.84it/s] 90%|█████████ | 1171/1299 [00:18<00:02, 63.95it/s] 91%|█████████ | 1178/1299 [00:18<00:01, 63.95it/s] 91%|█████████ | 1185/1299 [00:18<00:01, 64.14it/s] 92%|█████████▏| 1192/1299 [00:18<00:01, 63.65it/s] 92%|█████████▏| 1199/1299 [00:18<00:01, 63.71it/s] 93%|█████████▎| 1206/1299 [00:18<00:01, 63.56it/s] 93%|█████████▎| 1213/1299 [00:18<00:01, 63.57it/s] 94%|█████████▍| 1220/1299 [00:19<00:01, 63.33it/s] 94%|█████████▍| 1227/1299 [00:19<00:01, 63.43it/s] 95%|█████████▍| 1234/1299 [00:19<00:01, 63.49it/s] 96%|█████████▌| 1241/1299 [00:19<00:00, 63.46it/s] 96%|█████████▌| 1248/1299 [00:19<00:00, 63.52it/s] 97%|█████████▋| 1255/1299 [00:19<00:00, 63.43it/s] 97%|█████████▋| 1262/1299 [00:19<00:00, 63.53it/s] 98%|█████████▊| 1269/1299 [00:19<00:00, 63.51it/s] 98%|█████████▊| 1276/1299 [00:19<00:00, 63.48it/s] 99%|█████████▉| 1283/1299 [00:20<00:00, 63.54it/s] 99%|█████████▉| 1290/1299 [00:20<00:00, 63.50it/s]100%|█████████▉| 1297/1299 [00:20<00:00, 63.56it/s]07/22/2022 12:34:13 - INFO - datasets.metric - Removing /home/hungthinht/.cache/huggingface/metrics/seqeval/default/default_experiment-1-0.arrow
type_name INT
type_name OUT
type_name PAR
100%|██████████| 1299/1299 [00:26<00:00, 48.70it/s]
***** eval metrics *****
  epoch                   =        1.0
  eval_loss               =     0.4264
  eval_macro_f1           =     0.7191
  eval_macro_precision    =     0.7641
  eval_macro_recall       =     0.6793
  eval_runtime            = 0:00:26.69
  eval_samples            =      10386
  eval_samples_per_second =    389.133
  eval_steps_per_second   =      48.67
07/22/2022 12:34:13 - INFO - __main__ - *** Predict ***
[INFO|trainer.py:661] 2022-07-22 12:34:13,176 >> The following columns in the test set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: id, ner_tags, word_ids, tokens. If id, ner_tags, word_ids, tokens are not expected by `BertForTokenClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2753] 2022-07-22 12:34:13,177 >> ***** Running Prediction *****
[INFO|trainer.py:2755] 2022-07-22 12:34:13,178 >>   Num examples = 2076
[INFO|trainer.py:2758] 2022-07-22 12:34:13,178 >>   Batch size = 8
  0%|          | 0/260 [00:00<?, ?it/s]  3%|▎         | 8/260 [00:00<00:03, 72.67it/s]  6%|▌         | 16/260 [00:00<00:03, 67.39it/s]  9%|▉         | 23/260 [00:00<00:03, 66.06it/s] 12%|█▏        | 30/260 [00:00<00:03, 65.17it/s] 14%|█▍        | 37/260 [00:00<00:03, 64.44it/s] 17%|█▋        | 44/260 [00:00<00:03, 64.25it/s] 20%|█▉        | 51/260 [00:00<00:03, 64.15it/s] 22%|██▏       | 58/260 [00:00<00:03, 64.02it/s] 25%|██▌       | 65/260 [00:01<00:03, 63.86it/s] 28%|██▊       | 72/260 [00:01<00:02, 63.76it/s] 30%|███       | 79/260 [00:01<00:02, 63.78it/s] 33%|███▎      | 86/260 [00:01<00:02, 63.62it/s] 36%|███▌      | 93/260 [00:01<00:02, 63.55it/s] 38%|███▊      | 100/260 [00:01<00:02, 63.62it/s] 41%|████      | 107/260 [00:01<00:02, 63.78it/s] 44%|████▍     | 114/260 [00:01<00:02, 63.79it/s] 47%|████▋     | 121/260 [00:01<00:02, 63.79it/s] 49%|████▉     | 128/260 [00:01<00:02, 63.72it/s] 52%|█████▏    | 135/260 [00:02<00:01, 63.79it/s] 55%|█████▍    | 142/260 [00:02<00:01, 63.71it/s] 57%|█████▋    | 149/260 [00:02<00:01, 63.77it/s] 60%|██████    | 156/260 [00:02<00:01, 63.98it/s] 63%|██████▎   | 163/260 [00:02<00:01, 64.06it/s] 65%|██████▌   | 170/260 [00:02<00:01, 64.05it/s] 68%|██████▊   | 177/260 [00:02<00:01, 64.01it/s] 71%|███████   | 184/260 [00:02<00:01, 64.00it/s] 73%|███████▎  | 191/260 [00:02<00:01, 63.86it/s] 76%|███████▌  | 198/260 [00:03<00:00, 63.89it/s] 79%|███████▉  | 205/260 [00:03<00:00, 63.87it/s] 82%|████████▏ | 212/260 [00:03<00:00, 63.81it/s] 84%|████████▍ | 219/260 [00:03<00:00, 63.78it/s] 87%|████████▋ | 226/260 [00:03<00:00, 63.68it/s] 90%|████████▉ | 233/260 [00:03<00:00, 63.76it/s] 92%|█████████▏| 240/260 [00:03<00:00, 63.85it/s] 95%|█████████▌| 247/260 [00:03<00:00, 63.75it/s] 98%|█████████▊| 254/260 [00:03<00:00, 63.90it/s]07/22/2022 12:34:18 - INFO - datasets.metric - Removing /home/hungthinht/.cache/huggingface/metrics/seqeval/default/default_experiment-1-0.arrow
type_name INT
type_name OUT
type_name PAR
***** test metrics *****
  test_loss               =     0.4043
  test_macro_f1           =     0.7406
  test_macro_precision    =     0.7578
  test_macro_recall       =     0.7467
  test_runtime            = 0:00:05.17
  test_samples            =       2076
  test_samples_per_second =    401.101
  test_steps_per_second   =     50.234
100%|██████████| 260/260 [00:07<00:00, 36.10it/s]