This run uses RTX 3090 / 4000-series GPUs, which do not support the faster P2P and IB communication paths, so both are disabled.
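For reference, these transports can also be disabled explicitly before launching the run. A minimal sketch, assuming the standard NCCL environment variables (which must be set before the process group is initialized):

```python
import os

# Disable NCCL peer-to-peer and InfiniBand transports up front; consumer
# RTX 3090 / 4000-series cards do not support them for multi-GPU runs.
os.environ["NCCL_P2P_DISABLE"] = "1"
os.environ["NCCL_IB_DISABLE"] = "1"
```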
01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - We recommend enabling `upcast_layernorm` in quantized training.
01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - We recommend enabling mixed precision training.
01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
[INFO|training_args.py:1838] 2024-01-04 09:53:50,866 >> PyTorch: setting up devices
/home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of πŸ€— Transformers. Use `--hub_token` instead.
warnings.warn(
01/04/2024 09:53:50 - INFO - llmtuner.model.parser - Process rank: 0, device: cuda:0, n_gpu: 1
distributed training: True, compute dtype: None
01/04/2024 09:53:50 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/runs/Jan04_09-53-50_yhyu13fuwuqi,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.COSINE,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1.0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
output_dir=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=True,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=1000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
sortish_sampler=False,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
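For readers who want to reproduce this configuration outside LLaMA-Factory, the dump above maps directly onto the `transformers` API. A minimal sketch that sets only the values differing from the defaults shown:

```python
from transformers import Seq2SeqTrainingArguments

output_dir = "./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora"

# Non-default values taken from the dump above; everything else stays at its default.
args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    evaluation_strategy="epoch",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=1.0,
    logging_steps=10,
    save_steps=1000,
    ddp_find_unused_parameters=False,
    report_to=["tensorboard"],
)
```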
01/04/2024 09:53:50 - INFO - llmtuner.data.loader - Loading dataset ./glaive-function-calling-v2/simple-function-calling-v2_converted.json...
01/04/2024 09:53:50 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - We recommend enabling `upcast_layernorm` in quantized training.
01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - We recommend enabling mixed precision training.
01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
/home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of πŸ€— Transformers. Use `--hub_token` instead.
warnings.warn(
01/04/2024 09:53:50 - INFO - llmtuner.model.parser - Process rank: 1, device: cuda:1, n_gpu: 1
distributed training: True, compute dtype: None
01/04/2024 09:53:50 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/runs/Jan04_09-53-50_yhyu13fuwuqi,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.COSINE,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1.0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
output_dir=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=True,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=1000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
sortish_sampler=False,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
01/04/2024 09:53:50 - INFO - llmtuner.data.loader - Loading dataset ./glaive-function-calling-v2/simple-function-calling-v2_converted.json...
01/04/2024 09:53:50 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
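The checksum warning is harmless: LLaMA-Factory can verify a local dataset file against a SHA-1 hash recorded in `dataset_info.json`, and here no hash was provided. A minimal sketch for computing one, assuming the hash is taken over the raw file bytes:

```python
import hashlib

def file_sha1(path: str) -> str:
    # Hash in 1 MiB chunks so large dataset files never need to fit in memory.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(file_sha1("./glaive-function-calling-v2/simple-function-calling-v2_converted.json"))
```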
Using custom data configuration default-b024aadef2a1493c
Loading Dataset Infos from /home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
Loading Dataset info from /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
Found cached dataset json (/home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading Dataset info from /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
[INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-01-04 09:53:51,743 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:737] 2024-01-04 09:53:51,744 >> loading configuration file cognitivecomputations/dolphin-2_6-phi-2/config.json
[INFO|configuration_utils.py:737] 2024-01-04 09:53:51,749 >> loading configuration file cognitivecomputations/dolphin-2_6-phi-2/config.json
[INFO|configuration_utils.py:802] 2024-01-04 09:53:51,750 >> Model config PhiConfig {
"_name_or_path": "cognitivecomputations/dolphin-2_6-phi-2",
"activation_function": "gelu_new",
"architectures": [
"PhiForCausalLM"
],
"attn_pdrop": 0.0,
"auto_map": {
"AutoConfig": "configuration_phi.PhiConfig",
"AutoModelForCausalLM": "modeling_phi.PhiForCausalLM"
},
"embd_pdrop": 0.0,
"flash_attn": false,
"flash_rotary": false,
"fused_dense": false,
"img_processor": null,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "phi-msft",
"n_embd": 2560,
"n_head": 32,
"n_head_kv": null,
"n_inner": null,
"n_layer": 32,
"n_positions": 2048,
"resid_pdrop": 0.1,
"rotary_dim": 32,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.36.2",
"use_cache": false,
"vocab_size": 51200
}
01/04/2024 09:53:51 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
01/04/2024 09:53:51 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
[INFO|modeling_utils.py:2907] 2024-01-04 09:53:51,820 >> Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning.
[INFO|modeling_utils.py:3341] 2024-01-04 09:53:51,820 >> loading weights file cognitivecomputations/dolphin-2_6-phi-2/model.safetensors.index.json
[INFO|modeling_utils.py:1341] 2024-01-04 09:53:51,821 >> Instantiating PhiForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2024-01-04 09:53:51,821 >> Generate config GenerationConfig {
"use_cache": false
}
[INFO|configuration_utils.py:826] 2024-01-04 09:53:51,822 >> Generate config GenerationConfig {
"use_cache": false
}
[INFO|modeling_utils.py:3483] 2024-01-04 09:53:51,875 >> Detected 4-bit loading: activating 4-bit loading for this model
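The 4-bit load reported here is bitsandbytes quantization applied at load time, which is also why `torch_dtype` was overridden to float16 above. A minimal sketch of an equivalent load, assuming the `BitsAndBytesConfig` API (the exact quantization options LLaMA-Factory passes are not visible in this log):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # matches the float16 override logged above
)

model = AutoModelForCausalLM.from_pretrained(
    "cognitivecomputations/dolphin-2_6-phi-2",
    quantization_config=quant_config,
    trust_remote_code=True,  # the config's auto_map points at custom Phi classes
)
```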
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 1/2 [00:01<00:01, 1.28s/it]
Loading checkpoint shards: 50%|█████ | 1/2 [00:01<00:01, 1.27s/it]
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:01<00:00, 1.46it/s]
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:01<00:00, 1.29it/s]
[WARNING|modeling_utils.py:4175] 2024-01-04 09:53:53 >> Some weights of the model checkpoint at ./models/dolphin-2_6-phi-2 were not used when initializing PhiForCausalLM: ['lm_head.linear.lora_B.default.weight', 'lm_head.linear.lora_A.default.weight']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[INFO|modeling_utils.py:4193] 2024-01-04 09:53:53,730 >> All the weights of PhiForCausalLM were initialized from the model checkpoint at ./models/dolphin-2_6-phi-2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use PhiForCausalLM for predictions without further training.
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:01<00:00, 1.47it/s]
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:01<00:00, 1.30it/s]
Some weights of the model checkpoint at ./models/dolphin-2_6-phi-2 were not used when initializing PhiForCausalLM: ['lm_head.linear.lora_B.default.weight', 'lm_head.linear.lora_A.default.weight']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[INFO|configuration_utils.py:779] 2024-01-04 09:53:53,733 >> loading configuration file ./models/dolphin-2_6-phi-2/generation_config.json
[INFO|configuration_utils.py:826] 2024-01-04 09:53:53,733 >> Generate config GenerationConfig {}
[WARNING|modeling_utils.py:2045] 2024-01-04 09:53:53,816 >> You are using an old version of the checkpointing format that is deprecated (we will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
01/04/2024 09:53:53 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
01/04/2024 09:53:53 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
You are using an old version of the checkpointing format that is deprecated (we will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
01/04/2024 09:53:53 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
01/04/2024 09:53:53 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
01/04/2024 09:53:53 - INFO - llmtuner.model.loader - trainable params: 2621440 || all params: 2782305280 || trainable%: 0.0942
01/04/2024 09:53:53 - INFO - llmtuner.model.loader - trainable params: 2621440 || all params: 2782305280 || trainable%: 0.0942
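The trainable ratio is simply the LoRA adapter parameters over the full model; a quick check against the numbers logged above:

```python
trainable, total = 2_621_440, 2_782_305_280
print(f"trainable%: {100 * trainable / total:.4f}")  # -> trainable%: 0.0942
```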
Running tokenizer on dataset: 0%| | 0/3347 [00:00<?, ? examples/s]
[WARNING|tokenization_utils_base.py:3835] 2024-01-04 09:53:55,217 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2217 > 2048). Running this sequence through the model will result in indexing errors
Caching processed dataset at /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-c64b6c6785bc1929.arrow
Running tokenizer on dataset: 30%|β–ˆβ–ˆβ–‰ | 1000/3347 [00:02<00:06, 372.68 examples/s]
Running tokenizer on dataset: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2000/3347 [00:05<00:03, 387.09 examples/s]
Running tokenizer on dataset: 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 3000/3347 [00:07<00:00, 395.52 examples/s]
Running tokenizer on dataset: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3347/3347 [00:08<00:00, 396.84 examples/s]
Running tokenizer on dataset: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3347/3347 [00:08<00:00, 392.48 examples/s]
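The tokenizer warning above (2217 > 2048) flags a sample that exceeds Phi-2's `n_positions`. If such samples are not pre-chunked, truncating at the model's context length avoids the indexing errors the warning mentions; a minimal sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cognitivecomputations/dolphin-2_6-phi-2")
# Phi-2's context window (n_positions) is 2048 tokens; clip anything longer.
encoded = tokenizer("...a very long training sample...", truncation=True, max_length=2048)
```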
input_ids:
[32, 8537, 1022, 257, 11040, 2836, 290, 281, 11666, 4430, 8796, 13, 383, 8796, 3607, 7613, 11, 6496, 11, 290, 23507, 7429, 284, 262, 2836, 338, 2683, 13, 198, 20490, 25, 36230, 25, 921, 389, 257, 7613, 8796, 351, 1895, 284, 262, 1708, 5499, 13, 5765, 606, 611, 2672, 532, 198, 90, 198, 50284, 1, 3672, 1298, 366, 1136, 62, 1069, 3803, 62, 4873, 1600, 198, 50284, 1, 11213, 1298, 366, 3855, 262, 5163, 2494, 1022, 734, 19247, 1600, 198, 50284, 1, 17143, 7307, 1298, 1391, 198, 50280, 1, 4906, 1298, 366, 15252, 1600, 198, 50280, 1, 48310, 1298, 1391, 198, 50276, 1, 8692, 62, 34415, 1298, 1391, 198, 50272, 1, 4906, 1298, 366, 8841, 1600, 198, 50272, 1, 11213, 1298, 366, 464, 7395, 284, 10385, 422, 1, 198, 50276, 5512, 198, 50276, 1, 16793, 62, 34415, 1298, 1391, 198, 50272, 1, 4906, 1298, 366, 8841, 1600, 198, 50272, 1, 11213, 1298, 366, 464, 7395, 284, 10385, 284, 1, 198, 50276, 92, 198, 50280, 5512, 198, 50280, 1, 35827, 1298, 685, 198, 50276, 1, 8692, 62, 34415, 1600, 198, 50276, 1, 16793, 62, 34415, 1, 198, 50280, 60, 198, 50284, 92, 198, 92, 198, 198, 6090, 345, 1492, 257, 5474, 329, 502, 422, 968, 1971, 284, 3576, 30, 198, 48902, 25, 40, 1101, 7926, 11, 475, 314, 836, 470, 423, 262, 12971, 284, 1492, 13956, 13, 2011, 1459, 2163, 3578, 502, 284, 651, 262, 5163, 2494, 1022, 734, 19247, 13, 1002, 345, 761, 1037, 351, 326, 11, 1254, 1479, 284, 1265, 0, 50295]
inputs:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{
"name": "get_exchange_rate",
"description": "Get the exchange rate between two currencies",
"parameters": {
"type": "object",
"properties": {
"base_currency": {
"type": "string",
"description": "The currency to convert from"
},
"target_currency": {
"type": "string",
"description": "The currency to convert to"
}
},
"required": [
"base_currency",
"target_currency"
]
}
}
Can you book a flight for me from New York to London?
Assistant:I'm sorry, but I don't have the capability to book flights. My current function allows me to get the exchange rate between two currencies. If you need help with that, feel free to ask!<|im_end|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 40, 1101, 7926, 11, 475, 314, 836, 470, 423, 262, 12971, 284, 1492, 13956, 13, 2011, 1459, 2163, 3578, 502, 284, 651, 262, 5163, 2494, 1022, 734, 19247, 13, 1002, 345, 761, 1037, 351, 326, 11, 1254, 1479, 284, 1265, 0, 50295]
labels:
I'm sorry, but I don't have the capability to book flights. My current function allows me to get the exchange rate between two currencies. If you need help with that, feel free to ask!<|im_end|>
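Note how `label_ids` masks every prompt token with -100, the index that PyTorch's cross-entropy loss ignores, so only the assistant response contributes to the loss. A minimal sketch of that masking, using hypothetical prompt/response token lists:

```python
IGNORE_INDEX = -100  # torch.nn.CrossEntropyLoss skips positions with this label

def build_labels(prompt_ids: list[int], response_ids: list[int]) -> list[int]:
    # Supervise only the response: prompt positions get IGNORE_INDEX.
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
```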
[INFO|training_args.py:1838] 2024-01-04 09:54:03,936 >> PyTorch: setting up devices
Caching indices mapping at /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-2d738e000d25696c.arrow
Caching indices mapping at /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-fe95a5c264c6067e.arrow
Running tokenizer on dataset: 0%| | 0/3347 [00:00<?, ? examples/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (2217 > 2048). Running this sequence through the model will result in indexing errors
Running tokenizer on dataset: 30%|β–ˆβ–ˆβ–‰ | 1000/3347 [00:02<00:06, 375.58 examples/s]
Running tokenizer on dataset: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 2000/3347 [00:05<00:03, 389.75 examples/s]
Running tokenizer on dataset: 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 3000/3347 [00:07<00:00, 396.16 examples/s]
Running tokenizer on dataset: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3347/3347 [00:08<00:00, 395.57 examples/s]
Running tokenizer on dataset: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3347/3347 [00:08<00:00, 392.61 examples/s]
[INFO|trainer.py:1706] 2024-01-04 09:54:13,452 >> ***** Running training *****
[INFO|trainer.py:1707] 2024-01-04 09:54:13,452 >> Num examples = 3,011
[INFO|trainer.py:1708] 2024-01-04 09:54:13,452 >> Num Epochs = 1
[INFO|trainer.py:1709] 2024-01-04 09:54:13,452 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1712] 2024-01-04 09:54:13,452 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1713] 2024-01-04 09:54:13,452 >> Gradient Accumulation steps = 4
[INFO|trainer.py:1714] 2024-01-04 09:54:13,452 >> Total optimization steps = 376
[INFO|trainer.py:1715] 2024-01-04 09:54:13,454 >> Number of trainable parameters = 2,621,440
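The step count follows from the batch settings: 3,011 examples through an effective batch of 8 (2 GPUs x batch size 1 x 4 accumulation steps) for one epoch. A quick check, assuming the partial accumulation step at the epoch boundary is dropped (the exact rounding depends on the Trainer version):

```python
num_examples = 3011
effective_batch = 2 * 1 * 4  # GPUs x per-device batch size x gradient accumulation
print(num_examples // effective_batch)  # -> 376, matching the log
```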
0%| | 0/376 [00:00<?, ?it/s]
0%| | 1/376 [00:02<13:10, 2.11s/it]
1%| | 2/376 [00:03<09:45, 1.56s/it]
1%| | 3/376 [00:04<09:09, 1.47s/it]
1%| | 4/376 [00:06<08:49, 1.42s/it]
1%|▏ | 5/376 [00:07<08:34, 1.39s/it]
2%|▏ | 6/376 [00:09<09:48, 1.59s/it]
2%|▏ | 7/376 [00:10<09:20, 1.52s/it]
2%|▏ | 8/376 [00:11<08:46, 1.43s/it]
2%|▏ | 9/376 [00:13<08:32, 1.40s/it]
3%|β–Ž | 10/376 [00:14<08:10, 1.34s/it]
{'loss': 1.0017, 'learning_rate': 4.991278696516879e-05, 'epoch': 0.03}
3%|β–Ž | 10/376 [00:14<08:10, 1.34s/it]
3%|β–Ž | 11/376 [00:15<08:11, 1.35s/it]
3%|β–Ž | 12/376 [00:17<08:19, 1.37s/it]
3%|β–Ž | 13/376 [00:18<08:30, 1.41s/it]
4%|β–Ž | 14/376 [00:20<08:30, 1.41s/it]
4%|▍ | 15/376 [00:21<08:03, 1.34s/it]
4%|▍ | 16/376 [00:22<08:26, 1.41s/it]
5%|▍ | 17/376 [00:24<07:54, 1.32s/it]
5%|▍ | 18/376 [00:25<07:42, 1.29s/it]
5%|β–Œ | 19/376 [00:27<08:59, 1.51s/it]
5%|β–Œ | 20/376 [00:28<08:13, 1.39s/it]
{'loss': 0.881, 'learning_rate': 4.9651756349750716e-05, 'epoch': 0.05}
5%|β–Œ | 20/376 [00:28<08:13, 1.39s/it]
6%|β–Œ | 21/376 [00:29<08:05, 1.37s/it]
6%|β–Œ | 22/376 [00:31<09:07, 1.55s/it]
6%|β–Œ | 23/376 [00:32<08:33, 1.46s/it]
6%|β–‹ | 24/376 [00:34<08:40, 1.48s/it]
7%|β–‹ | 25/376 [00:35<08:38, 1.48s/it]
7%|β–‹ | 26/376 [00:37<08:25, 1.44s/it]
7%|β–‹ | 27/376 [00:38<08:02, 1.38s/it]
7%|β–‹ | 28/376 [00:40<08:37, 1.49s/it]
8%|β–Š | 29/376 [00:41<08:31, 1.47s/it]
8%|β–Š | 30/376 [00:43<08:23, 1.46s/it]
{'loss': 0.7979, 'learning_rate': 4.9218729375518135e-05, 'epoch': 0.08}
8%|β–Š | 30/376 [00:43<08:23, 1.46s/it]
8%|β–Š | 31/376 [00:44<08:11, 1.43s/it]
9%|β–Š | 32/376 [00:45<08:12, 1.43s/it]
9%|β–‰ | 33/376 [00:47<08:13, 1.44s/it]
9%|β–‰ | 34/376 [00:48<08:23, 1.47s/it]
9%|β–‰ | 35/376 [00:50<08:44, 1.54s/it]
10%|β–‰ | 36/376 [00:51<08:28, 1.50s/it]
10%|β–‰ | 37/376 [00:53<08:19, 1.47s/it]
10%|β–ˆ | 38/376 [00:54<08:00, 1.42s/it]
10%|β–ˆ | 39/376 [00:56<08:26, 1.50s/it]
11%|β–ˆ | 40/376 [00:57<08:09, 1.46s/it]
{'loss': 0.7022, 'learning_rate': 4.861672729019797e-05, 'epoch': 0.11}
11%|β–ˆ | 40/376 [00:57<08:09, 1.46s/it]
11%|β–ˆ | 41/376 [00:59<07:51, 1.41s/it]
11%|β–ˆ | 42/376 [01:00<07:44, 1.39s/it]
11%|β–ˆβ– | 43/376 [01:01<07:24, 1.34s/it]
12%|β–ˆβ– | 44/376 [01:03<08:38, 1.56s/it]
12%|β–ˆβ– | 45/376 [01:04<08:07, 1.47s/it]
12%|β–ˆβ– | 46/376 [01:06<08:11, 1.49s/it]
12%|β–ˆβ–Ž | 47/376 [01:07<07:35, 1.38s/it]
13%|β–ˆβ–Ž | 48/376 [01:08<07:08, 1.31s/it]
13%|β–ˆβ–Ž | 49/376 [01:09<07:01, 1.29s/it]
13%|β–ˆβ–Ž | 50/376 [01:11<07:07, 1.31s/it]
{'loss': 0.5844, 'learning_rate': 4.784995028809707e-05, 'epoch': 0.13}
13%|β–ˆβ–Ž | 50/376 [01:11<07:07, 1.31s/it]
14%|β–ˆβ–Ž | 51/376 [01:12<06:59, 1.29s/it]
14%|β–ˆβ– | 52/376 [01:13<06:54, 1.28s/it]
14%|β–ˆβ– | 53/376 [01:14<06:26, 1.20s/it]
14%|β–ˆβ– | 54/376 [01:16<06:46, 1.26s/it]
15%|β–ˆβ– | 55/376 [01:17<06:54, 1.29s/it]
15%|β–ˆβ– | 56/376 [01:18<06:21, 1.19s/it]
15%|β–ˆβ–Œ | 57/376 [01:19<06:25, 1.21s/it]
15%|β–ˆβ–Œ | 58/376 [01:21<07:51, 1.48s/it]
16%|β–ˆβ–Œ | 59/376 [01:23<07:14, 1.37s/it]
16%|β–ˆβ–Œ | 60/376 [01:24<07:11, 1.36s/it]
{'loss': 0.4454, 'learning_rate': 4.692374820516679e-05, 'epoch': 0.16}
16%|β–ˆβ–Œ | 60/376 [01:24<07:11, 1.36s/it]
16%|β–ˆβ–Œ | 61/376 [01:26<07:36, 1.45s/it]
16%|β–ˆβ–‹ | 62/376 [01:27<07:34, 1.45s/it]
17%|β–ˆβ–‹ | 63/376 [01:29<08:09, 1.56s/it]
17%|β–ˆβ–‹ | 64/376 [01:30<07:41, 1.48s/it]
17%|β–ˆβ–‹ | 65/376 [01:31<07:23, 1.43s/it]
18%|β–ˆβ–Š | 66/376 [01:33<08:05, 1.57s/it]
18%|β–ˆβ–Š | 67/376 [01:35<07:37, 1.48s/it]
18%|β–ˆβ–Š | 68/376 [01:36<07:15, 1.42s/it]
18%|β–ˆβ–Š | 69/376 [01:37<06:37, 1.29s/it]
19%|β–ˆβ–Š | 70/376 [01:38<06:52, 1.35s/it]
{'loss': 0.4076, 'learning_rate': 4.584458319296868e-05, 'epoch': 0.19}
19%|β–ˆβ–Š | 70/376 [01:38<06:52, 1.35s/it]
19%|β–ˆβ–‰ | 71/376 [01:40<06:42, 1.32s/it]
19%|β–ˆβ–‰ | 72/376 [01:41<06:59, 1.38s/it]
19%|β–ˆβ–‰ | 73/376 [01:42<06:56, 1.37s/it]
20%|β–ˆβ–‰ | 74/376 [01:44<06:24, 1.27s/it]
20%|β–ˆβ–‰ | 75/376 [01:45<06:42, 1.34s/it]
20%|β–ˆβ–ˆ | 76/376 [01:46<06:05, 1.22s/it]
20%|β–ˆβ–ˆ | 77/376 [01:48<06:59, 1.40s/it]
21%|β–ˆβ–ˆ | 78/376 [01:49<07:24, 1.49s/it]
21%|β–ˆβ–ˆ | 79/376 [01:51<06:56, 1.40s/it]
21%|β–ˆβ–ˆβ– | 80/376 [01:52<06:48, 1.38s/it]
{'loss': 0.4111, 'learning_rate': 4.4619984631966524e-05, 'epoch': 0.21}
21%|β–ˆβ–ˆβ– | 80/376 [01:52<06:48, 1.38s/it]
22%|β–ˆβ–ˆβ– | 81/376 [01:54<07:02, 1.43s/it]
22%|β–ˆβ–ˆβ– | 82/376 [01:55<06:47, 1.39s/it]
22%|β–ˆβ–ˆβ– | 83/376 [01:56<06:31, 1.34s/it]
22%|β–ˆβ–ˆβ– | 84/376 [01:58<07:21, 1.51s/it]
23%|β–ˆβ–ˆβ–Ž | 85/376 [01:59<06:58, 1.44s/it]
23%|β–ˆβ–ˆβ–Ž | 86/376 [02:01<07:04, 1.47s/it]
23%|β–ˆβ–ˆβ–Ž | 87/376 [02:02<06:35, 1.37s/it]
23%|β–ˆβ–ˆβ–Ž | 88/376 [02:03<06:43, 1.40s/it]
24%|β–ˆβ–ˆβ–Ž | 89/376 [02:05<06:29, 1.36s/it]
24%|β–ˆβ–ˆβ– | 90/376 [02:07<07:18, 1.53s/it]
{'loss': 0.4115, 'learning_rate': 4.3258496598716736e-05, 'epoch': 0.24}
24%|β–ˆβ–ˆβ– | 90/376 [02:07<07:18, 1.53s/it]
24%|β–ˆβ–ˆβ– | 91/376 [02:08<07:15, 1.53s/it]
24%|β–ˆβ–ˆβ– | 92/376 [02:09<06:47, 1.44s/it]
25%|β–ˆβ–ˆβ– | 93/376 [02:10<06:11, 1.31s/it]
25%|β–ˆβ–ˆβ–Œ | 94/376 [02:11<05:35, 1.19s/it]
25%|β–ˆβ–ˆβ–Œ | 95/376 [02:13<05:53, 1.26s/it]
26%|β–ˆβ–ˆβ–Œ | 96/376 [02:14<05:38, 1.21s/it]
26%|β–ˆβ–ˆβ–Œ | 97/376 [02:15<05:32, 1.19s/it]
26%|β–ˆβ–ˆβ–Œ | 98/376 [02:16<05:54, 1.27s/it]
26%|β–ˆβ–ˆβ–‹ | 99/376 [02:18<06:20, 1.37s/it]
27%|β–ˆβ–ˆβ–‹ | 100/376 [02:19<06:01, 1.31s/it]
{'loss': 0.3566, 'learning_rate': 4.176961825348059e-05, 'epoch': 0.27}
27%|β–ˆβ–ˆβ–‹ | 100/376 [02:19<06:01, 1.31s/it]
27%|β–ˆβ–ˆβ–‹ | 101/376 [02:21<06:14, 1.36s/it]
27%|β–ˆβ–ˆβ–‹ | 102/376 [02:22<06:44, 1.48s/it]
27%|β–ˆβ–ˆβ–‹ | 103/376 [02:24<06:32, 1.44s/it]
28%|β–ˆβ–ˆβ–Š | 104/376 [02:25<05:49, 1.28s/it]
28%|β–ˆβ–ˆβ–Š | 105/376 [02:26<06:19, 1.40s/it]
28%|β–ˆβ–ˆβ–Š | 106/376 [02:28<06:09, 1.37s/it]
28%|β–ˆβ–ˆβ–Š | 107/376 [02:29<06:05, 1.36s/it]
29%|β–ˆβ–ˆβ–Š | 108/376 [02:30<05:38, 1.26s/it]
29%|β–ˆβ–ˆβ–‰ | 109/376 [02:32<06:20, 1.43s/it]
29%|β–ˆβ–ˆβ–‰ | 110/376 [02:33<05:56, 1.34s/it]
{'loss': 0.4302, 'learning_rate': 4.016373756417669e-05, 'epoch': 0.29}
29%|β–ˆβ–ˆβ–‰ | 110/376 [02:33<05:56, 1.34s/it]
30%|β–ˆβ–ˆβ–‰ | 111/376 [02:34<05:40, 1.29s/it]
30%|β–ˆβ–ˆβ–‰ | 112/376 [02:36<06:18, 1.44s/it]
30%|β–ˆβ–ˆβ–ˆ | 113/376 [02:38<06:33, 1.50s/it]
30%|β–ˆβ–ˆβ–ˆ | 114/376 [02:39<06:25, 1.47s/it]
31%|β–ˆβ–ˆβ–ˆ | 115/376 [02:40<06:25, 1.48s/it]
31%|β–ˆβ–ˆβ–ˆ | 116/376 [02:42<06:47, 1.57s/it]
31%|β–ˆβ–ˆβ–ˆ | 117/376 [02:44<06:34, 1.52s/it]
31%|β–ˆβ–ˆβ–ˆβ– | 118/376 [02:45<05:55, 1.38s/it]
32%|β–ˆβ–ˆβ–ˆβ– | 119/376 [02:46<05:38, 1.32s/it]
32%|β–ˆβ–ˆβ–ˆβ– | 120/376 [02:47<05:55, 1.39s/it]
{'loss': 0.4271, 'learning_rate': 3.845205882908432e-05, 'epoch': 0.32}
32%|β–ˆβ–ˆβ–ˆβ– | 120/376 [02:47<05:55, 1.39s/it]
32%|β–ˆβ–ˆβ–ˆβ– | 121/376 [02:49<05:40, 1.33s/it]
32%|β–ˆβ–ˆβ–ˆβ– | 122/376 [02:50<05:38, 1.33s/it]
33%|β–ˆβ–ˆβ–ˆβ–Ž | 123/376 [02:51<05:13, 1.24s/it]
33%|β–ˆβ–ˆβ–ˆβ–Ž | 124/376 [02:52<05:25, 1.29s/it]
33%|β–ˆβ–ˆβ–ˆβ–Ž | 125/376 [02:54<05:48, 1.39s/it]
34%|β–ˆβ–ˆβ–ˆβ–Ž | 126/376 [02:55<05:43, 1.37s/it]
34%|β–ˆβ–ˆβ–ˆβ– | 127/376 [02:57<06:29, 1.57s/it]
34%|β–ˆβ–ˆβ–ˆβ– | 128/376 [02:59<06:08, 1.49s/it]
34%|β–ˆβ–ˆβ–ˆβ– | 129/376 [03:00<06:32, 1.59s/it]
35%|β–ˆβ–ˆβ–ˆβ– | 130/376 [03:02<06:31, 1.59s/it]
{'loss': 0.4625, 'learning_rate': 3.6646524503974955e-05, 'epoch': 0.35}
35%|β–ˆβ–ˆβ–ˆβ– | 130/376 [03:02<06:31, 1.59s/it]
35%|β–ˆβ–ˆβ–ˆβ– | 131/376 [03:04<06:39, 1.63s/it]
35%|β–ˆβ–ˆβ–ˆβ–Œ | 132/376 [03:05<06:34, 1.61s/it]
35%|β–ˆβ–ˆβ–ˆβ–Œ | 133/376 [03:06<05:56, 1.47s/it]
36%|β–ˆβ–ˆβ–ˆβ–Œ | 134/376 [03:08<05:40, 1.41s/it]
36%|β–ˆβ–ˆβ–ˆβ–Œ | 135/376 [03:09<05:27, 1.36s/it]
36%|β–ˆβ–ˆβ–ˆβ–Œ | 136/376 [03:11<06:03, 1.52s/it]
36%|β–ˆβ–ˆβ–ˆβ–‹ | 137/376 [03:12<05:38, 1.42s/it]
37%|β–ˆβ–ˆβ–ˆβ–‹ | 138/376 [03:13<05:37, 1.42s/it]
37%|β–ˆβ–ˆβ–ˆβ–‹ | 139/376 [03:15<05:34, 1.41s/it]
37%|β–ˆβ–ˆβ–ˆβ–‹ | 140/376 [03:17<05:54, 1.50s/it]
{'loss': 0.5066, 'learning_rate': 3.475973187908737e-05, 'epoch': 0.37}
37%|β–ˆβ–ˆβ–ˆβ–‹ | 140/376 [03:17<05:54, 1.50s/it]
38%|β–ˆβ–ˆβ–ˆβ–Š | 141/376 [03:18<05:46, 1.47s/it]
38%|β–ˆβ–ˆβ–ˆβ–Š | 142/376 [03:20<05:55, 1.52s/it]
38%|β–ˆβ–ˆβ–ˆβ–Š | 143/376 [03:21<06:18, 1.62s/it]
38%|β–ˆβ–ˆβ–ˆβ–Š | 144/376 [03:23<05:49, 1.51s/it]
39%|β–ˆβ–ˆβ–ˆβ–Š | 145/376 [03:24<05:39, 1.47s/it]
39%|β–ˆβ–ˆβ–ˆβ–‰ | 146/376 [03:25<05:32, 1.44s/it]
39%|β–ˆβ–ˆβ–ˆβ–‰ | 147/376 [03:27<05:25, 1.42s/it]
39%|β–ˆβ–ˆβ–ˆβ–‰ | 148/376 [03:28<05:30, 1.45s/it]
40%|β–ˆβ–ˆβ–ˆβ–‰ | 149/376 [03:30<05:54, 1.56s/it]
40%|β–ˆβ–ˆβ–ˆβ–‰ | 150/376 [03:32<05:37, 1.49s/it]
{'loss': 0.3887, 'learning_rate': 3.280484518729466e-05, 'epoch': 0.4}
40%|β–ˆβ–ˆβ–ˆβ–‰ | 150/376 [03:32<05:37, 1.49s/it]
40%|β–ˆβ–ˆβ–ˆβ–ˆ | 151/376 [03:33<05:14, 1.40s/it]
40%|β–ˆβ–ˆβ–ˆβ–ˆ | 152/376 [03:34<04:52, 1.30s/it]
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 153/376 [03:35<04:54, 1.32s/it]
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 154/376 [03:37<04:56, 1.34s/it]
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 155/376 [03:38<05:14, 1.42s/it]
41%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 156/376 [03:40<05:08, 1.40s/it]
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 157/376 [03:41<04:45, 1.30s/it]
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 158/376 [03:42<04:29, 1.24s/it]
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 159/376 [03:43<04:41, 1.30s/it]
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 160/376 [03:44<04:41, 1.30s/it]
{'loss': 0.3675, 'learning_rate': 3.079550375668821e-05, 'epoch': 0.42}
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 160/376 [03:44<04:41, 1.30s/it]
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 161/376 [03:46<04:59, 1.39s/it]
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 162/376 [03:47<04:35, 1.29s/it]
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 163/376 [03:48<04:36, 1.30s/it]
44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 164/376 [03:50<04:34, 1.30s/it]
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 165/376 [03:51<04:41, 1.33s/it]
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 166/376 [03:53<05:04, 1.45s/it]
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 167/376 [03:54<05:04, 1.46s/it]
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 168/376 [03:56<05:02, 1.45s/it]
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 169/376 [03:57<05:08, 1.49s/it]
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 170/376 [03:59<05:00, 1.46s/it]
{'loss': 0.4095, 'learning_rate': 2.8745726848402036e-05, 'epoch': 0.45}
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 170/376 [03:59<05:00, 1.46s/it]
45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 171/376 [04:00<04:57, 1.45s/it]
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 172/376 [04:01<04:26, 1.31s/it]
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 173/376 [04:02<04:12, 1.24s/it]
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 174/376 [04:04<04:28, 1.33s/it]
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 175/376 [04:05<04:35, 1.37s/it]
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 176/376 [04:07<04:50, 1.45s/it]
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 177/376 [04:09<05:17, 1.60s/it]
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 178/376 [04:10<04:52, 1.48s/it]
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 179/376 [04:11<04:47, 1.46s/it]
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 180/376 [04:13<04:29, 1.37s/it]
{'loss': 0.3782, 'learning_rate': 2.6669815843628042e-05, 'epoch': 0.48}
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 180/376 [04:13<04:29, 1.37s/it]
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 181/376 [04:14<04:25, 1.36s/it]
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 182/376 [04:15<04:18, 1.33s/it]
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 183/376 [04:17<04:42, 1.46s/it]
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 184/376 [04:18<04:11, 1.31s/it]
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 185/376 [04:19<04:02, 1.27s/it]
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 186/376 [04:21<04:45, 1.50s/it]
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 187/376 [04:22<04:26, 1.41s/it]
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 188/376 [04:24<04:14, 1.35s/it]
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 189/376 [04:26<05:00, 1.61s/it]
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 190/376 [04:28<05:18, 1.71s/it]
{'loss': 0.4195, 'learning_rate': 2.4582254462267476e-05, 'epoch': 0.5}
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 190/376 [04:28<05:18, 1.71s/it]
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 191/376 [04:29<04:55, 1.60s/it]
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 192/376 [04:30<04:48, 1.57s/it]
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 193/376 [04:32<04:46, 1.56s/it]
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 194/376 [04:33<04:34, 1.51s/it]
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 195/376 [04:36<05:04, 1.68s/it]
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 196/376 [04:37<04:39, 1.55s/it]
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 197/376 [04:38<04:32, 1.52s/it]
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 198/376 [04:40<04:39, 1.57s/it]
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 199/376 [04:42<04:42, 1.59s/it]
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 200/376 [04:43<04:25, 1.51s/it]
{'loss': 0.3392, 'learning_rate': 2.2497607709397543e-05, 'epoch': 0.53}
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 200/376 [04:43<04:25, 1.51s/it]
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 201/376 [04:44<04:13, 1.45s/it]
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 202/376 [04:45<03:58, 1.37s/it]
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 203/376 [04:47<03:53, 1.35s/it]
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 204/376 [04:48<03:38, 1.27s/it]
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 205/376 [04:49<03:46, 1.33s/it]
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 206/376 [04:51<03:55, 1.38s/it]
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 207/376 [04:52<04:11, 1.49s/it]
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 208/376 [04:54<04:02, 1.44s/it]
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 209/376 [04:55<03:40, 1.32s/it]
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 210/376 [04:56<03:21, 1.21s/it]
{'loss': 0.3347, 'learning_rate': 2.0430420254607748e-05, 'epoch': 0.56}
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 210/376 [04:56<03:21, 1.21s/it]
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 211/376 [04:57<03:18, 1.20s/it]
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 212/376 [04:59<03:42, 1.36s/it]
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 213/376 [05:00<03:41, 1.36s/it]
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 214/376 [05:01<03:44, 1.39s/it]
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 215/376 [05:03<03:28, 1.30s/it]
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 216/376 [05:04<03:34, 1.34s/it]
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 217/376 [05:05<03:38, 1.37s/it]
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 218/376 [05:07<03:38, 1.38s/it]
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 219/376 [05:08<03:30, 1.34s/it]
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 220/376 [05:09<03:21, 1.29s/it]
{'loss': 0.4117, 'learning_rate': 1.8395114953217852e-05, 'epoch': 0.58}
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 220/376 [05:09<03:21, 1.29s/it]
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 221/376 [05:11<03:19, 1.29s/it]
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 222/376 [05:12<03:20, 1.30s/it]
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 223/376 [05:13<03:03, 1.20s/it]
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 224/376 [05:14<02:50, 1.12s/it]
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 225/376 [05:15<02:47, 1.11s/it]
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 226/376 [05:16<03:02, 1.21s/it]
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 227/376 [05:18<03:26, 1.39s/it]
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 228/376 [05:20<03:41, 1.50s/it]
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 229/376 [05:21<03:29, 1.43s/it]
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 230/376 [05:23<03:37, 1.49s/it]
{'loss': 0.3772, 'learning_rate': 1.640589221739926e-05, 'epoch': 0.61}
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 230/376 [05:23<03:37, 1.49s/it]
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 231/376 [05:24<03:26, 1.43s/it]
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 232/376 [05:25<03:21, 1.40s/it]
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 233/376 [05:27<03:09, 1.33s/it]
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 234/376 [05:28<03:33, 1.51s/it]
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 235/376 [05:30<03:20, 1.42s/it]
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 236/376 [05:31<03:32, 1.52s/it]
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 237/376 [05:33<03:17, 1.42s/it]
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 238/376 [05:34<03:00, 1.31s/it]
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 239/376 [05:35<03:00, 1.32s/it]
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 240/376 [05:36<03:04, 1.35s/it]
{'loss': 0.4403, 'learning_rate': 1.447663093929163e-05, 'epoch': 0.64}
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 240/376 [05:36<03:04, 1.35s/it]
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 241/376 [05:38<03:02, 1.35s/it]
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 242/376 [05:39<02:59, 1.34s/it]
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 243/376 [05:41<03:03, 1.38s/it]
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 244/376 [05:42<03:05, 1.41s/it]
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 245/376 [05:43<03:00, 1.38s/it]
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 246/376 [05:45<02:53, 1.33s/it]
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 247/376 [05:46<02:41, 1.25s/it]
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 248/376 [05:47<02:51, 1.34s/it]
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 249/376 [05:48<02:46, 1.31s/it]
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 250/376 [05:50<03:02, 1.45s/it]
{'loss': 0.3867, 'learning_rate': 1.2620791657378664e-05, 'epoch': 0.66}
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 250/376 [05:50<03:02, 1.45s/it]
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 251/376 [05:51<02:51, 1.37s/it]
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 252/376 [05:53<02:52, 1.39s/it]
67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 253/376 [05:54<02:50, 1.39s/it]
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 254/376 [05:56<02:44, 1.35s/it]
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 255/376 [05:57<02:54, 1.44s/it]
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 256/376 [05:59<02:54, 1.45s/it]
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 257/376 [06:00<02:47, 1.41s/it]
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 258/376 [06:01<02:35, 1.31s/it]
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 259/376 [06:03<02:42, 1.39s/it]
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 260/376 [06:04<02:28, 1.28s/it]
{'loss': 0.3688, 'learning_rate': 1.0851322641735118e-05, 'epoch': 0.69}
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 260/376 [06:04<02:28, 1.28s/it]
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 261/376 [06:05<02:29, 1.30s/it]
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 262/376 [06:07<02:51, 1.50s/it]
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 263/376 [06:09<02:53, 1.54s/it]
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 264/376 [06:10<02:53, 1.55s/it]
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 265/376 [06:11<02:43, 1.47s/it]
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 266/376 [06:13<02:32, 1.38s/it]
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 267/376 [06:14<02:40, 1.47s/it]
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 268/376 [06:16<02:35, 1.44s/it]
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 269/376 [06:17<02:44, 1.54s/it]
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 270/376 [06:19<02:40, 1.51s/it]
{'loss': 0.3655, 'learning_rate': 9.180569553392535e-06, 'epoch': 0.72}
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 270/376 [06:19<02:40, 1.51s/it]
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 271/376 [06:20<02:34, 1.47s/it]
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 272/376 [06:21<02:25, 1.40s/it]
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 273/376 [06:23<02:24, 1.40s/it]
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 274/376 [06:25<02:29, 1.47s/it]
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 275/376 [06:27<02:45, 1.64s/it]
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 276/376 [06:28<02:32, 1.52s/it]
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 277/376 [06:29<02:27, 1.49s/it]
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 278/376 [06:31<02:29, 1.53s/it]
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 279/376 [06:32<02:21, 1.46s/it]
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 280/376 [06:33<02:15, 1.41s/it]
{'loss': 0.4144, 'learning_rate': 7.620189308133943e-06, 'epoch': 0.74}
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 280/376 [06:33<02:15, 1.41s/it]
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 281/376 [06:35<02:14, 1.41s/it]
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 282/376 [06:36<02:17, 1.46s/it]
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 283/376 [06:38<02:16, 1.47s/it]
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 284/376 [06:39<02:03, 1.34s/it]
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 285/376 [06:40<01:53, 1.25s/it]
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 286/376 [06:41<01:49, 1.21s/it]
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 287/376 [06:43<02:11, 1.48s/it]
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 288/376 [06:45<02:24, 1.64s/it]
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 289/376 [06:46<02:06, 1.45s/it]
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 290/376 [06:47<01:56, 1.35s/it]
{'loss': 0.3298, 'learning_rate': 6.181068745693716e-06, 'epoch': 0.77}
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 290/376 [06:47<01:56, 1.35s/it]
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 291/376 [06:48<01:47, 1.27s/it]
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 292/376 [06:50<01:51, 1.33s/it]
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 293/376 [06:51<01:42, 1.24s/it]
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 294/376 [06:52<01:46, 1.29s/it]
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 295/376 [06:54<01:50, 1.37s/it]
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 296/376 [06:55<01:46, 1.33s/it]
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 297/376 [06:56<01:37, 1.24s/it]
79%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 298/376 [06:58<01:39, 1.28s/it]
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 299/376 [06:59<01:34, 1.23s/it]
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 300/376 [07:00<01:31, 1.20s/it]
{'loss': 0.3337, 'learning_rate': 4.873248671810928e-06, 'epoch': 0.8}
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 300/376 [07:00<01:31, 1.20s/it]
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 301/376 [07:01<01:28, 1.17s/it]
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 302/376 [07:03<01:36, 1.31s/it]
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 303/376 [07:04<01:30, 1.23s/it]
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 304/376 [07:05<01:26, 1.20s/it]
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 305/376 [07:06<01:23, 1.18s/it]
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 306/376 [07:07<01:25, 1.22s/it]
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 307/376 [07:08<01:20, 1.17s/it]
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 308/376 [07:10<01:22, 1.22s/it]
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 309/376 [07:11<01:22, 1.24s/it]
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 310/376 [07:13<01:35, 1.44s/it]
{'loss': 0.3217, 'learning_rate': 3.7058538030980942e-06, 'epoch': 0.82}
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 310/376 [07:13<01:35, 1.44s/it]
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 311/376 [07:14<01:31, 1.41s/it]
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 312/376 [07:15<01:22, 1.29s/it]
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 313/376 [07:16<01:23, 1.33s/it]
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 314/376 [07:18<01:28, 1.43s/it]
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 315/376 [07:19<01:25, 1.40s/it]
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 316/376 [07:21<01:17, 1.29s/it]
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 317/376 [07:22<01:11, 1.21s/it]
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 318/376 [07:24<01:24, 1.46s/it]
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 319/376 [07:25<01:19, 1.40s/it]
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 320/376 [07:26<01:16, 1.36s/it]
{'loss': 0.3222, 'learning_rate': 2.687029103502972e-06, 'epoch': 0.85}
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 320/376 [07:26<01:16, 1.36s/it]
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 321/376 [07:27<01:09, 1.26s/it]
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 322/376 [07:28<01:09, 1.28s/it]
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 323/376 [07:30<01:12, 1.38s/it]
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 324/376 [07:32<01:12, 1.40s/it]
86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 325/376 [07:33<01:11, 1.41s/it]
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 326/376 [07:35<01:15, 1.50s/it]
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 327/376 [07:36<01:12, 1.49s/it]
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 328/376 [07:38<01:12, 1.50s/it]
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 329/376 [07:39<01:10, 1.50s/it]
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 330/376 [07:41<01:10, 1.53s/it]
{'loss': 0.3989, 'learning_rate': 1.823882956546566e-06, 'epoch': 0.88}
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 330/376 [07:41<01:10, 1.53s/it]
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 331/376 [07:42<01:10, 1.57s/it]
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 332/376 [07:44<01:09, 1.59s/it]
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 333/376 [07:45<01:06, 1.54s/it]
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 334/376 [07:47<01:02, 1.48s/it]
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 335/376 [07:48<01:00, 1.48s/it]
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 336/376 [07:49<00:53, 1.33s/it]
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 337/376 [07:50<00:50, 1.29s/it]
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 338/376 [07:52<00:48, 1.29s/it]
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 339/376 [07:53<00:49, 1.34s/it]
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 340/376 [07:55<00:48, 1.34s/it]
{'loss': 0.3805, 'learning_rate': 1.1224375698271894e-06, 'epoch': 0.9}
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 340/376 [07:55<00:48, 1.34s/it]
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 341/376 [07:56<00:48, 1.38s/it]
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 342/376 [07:57<00:43, 1.28s/it]
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 343/376 [07:59<00:47, 1.45s/it]
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 344/376 [08:00<00:47, 1.49s/it]
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 345/376 [08:02<00:45, 1.46s/it]
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 346/376 [08:03<00:40, 1.34s/it]
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 347/376 [08:04<00:40, 1.39s/it]
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 348/376 [08:06<00:36, 1.30s/it]
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 349/376 [08:07<00:34, 1.28s/it]
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 350/376 [08:08<00:34, 1.31s/it]
{'loss': 0.4108, 'learning_rate': 5.875869578203824e-07, 'epoch': 0.93}
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 350/376 [08:08<00:34, 1.31s/it]
93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 351/376 [08:10<00:34, 1.38s/it]
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 352/376 [08:11<00:31, 1.32s/it]
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 353/376 [08:12<00:28, 1.26s/it]
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 354/376 [08:13<00:28, 1.29s/it]
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 355/376 [08:14<00:25, 1.23s/it]
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 356/376 [08:16<00:23, 1.19s/it]
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 357/376 [08:17<00:25, 1.35s/it]
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 358/376 [08:19<00:26, 1.47s/it]
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 359/376 [08:20<00:22, 1.32s/it]
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 360/376 [08:21<00:21, 1.37s/it]
{'loss': 0.3578, 'learning_rate': 2.230627961304993e-07, 'epoch': 0.96}
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 360/376 [08:22<00:21, 1.37s/it]
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 361/376 [08:23<00:20, 1.34s/it]
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 362/376 [08:24<00:19, 1.39s/it]
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 363/376 [08:26<00:18, 1.40s/it]
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 364/376 [08:27<00:16, 1.36s/it]
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 365/376 [08:29<00:16, 1.46s/it]
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 366/376 [08:30<00:14, 1.45s/it]
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 367/376 [08:32<00:13, 1.46s/it]
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 368/376 [08:33<00:11, 1.46s/it]
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 369/376 [08:35<00:10, 1.48s/it]
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 370/376 [08:36<00:08, 1.39s/it]
{'loss': 0.3453, 'learning_rate': 3.1408385430356516e-08, 'epoch': 0.98}
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 370/376 [08:36<00:08, 1.39s/it]
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 371/376 [08:37<00:06, 1.31s/it]
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 372/376 [08:38<00:05, 1.35s/it]
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 373/376 [08:40<00:04, 1.35s/it]
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 374/376 [08:41<00:02, 1.41s/it]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 375/376 [08:43<00:01, 1.47s/it]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 376/376 [08:44<00:00, 1.47s/it][INFO|trainer.py:3166] 2024-01-04 10:02:58,683 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-04 10:02:58,683 >> Num examples = 335
[INFO|trainer.py:3171] 2024-01-04 10:02:58,683 >> Batch size = 1
0%| | 0/168 [00:00<?, ?it/s]
1%| | 2/168 [00:00<00:16, 10.08it/s]
2%|▏ | 4/168 [00:00<00:21, 7.51it/s]
3%|β–Ž | 5/168 [00:00<00:27, 5.84it/s]
4%|β–Ž | 6/168 [00:00<00:24, 6.53it/s]
4%|▍ | 7/168 [00:01<00:28, 5.68it/s]
5%|▍ | 8/168 [00:01<00:27, 5.80it/s]
5%|β–Œ | 9/168 [00:01<00:31, 5.02it/s]
6%|β–Œ | 10/168 [00:01<00:33, 4.74it/s]
7%|β–‹ | 11/168 [00:01<00:29, 5.33it/s]
7%|β–‹ | 12/168 [00:02<00:32, 4.81it/s]
8%|β–Š | 13/168 [00:02<00:28, 5.35it/s]
8%|β–Š | 14/168 [00:02<00:26, 5.85it/s]
9%|β–‰ | 15/168 [00:02<00:29, 5.24it/s]
10%|β–‰ | 16/168 [00:02<00:27, 5.51it/s]
10%|β–ˆ | 17/168 [00:02<00:25, 5.91it/s]
11%|β–ˆ | 18/168 [00:03<00:24, 6.22it/s]
11%|β–ˆβ– | 19/168 [00:03<00:23, 6.44it/s]
12%|β–ˆβ– | 20/168 [00:03<00:24, 6.13it/s]
12%|β–ˆβ–Ž | 21/168 [00:03<00:28, 5.17it/s]
13%|β–ˆβ–Ž | 22/168 [00:03<00:26, 5.44it/s]
14%|β–ˆβ– | 24/168 [00:04<00:21, 6.73it/s]
15%|β–ˆβ– | 25/168 [00:04<00:20, 7.15it/s]
15%|β–ˆβ–Œ | 26/168 [00:04<00:20, 6.95it/s]
16%|β–ˆβ–Œ | 27/168 [00:04<00:24, 5.74it/s]
17%|β–ˆβ–‹ | 28/168 [00:04<00:23, 6.08it/s]
17%|β–ˆβ–‹ | 29/168 [00:04<00:21, 6.38it/s]
18%|β–ˆβ–Š | 30/168 [00:05<00:21, 6.48it/s]
18%|β–ˆβ–Š | 31/168 [00:05<00:20, 6.71it/s]
19%|β–ˆβ–‰ | 32/168 [00:05<00:20, 6.76it/s]
20%|β–ˆβ–ˆ | 34/168 [00:05<00:18, 7.39it/s]
21%|β–ˆβ–ˆ | 35/168 [00:05<00:18, 7.31it/s]
21%|β–ˆβ–ˆβ– | 36/168 [00:05<00:17, 7.68it/s]
22%|β–ˆβ–ˆβ– | 37/168 [00:05<00:17, 7.58it/s]
23%|β–ˆβ–ˆβ–Ž | 38/168 [00:06<00:17, 7.44it/s]
23%|β–ˆβ–ˆβ–Ž | 39/168 [00:06<00:20, 6.29it/s]
24%|β–ˆβ–ˆβ– | 40/168 [00:06<00:20, 6.31it/s]
24%|β–ˆβ–ˆβ– | 41/168 [00:06<00:20, 6.31it/s]
25%|β–ˆβ–ˆβ–Œ | 42/168 [00:06<00:18, 6.93it/s]
26%|β–ˆβ–ˆβ–Œ | 43/168 [00:06<00:20, 5.98it/s]
26%|β–ˆβ–ˆβ–Œ | 44/168 [00:07<00:19, 6.44it/s]
27%|β–ˆβ–ˆβ–‹ | 45/168 [00:07<00:21, 5.70it/s]
27%|β–ˆβ–ˆβ–‹ | 46/168 [00:07<00:23, 5.26it/s]
28%|β–ˆβ–ˆβ–Š | 47/168 [00:07<00:25, 4.74it/s]
29%|β–ˆβ–ˆβ–Š | 48/168 [00:07<00:22, 5.29it/s]
29%|β–ˆβ–ˆβ–‰ | 49/168 [00:08<00:21, 5.65it/s]
30%|β–ˆβ–ˆβ–‰ | 50/168 [00:08<00:19, 6.09it/s]
30%|β–ˆβ–ˆβ–ˆ | 51/168 [00:08<00:21, 5.46it/s]
31%|β–ˆβ–ˆβ–ˆ | 52/168 [00:08<00:20, 5.70it/s]
32%|β–ˆβ–ˆβ–ˆβ– | 53/168 [00:08<00:22, 5.02it/s]
32%|β–ˆβ–ˆβ–ˆβ– | 54/168 [00:09<00:22, 5.11it/s]
33%|β–ˆβ–ˆβ–ˆβ–Ž | 55/168 [00:09<00:21, 5.22it/s]
33%|β–ˆβ–ˆβ–ˆβ–Ž | 56/168 [00:09<00:19, 5.75it/s]
34%|β–ˆβ–ˆβ–ˆβ– | 57/168 [00:09<00:17, 6.50it/s]
35%|β–ˆβ–ˆβ–ˆβ– | 58/168 [00:09<00:18, 5.82it/s]
35%|β–ˆβ–ˆβ–ˆβ–Œ | 59/168 [00:09<00:21, 5.01it/s]
36%|β–ˆβ–ˆβ–ˆβ–Œ | 60/168 [00:10<00:19, 5.44it/s]
36%|β–ˆβ–ˆβ–ˆβ–‹ | 61/168 [00:10<00:17, 6.24it/s]
37%|β–ˆβ–ˆβ–ˆβ–‹ | 62/168 [00:10<00:15, 6.80it/s]
38%|β–ˆβ–ˆβ–ˆβ–Š | 63/168 [00:10<00:15, 6.99it/s]
38%|β–ˆβ–ˆβ–ˆβ–Š | 64/168 [00:10<00:14, 7.14it/s]
39%|β–ˆβ–ˆβ–ˆβ–Š | 65/168 [00:10<00:14, 7.02it/s]
39%|β–ˆβ–ˆβ–ˆβ–‰ | 66/168 [00:10<00:13, 7.43it/s]
40%|β–ˆβ–ˆβ–ˆβ–‰ | 67/168 [00:11<00:17, 5.85it/s]
40%|β–ˆβ–ˆβ–ˆβ–ˆ | 68/168 [00:11<00:19, 5.09it/s]
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 69/168 [00:11<00:17, 5.57it/s]
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 70/168 [00:11<00:15, 6.14it/s]
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 71/168 [00:11<00:14, 6.50it/s]
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 72/168 [00:11<00:16, 5.78it/s]
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 73/168 [00:12<00:19, 5.00it/s]
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 74/168 [00:12<00:19, 4.95it/s]
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 75/168 [00:12<00:18, 5.12it/s]
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 77/168 [00:12<00:13, 6.52it/s]
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 78/168 [00:12<00:12, 7.01it/s]
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 79/168 [00:13<00:12, 7.09it/s]
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 80/168 [00:13<00:13, 6.46it/s]
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 81/168 [00:13<00:13, 6.39it/s]
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 82/168 [00:13<00:13, 6.33it/s]
49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 83/168 [00:13<00:13, 6.39it/s]
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 84/168 [00:13<00:14, 5.64it/s]
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 85/168 [00:14<00:16, 5.00it/s]
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 86/168 [00:14<00:14, 5.55it/s]
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 87/168 [00:14<00:13, 6.03it/s]
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 88/168 [00:14<00:12, 6.65it/s]
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 89/168 [00:14<00:12, 6.47it/s]
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 90/168 [00:14<00:11, 6.53it/s]
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 91/168 [00:15<00:11, 6.55it/s]
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 92/168 [00:15<00:10, 7.11it/s]
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 93/168 [00:15<00:10, 7.24it/s]
56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 94/168 [00:15<00:10, 7.32it/s]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 168/168 [00:28<00:00, 6.50it/s]
{'eval_loss': 0.35242682695388794, 'eval_runtime': 28.2403, 'eval_samples_per_second': 11.862, 'eval_steps_per_second': 5.949, 'epoch': 1.0}
[INFO|trainer.py:1947] 2024-01-04 10:03:26,926 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 553.4721, 'train_samples_per_second': 5.44, 'train_steps_per_second': 0.679, 'train_loss': 0.4441075046011742, 'epoch': 1.0}
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 376/376 [09:13<00:00, 1.47s/it]
[INFO|trainer.py:2889] 2024-01-04 10:03:26,930 >> Saving model checkpoint to ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora
[INFO|tokenization_utils_base.py:2432] 2024-01-04 10:03:26,973 >> tokenizer config file saved in ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-04 10:03:26,974 >> Special tokens file saved in ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-04 10:03:26,974 >> added tokens file saved in ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/added_tokens.json
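The checkpoint saved above contains only the LoRA adapter weights plus the tokenizer files, not the full base model. A minimal sketch of loading it for inference with peft, assuming the adapter was trained on top of cognitivecomputations/dolphin-2_6-phi-2 (the base-model id is an assumption inferred from the output path):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "cognitivecomputations/dolphin-2_6-phi-2"  # assumed base model
adapter_dir = "./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora"

tokenizer = AutoTokenizer.from_pretrained(adapter_dir)  # tokenizer files were saved next to the adapter
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_dir)  # attach the LoRA weights
model = model.merge_and_unload()  # optional: fold the adapter into the base weights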
***** train metrics *****
epoch = 1.0
train_loss = 0.4441
train_runtime = 0:09:13.47
train_samples_per_second = 5.44
train_steps_per_second = 0.679
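These throughput figures are internally consistent: 376 optimizer steps over 553.47 s reproduces the reported steps/s, and the samples/s figure implies roughly 3,000 examples seen in the single epoch, which matches 376 steps at an effective batch of 8 if the run used two ranks (per-device batch 1 x gradient accumulation 4 x 2 GPUs; the two-GPU reading is an assumption). A quick check of the arithmetic:

train_runtime = 553.4721   # seconds, from the log
total_steps   = 376        # optimizer steps, from the progress bar
grad_accum    = 4          # gradient_accumulation_steps, from the run config
per_device_bs = 1
world_size    = 2          # assumption: two processes (distributed training was enabled)

print(total_steps / train_runtime)              # ~0.679 steps/s, as reported
print(5.44 * train_runtime)                     # ~3011 samples processed in one epoch
print(per_device_bs * grad_accum * world_size)  # effective batch of 8 per optimizer step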
Figure saved: ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/training_loss.png
Figure saved: ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/training_eval_loss.png
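Both figures are rendered from the trainer's log history. A similar curve can be reproduced from the trainer_state.json that Trainer writes into the output directory; the sketch below uses the standard log_history field names, and the plotting style is illustrative rather than LLaMA-Factory's exact implementation:

import json
import matplotlib.pyplot as plt

out_dir = "./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora"
with open(f"{out_dir}/trainer_state.json") as f:
    history = json.load(f)["log_history"]

# Training entries carry "loss" every logging_steps; evaluation entries carry "eval_loss".
train_pts = [(h["step"], h["loss"]) for h in history if "loss" in h]
eval_pts  = [(h["step"], h["eval_loss"]) for h in history if "eval_loss" in h]

plt.plot(*zip(*train_pts), label="train loss")
if eval_pts:
    plt.scatter(*zip(*eval_pts), color="red", label="eval loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.savefig(f"{out_dir}/training_loss_reproduced.png")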
[INFO|trainer.py:3166] 2024-01-04 10:03:27,895 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-04 10:03:27,895 >> Num examples = 335
[INFO|trainer.py:3171] 2024-01-04 10:03:27,895 >> Batch size = 1
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 168/168 [00:28<00:00, 5.99it/s]
***** eval metrics *****
epoch = 1.0
eval_loss = 0.3524
eval_runtime = 0:00:28.24
eval_samples_per_second = 11.859
eval_steps_per_second = 5.947
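The evaluation numbers check out the same way: 335 examples in 28.24 s gives the reported samples/s, and the 168 progress-bar steps give the steps/s. Note that 168 = ceil(335 / 2), consistent with two ranks each evaluating at batch size 1 (again an assumption about the rank count):

eval_runtime = 28.2403   # seconds, from the log
num_examples = 335       # from "Num examples = 335"
num_steps    = 168       # per-rank batches shown by the progress bar

print(num_examples / eval_runtime)  # ~11.86 samples/s, as reported
print(num_steps / eval_runtime)     # ~5.95 steps/s, as reported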
[INFO|modelcard.py:452] 2024-01-04 10:03:56,150 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
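This final warning only means the auto-generated model card entry lacked dataset metadata, so the eval result is not embedded in the card's YAML; the saved weights are unaffected. The missing fields can be supplied through the public Trainer.create_model_card API, sketched here with placeholder dataset values for this run:

# Hypothetical: `trainer` is the Seq2SeqTrainer instance from this run.
trainer.create_model_card(
    model_name="dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora",
    tasks="text-generation",
    dataset_tags="glaiveai/glaive-function-calling-v2",  # placeholder dataset id
    dataset="glaive-function-calling-v2",                # placeholder pretty name
)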