09/23/2023 12:10:45 - WARNING - __main__ - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
09/23/2023 12:11:04 - INFO - __main__ - Training/evaluation parameters Namespace(train_file='../../../data/mcqa/atomic/train_atm_n_2i_half_sample_name.jsonl', dev_file='../../../data/mcqa/atomic/dev_random_10k.jsonl', model_type='deberta-mlm', model_name_or_path='microsoft/deberta-v3-large', config_name='', tokenizer_name='', cache_dir='.cache', task_name='atomic', output_dir='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', second_train_file=None, second_dev_file=None, max_seq_length=128, max_words_to_mask=6, max_sequence_per_time=80, do_train=True, do_eval=True, do_ext_eval=True, evaluate_during_training=True, do_lower_case=False, per_gpu_train_batch_size=2, per_gpu_eval_batch_size=16, gradient_accumulation_steps=16, margin=1.0, learning_rate=5e-06, weight_decay=0.01, adam_epsilon=1e-06, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, warmup_steps=0, warmup_proportion=0.05, logging_steps=50, save_steps=500, logits_file='logits_test.txt', results_file='eval_results.txt', no_cuda=False, overwrite_output_dir=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, server_ip='', server_port='', eval_output_dir='./eval_results', n_gpu=1, device=device(type='cuda'))
09/23/2023 12:11:13 - INFO - __main__ - ***** Running evaluation *****
09/23/2023 12:11:13 - INFO - __main__ - Num examples = 10000
09/23/2023 12:11:13 - INFO - __main__ - Batch size = 16
09/23/2023 12:15:11 - INFO - __main__ - ***** Eval results *****
09/23/2023 12:15:11 - INFO - __main__ - acc = 0.3392
09/23/2023 12:25:13 - INFO - __main__ - warm up steps = 835
09/23/2023 12:25:13 - INFO - __main__ - ***** Running training *****
09/23/2023 12:25:13 - INFO - __main__ - Num examples = 534833
09/23/2023 12:25:13 - INFO - __main__ - Num Epochs = 1
09/23/2023 12:25:13 - INFO - __main__ - Instantaneous batch size per GPU = 2
09/23/2023 12:25:13 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 32
09/23/2023 12:25:13 - INFO - __main__ - Gradient Accumulation steps = 16
09/23/2023 12:25:13 - INFO - __main__ - Total optimization steps = 16713
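The effective batch size, optimization-step count, and warmup-step count reported above follow from the Namespace arguments. Below is a minimal sketch of that bookkeeping, assuming the usual per-GPU batch size x n_gpu x gradient-accumulation convention and a proportional warmup; the exact rounding is an assumption, but it reproduces the logged 32, 16713, and 835.

    import math

    # Arguments taken from the Namespace logged above.
    num_examples = 534833
    per_gpu_train_batch_size = 2
    n_gpu = 1
    gradient_accumulation_steps = 16
    num_train_epochs = 1          # num_train_epochs=1.0 in the Namespace
    warmup_proportion = 0.05

    # Effective examples consumed per optimizer step: 2 * 1 * 16 = 32.
    total_train_batch_size = per_gpu_train_batch_size * n_gpu * gradient_accumulation_steps

    # Optimizer steps: ceil(534833 / 2) = 267417 batches per epoch, // 16 = 16713.
    batches_per_epoch = math.ceil(num_examples / (per_gpu_train_batch_size * n_gpu))
    total_optimization_steps = (batches_per_epoch // gradient_accumulation_steps) * num_train_epochs

    # Warmup as a fixed proportion of the schedule: int(0.05 * 16713) = 835.
    warmup_steps = int(warmup_proportion * total_optimization_steps)

    print(total_train_batch_size, total_optimization_steps, warmup_steps)  # 32 16713 835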
09/23/2023 12:28:54 - INFO - __main__ - global_step = 50, average loss = 0.6903331369534135
09/23/2023 12:32:33 - INFO - __main__ - global_step = 100, average loss = 0.6819266405794769
09/23/2023 12:36:13 - INFO - __main__ - global_step = 150, average loss = 0.6690767159638926
09/23/2023 12:39:56 - INFO - __main__ - global_step = 200, average loss = 0.6476348407182377
09/23/2023 12:43:39 - INFO - __main__ - global_step = 250, average loss = 0.6220815655076877
09/23/2023 12:47:19 - INFO - __main__ - global_step = 300, average loss = 0.5299683179453859
09/23/2023 12:50:56 - INFO - __main__ - global_step = 350, average loss = 0.39345016410181416
09/23/2023 12:54:38 - INFO - __main__ - global_step = 400, average loss = 0.31127411118301096
09/23/2023 12:58:19 - INFO - __main__ - global_step = 450, average loss = 0.25150225180907
09/23/2023 13:02:00 - INFO - __main__ - global_step = 500, average loss = 0.22586858159028453
09/23/2023 13:02:01 - INFO - __main__ - ***** Running evaluation *****
09/23/2023 13:02:01 - INFO - __main__ - Num examples = 10000
09/23/2023 13:02:01 - INFO - __main__ - Batch size = 16
09/23/2023 13:05:56 - INFO - __main__ - ***** Eval results *****
09/23/2023 13:05:56 - INFO - __main__ - acc = 0.6996
09/23/2023 13:06:23 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 13:10:02 - INFO - __main__ - global_step = 550, average loss = 0.22251796642665794
09/23/2023 13:13:46 - INFO - __main__ - global_step = 600, average loss = 0.19366045010890956
09/23/2023 13:17:29 - INFO - __main__ - global_step = 650, average loss = 0.18587105088678071
09/23/2023 13:21:15 - INFO - __main__ - global_step = 700, average loss = 0.1760789550206391
09/23/2023 13:24:59 - INFO - __main__ - global_step = 750, average loss = 0.18312411408871412
09/23/2023 13:28:42 - INFO - __main__ - global_step = 800, average loss = 0.15576540186157217
09/23/2023 13:32:25 - INFO - __main__ - global_step = 850, average loss = 0.16302873345994157
09/23/2023 13:36:07 - INFO - __main__ - global_step = 900, average loss = 0.15725697406036487
09/23/2023 13:39:46 - INFO - __main__ - global_step = 950, average loss = 0.15640976145299645
09/23/2023 13:43:33 - INFO - __main__ - global_step = 1000, average loss = 0.15606625928507128
09/23/2023 13:43:34 - INFO - __main__ - ***** Running evaluation *****
09/23/2023 13:43:34 - INFO - __main__ - Num examples = 10000
09/23/2023 13:43:34 - INFO - __main__ - Batch size = 16
09/23/2023 13:47:30 - INFO - __main__ - ***** Eval results *****
09/23/2023 13:47:30 - INFO - __main__ - acc = 0.7961
09/23/2023 13:47:58 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/23/2023 13:51:41 - INFO - __main__ - global_step = 1050, average loss = 0.14431810150181262
09/23/2023 13:55:20 - INFO - __main__ - global_step = 1100, average loss = 0.15233074207513708
09/23/2023 13:59:01 - INFO - __main__ - global_step = 1150, average loss = 0.1404175848151772
09/23/2023 14:02:44 - INFO - __main__ - global_step = 1200, average loss = 0.12134294869215864
09/23/2023 14:06:20 - INFO - __main__ - global_step = 1250, average loss = 0.1363200130731275
09/23/2023 14:09:59 - INFO - __main__ - global_step = 1300, average loss = 0.13769450530940958
09/23/2023 14:13:43 -
INFO - __main__ - global_step = 1350, average loss = 0.12156560226379952 09/23/2023 14:17:18 - INFO - __main__ - global_step = 1400, average loss = 0.12623315585107775 09/23/2023 14:20:59 - INFO - __main__ - global_step = 1450, average loss = 0.14377202547417256 09/23/2023 14:24:33 - INFO - __main__ - global_step = 1500, average loss = 0.1286695548933858 09/23/2023 14:24:34 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 14:24:34 - INFO - __main__ - Num examples = 10000 09/23/2023 14:24:34 - INFO - __main__ - Batch size = 16 09/23/2023 14:28:29 - INFO - __main__ - ***** Eval results ***** 09/23/2023 14:28:29 - INFO - __main__ - acc = 0.8048 09/23/2023 14:28:56 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 14:32:42 - INFO - __main__ - global_step = 1550, average loss = 0.1198868363915244 09/23/2023 14:36:24 - INFO - __main__ - global_step = 1600, average loss = 0.12324378551486007 09/23/2023 14:40:00 - INFO - __main__ - global_step = 1650, average loss = 0.11938468464672042 09/23/2023 14:43:41 - INFO - __main__ - global_step = 1700, average loss = 0.14236379045556533 09/23/2023 14:47:22 - INFO - __main__ - global_step = 1750, average loss = 0.13320694023670512 09/23/2023 14:51:02 - INFO - __main__ - global_step = 1800, average loss = 0.13622453257718006 09/23/2023 14:54:42 - INFO - __main__ - global_step = 1850, average loss = 0.13987649206645072 09/23/2023 14:58:22 - INFO - __main__ - global_step = 1900, average loss = 0.12299754774277971 09/23/2023 15:02:05 - INFO - __main__ - global_step = 1950, average loss = 0.11868109124743569 09/23/2023 15:05:47 - INFO - __main__ - global_step = 2000, average loss = 0.1415042275990345 09/23/2023 15:05:47 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 15:05:47 - INFO - __main__ - Num examples = 10000 09/23/2023 15:05:47 - INFO - __main__ - Batch size = 16 09/23/2023 15:09:43 - INFO - __main__ - ***** Eval results ***** 09/23/2023 15:09:43 - INFO - __main__ - acc = 0.8063 09/23/2023 15:10:10 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 15:13:51 - INFO - __main__ - global_step = 2050, average loss = 0.11399275673671581 09/23/2023 15:17:31 - INFO - __main__ - global_step = 2100, average loss = 0.1065546132405143 09/23/2023 15:21:11 - INFO - __main__ - global_step = 2150, average loss = 0.12809142941467144 09/23/2023 15:24:51 - INFO - __main__ - global_step = 2200, average loss = 0.12454848410692648 09/23/2023 15:28:34 - INFO - __main__ - global_step = 2250, average loss = 0.10986286829065647 09/23/2023 15:32:14 - INFO - __main__ - global_step = 2300, average loss = 0.11237965747121052 09/23/2023 15:35:56 - INFO - __main__ - global_step = 2350, average loss = 0.10897610924319451 09/23/2023 15:39:41 - INFO - __main__ - global_step = 2400, average loss = 0.12056981857070241 09/23/2023 15:43:24 - INFO - __main__ - global_step = 2450, average loss = 0.13911059297635803 09/23/2023 15:47:10 - INFO - __main__ - global_step = 2500, average loss = 0.11335444856034883 09/23/2023 15:47:10 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 15:47:10 - INFO - __main__ - Num examples = 10000 09/23/2023 15:47:10 - INFO - __main__ - Batch size = 16 09/23/2023 15:51:06 - INFO - __main__ - ***** Eval results ***** 09/23/2023 15:51:06 - INFO - __main__ - acc = 0.8234 09/23/2023 15:51:32 - INFO - __main__ - Saving model checkpoint to 
output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 15:55:10 - INFO - __main__ - global_step = 2550, average loss = 0.12103958850973867 09/23/2023 15:58:57 - INFO - __main__ - global_step = 2600, average loss = 0.11913071399074397 09/23/2023 16:02:38 - INFO - __main__ - global_step = 2650, average loss = 0.11255583499452769 09/23/2023 16:06:28 - INFO - __main__ - global_step = 2700, average loss = 0.1006322616293619 09/23/2023 16:10:12 - INFO - __main__ - global_step = 2750, average loss = 0.0932968783121487 09/23/2023 16:13:51 - INFO - __main__ - global_step = 2800, average loss = 0.11056979637924087 09/23/2023 16:17:38 - INFO - __main__ - global_step = 2850, average loss = 0.12318793082176853 09/23/2023 16:21:21 - INFO - __main__ - global_step = 2900, average loss = 0.10864610994302439 09/23/2023 16:25:03 - INFO - __main__ - global_step = 2950, average loss = 0.11261582636667299 09/23/2023 16:28:40 - INFO - __main__ - global_step = 3000, average loss = 0.12150005620278534 09/23/2023 16:28:40 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 16:28:40 - INFO - __main__ - Num examples = 10000 09/23/2023 16:28:40 - INFO - __main__ - Batch size = 16 09/23/2023 16:32:35 - INFO - __main__ - ***** Eval results ***** 09/23/2023 16:32:35 - INFO - __main__ - acc = 0.8261 09/23/2023 16:33:02 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 16:36:46 - INFO - __main__ - global_step = 3050, average loss = 0.10565035182957218 09/23/2023 16:40:30 - INFO - __main__ - global_step = 3100, average loss = 0.10429829731896462 09/23/2023 16:44:14 - INFO - __main__ - global_step = 3150, average loss = 0.10812272985053824 09/23/2023 16:47:54 - INFO - __main__ - global_step = 3200, average loss = 0.12238092143270478 09/23/2023 16:51:33 - INFO - __main__ - global_step = 3250, average loss = 0.10868940783606376 09/23/2023 16:55:14 - INFO - __main__ - global_step = 3300, average loss = 0.1209917226509424 09/23/2023 16:58:59 - INFO - __main__ - global_step = 3350, average loss = 0.1191260662042896 09/23/2023 17:02:41 - INFO - __main__ - global_step = 3400, average loss = 0.1174743126919202 09/23/2023 17:06:26 - INFO - __main__ - global_step = 3450, average loss = 0.100895225374843 09/23/2023 17:10:02 - INFO - __main__ - global_step = 3500, average loss = 0.0931866138278565 09/23/2023 17:10:03 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 17:10:03 - INFO - __main__ - Num examples = 10000 09/23/2023 17:10:03 - INFO - __main__ - Batch size = 16 09/23/2023 17:13:58 - INFO - __main__ - ***** Eval results ***** 09/23/2023 17:13:58 - INFO - __main__ - acc = 0.8229 09/23/2023 17:17:45 - INFO - __main__ - global_step = 3550, average loss = 0.10633477224648231 09/23/2023 17:21:30 - INFO - __main__ - global_step = 3600, average loss = 0.1021722938354651 09/23/2023 17:25:11 - INFO - __main__ - global_step = 3650, average loss = 0.10295378862727375 09/23/2023 17:28:50 - INFO - __main__ - global_step = 3700, average loss = 0.1024187771679135 09/23/2023 17:32:34 - INFO - __main__ - global_step = 3750, average loss = 0.09922411829451448 09/23/2023 17:36:14 - INFO - __main__ - global_step = 3800, average loss = 0.11105157318372222 09/23/2023 17:39:57 - INFO - __main__ - global_step = 3850, average loss = 0.12378941989987652 09/23/2023 17:43:42 - INFO - __main__ - global_step = 3900, average loss = 0.1034327056143593 09/23/2023 17:47:25 - INFO - __main__ - global_step = 3950, average 
loss = 0.09697925167827634 09/23/2023 17:51:09 - INFO - __main__ - global_step = 4000, average loss = 0.11230336717126192 09/23/2023 17:51:09 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 17:51:09 - INFO - __main__ - Num examples = 10000 09/23/2023 17:51:09 - INFO - __main__ - Batch size = 16 09/23/2023 17:55:05 - INFO - __main__ - ***** Eval results ***** 09/23/2023 17:55:05 - INFO - __main__ - acc = 0.8371 09/23/2023 17:55:32 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 17:59:12 - INFO - __main__ - global_step = 4050, average loss = 0.10925351051962934 09/23/2023 18:03:00 - INFO - __main__ - global_step = 4100, average loss = 0.09795216493275802 09/23/2023 18:06:43 - INFO - __main__ - global_step = 4150, average loss = 0.09962472554965643 09/23/2023 18:10:25 - INFO - __main__ - global_step = 4200, average loss = 0.10342389734141762 09/23/2023 18:14:05 - INFO - __main__ - global_step = 4250, average loss = 0.09674815248567029 09/23/2023 18:17:48 - INFO - __main__ - global_step = 4300, average loss = 0.10319628210134396 09/23/2023 18:21:33 - INFO - __main__ - global_step = 4350, average loss = 0.09340641272166977 09/23/2023 18:25:14 - INFO - __main__ - global_step = 4400, average loss = 0.10845618240913608 09/23/2023 18:28:59 - INFO - __main__ - global_step = 4450, average loss = 0.11604906246473547 09/23/2023 18:32:43 - INFO - __main__ - global_step = 4500, average loss = 0.09590314964269055 09/23/2023 18:32:43 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 18:32:43 - INFO - __main__ - Num examples = 10000 09/23/2023 18:32:43 - INFO - __main__ - Batch size = 16 09/23/2023 18:36:38 - INFO - __main__ - ***** Eval results ***** 09/23/2023 18:36:38 - INFO - __main__ - acc = 0.8305 09/23/2023 18:40:22 - INFO - __main__ - global_step = 4550, average loss = 0.09955280199857952 09/23/2023 18:44:07 - INFO - __main__ - global_step = 4600, average loss = 0.09018894311768236 09/23/2023 18:47:49 - INFO - __main__ - global_step = 4650, average loss = 0.11624654464081687 09/23/2023 18:51:30 - INFO - __main__ - global_step = 4700, average loss = 0.11213955332923434 09/23/2023 18:55:07 - INFO - __main__ - global_step = 4750, average loss = 0.11335175217776851 09/23/2023 18:58:47 - INFO - __main__ - global_step = 4800, average loss = 0.10374061681199237 09/23/2023 19:02:34 - INFO - __main__ - global_step = 4850, average loss = 0.09650620453016018 09/23/2023 19:06:16 - INFO - __main__ - global_step = 4900, average loss = 0.1034209698169434 09/23/2023 19:09:53 - INFO - __main__ - global_step = 4950, average loss = 0.10046588191311458 09/23/2023 19:13:34 - INFO - __main__ - global_step = 5000, average loss = 0.10752027794980677 09/23/2023 19:13:34 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 19:13:34 - INFO - __main__ - Num examples = 10000 09/23/2023 19:13:34 - INFO - __main__ - Batch size = 16 09/23/2023 19:17:29 - INFO - __main__ - ***** Eval results ***** 09/23/2023 19:17:29 - INFO - __main__ - acc = 0.8355 09/23/2023 19:21:19 - INFO - __main__ - global_step = 5050, average loss = 0.10195030277842307 09/23/2023 19:24:58 - INFO - __main__ - global_step = 5100, average loss = 0.10987481483532065 09/23/2023 19:28:41 - INFO - __main__ - global_step = 5150, average loss = 0.10906005093554995 09/23/2023 19:32:23 - INFO - __main__ - global_step = 5200, average loss = 0.09835696181547973 09/23/2023 19:36:06 - INFO - __main__ - global_step = 5250, average loss = 
0.10181126694624254 09/23/2023 19:39:52 - INFO - __main__ - global_step = 5300, average loss = 0.08663028705283068 09/23/2023 19:43:30 - INFO - __main__ - global_step = 5350, average loss = 0.10507196654667496 09/23/2023 19:47:18 - INFO - __main__ - global_step = 5400, average loss = 0.108608085659871 09/23/2023 19:51:03 - INFO - __main__ - global_step = 5450, average loss = 0.099619501844536 09/23/2023 19:54:49 - INFO - __main__ - global_step = 5500, average loss = 0.10225338533447939 09/23/2023 19:54:49 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 19:54:49 - INFO - __main__ - Num examples = 10000 09/23/2023 19:54:49 - INFO - __main__ - Batch size = 16 09/23/2023 19:58:45 - INFO - __main__ - ***** Eval results ***** 09/23/2023 19:58:45 - INFO - __main__ - acc = 0.8279 09/23/2023 20:02:26 - INFO - __main__ - global_step = 5550, average loss = 0.10436682683890468 09/23/2023 20:06:11 - INFO - __main__ - global_step = 5600, average loss = 0.10477761221260153 09/23/2023 20:09:52 - INFO - __main__ - global_step = 5650, average loss = 0.09326410317778937 09/23/2023 20:13:31 - INFO - __main__ - global_step = 5700, average loss = 0.11269167278223904 09/23/2023 20:17:16 - INFO - __main__ - global_step = 5750, average loss = 0.10188864256499074 09/23/2023 20:21:00 - INFO - __main__ - global_step = 5800, average loss = 0.10433580860199981 09/23/2023 20:24:43 - INFO - __main__ - global_step = 5850, average loss = 0.08972063858884212 09/23/2023 20:28:22 - INFO - __main__ - global_step = 5900, average loss = 0.1065664726671821 09/23/2023 20:32:07 - INFO - __main__ - global_step = 5950, average loss = 0.10174332244623656 09/23/2023 20:35:49 - INFO - __main__ - global_step = 6000, average loss = 0.08872646622621687 09/23/2023 20:35:49 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 20:35:49 - INFO - __main__ - Num examples = 10000 09/23/2023 20:35:49 - INFO - __main__ - Batch size = 16 09/23/2023 20:39:45 - INFO - __main__ - ***** Eval results ***** 09/23/2023 20:39:45 - INFO - __main__ - acc = 0.8363 09/23/2023 20:43:29 - INFO - __main__ - global_step = 6050, average loss = 0.10705330887685705 09/23/2023 20:47:16 - INFO - __main__ - global_step = 6100, average loss = 0.09171272950654384 09/23/2023 20:50:59 - INFO - __main__ - global_step = 6150, average loss = 0.0861645900901567 09/23/2023 20:54:46 - INFO - __main__ - global_step = 6200, average loss = 0.08994678908144124 09/23/2023 20:58:32 - INFO - __main__ - global_step = 6250, average loss = 0.08786970607354305 09/23/2023 21:02:13 - INFO - __main__ - global_step = 6300, average loss = 0.09656520821336016 09/23/2023 21:05:56 - INFO - __main__ - global_step = 6350, average loss = 0.09620310332989902 09/23/2023 21:09:42 - INFO - __main__ - global_step = 6400, average loss = 0.09152124080545036 09/23/2023 21:13:22 - INFO - __main__ - global_step = 6450, average loss = 0.09472263304131047 09/23/2023 21:17:06 - INFO - __main__ - global_step = 6500, average loss = 0.10554198697194807 09/23/2023 21:17:06 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 21:17:06 - INFO - __main__ - Num examples = 10000 09/23/2023 21:17:06 - INFO - __main__ - Batch size = 16 09/23/2023 21:21:01 - INFO - __main__ - ***** Eval results ***** 09/23/2023 21:21:01 - INFO - __main__ - acc = 0.841 09/23/2023 21:21:28 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 21:25:14 - INFO - __main__ - global_step = 6550, average loss = 0.09830655160796596 
09/23/2023 21:28:55 - INFO - __main__ - global_step = 6600, average loss = 0.09539545015402837 09/23/2023 21:32:40 - INFO - __main__ - global_step = 6650, average loss = 0.09118585625503328 09/23/2023 21:36:18 - INFO - __main__ - global_step = 6700, average loss = 0.09700520555491493 09/23/2023 21:40:03 - INFO - __main__ - global_step = 6750, average loss = 0.105271778342576 09/23/2023 21:43:45 - INFO - __main__ - global_step = 6800, average loss = 0.10975144471223758 09/23/2023 21:47:28 - INFO - __main__ - global_step = 6850, average loss = 0.09920243133579788 09/23/2023 21:51:11 - INFO - __main__ - global_step = 6900, average loss = 0.09791661702009151 09/23/2023 21:54:51 - INFO - __main__ - global_step = 6950, average loss = 0.08630025177910283 09/23/2023 21:58:29 - INFO - __main__ - global_step = 7000, average loss = 0.09660528897402401 09/23/2023 21:58:29 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 21:58:29 - INFO - __main__ - Num examples = 10000 09/23/2023 21:58:29 - INFO - __main__ - Batch size = 16 09/23/2023 22:02:25 - INFO - __main__ - ***** Eval results ***** 09/23/2023 22:02:25 - INFO - __main__ - acc = 0.843 09/23/2023 22:02:51 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 22:06:33 - INFO - __main__ - global_step = 7050, average loss = 0.10305566756385814 09/23/2023 22:10:07 - INFO - __main__ - global_step = 7100, average loss = 0.10687436608219286 09/23/2023 22:13:47 - INFO - __main__ - global_step = 7150, average loss = 0.0946133067667688 09/23/2023 22:17:27 - INFO - __main__ - global_step = 7200, average loss = 0.09795189084834419 09/23/2023 22:21:17 - INFO - __main__ - global_step = 7250, average loss = 0.09060888570308634 09/23/2023 22:24:59 - INFO - __main__ - global_step = 7300, average loss = 0.0877145413684775 09/23/2023 22:28:35 - INFO - __main__ - global_step = 7350, average loss = 0.10495714643941029 09/23/2023 22:32:21 - INFO - __main__ - global_step = 7400, average loss = 0.07401456630654138 09/23/2023 22:36:03 - INFO - __main__ - global_step = 7450, average loss = 0.09523518772701209 09/23/2023 22:39:41 - INFO - __main__ - global_step = 7500, average loss = 0.10137952610446518 09/23/2023 22:39:41 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 22:39:41 - INFO - __main__ - Num examples = 10000 09/23/2023 22:39:41 - INFO - __main__ - Batch size = 16 09/23/2023 22:43:37 - INFO - __main__ - ***** Eval results ***** 09/23/2023 22:43:37 - INFO - __main__ - acc = 0.846 09/23/2023 22:44:03 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 22:47:46 - INFO - __main__ - global_step = 7550, average loss = 0.09563293447645264 09/23/2023 22:51:31 - INFO - __main__ - global_step = 7600, average loss = 0.09618103489105125 09/23/2023 22:55:13 - INFO - __main__ - global_step = 7650, average loss = 0.08849806944810552 09/23/2023 22:58:54 - INFO - __main__ - global_step = 7700, average loss = 0.10007433392238455 09/23/2023 23:02:36 - INFO - __main__ - global_step = 7750, average loss = 0.09035434001329122 09/23/2023 23:06:24 - INFO - __main__ - global_step = 7800, average loss = 0.09338357288788757 09/23/2023 23:10:04 - INFO - __main__ - global_step = 7850, average loss = 0.09912064949181514 09/23/2023 23:13:47 - INFO - __main__ - global_step = 7900, average loss = 0.08827902228244057 09/23/2023 23:17:27 - INFO - __main__ - global_step = 7950, average loss = 
0.11218067690118914 09/23/2023 23:21:09 - INFO - __main__ - global_step = 8000, average loss = 0.08588292430682486 09/23/2023 23:21:09 - INFO - __main__ - ***** Running evaluation ***** 09/23/2023 23:21:09 - INFO - __main__ - Num examples = 10000 09/23/2023 23:21:09 - INFO - __main__ - Batch size = 16 09/23/2023 23:25:05 - INFO - __main__ - ***** Eval results ***** 09/23/2023 23:25:05 - INFO - __main__ - acc = 0.8472 09/23/2023 23:25:31 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/23/2023 23:29:08 - INFO - __main__ - global_step = 8050, average loss = 0.09245043838061974 09/23/2023 23:32:54 - INFO - __main__ - global_step = 8100, average loss = 0.08283289226481429 09/23/2023 23:36:34 - INFO - __main__ - global_step = 8150, average loss = 0.08407623038449856 09/23/2023 23:40:17 - INFO - __main__ - global_step = 8200, average loss = 0.09736820162237564 09/23/2023 23:44:06 - INFO - __main__ - global_step = 8250, average loss = 0.08463705457368632 09/23/2023 23:47:50 - INFO - __main__ - global_step = 8300, average loss = 0.10010304888644896 09/23/2023 23:51:35 - INFO - __main__ - global_step = 8350, average loss = 0.09222401980725409 09/23/2023 23:55:17 - INFO - __main__ - global_step = 8400, average loss = 0.08634746881416504 09/23/2023 23:58:59 - INFO - __main__ - global_step = 8450, average loss = 0.08723288500368653 09/24/2023 00:02:37 - INFO - __main__ - global_step = 8500, average loss = 0.10130320921433394 09/24/2023 00:02:37 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 00:02:37 - INFO - __main__ - Num examples = 10000 09/24/2023 00:02:37 - INFO - __main__ - Batch size = 16 09/24/2023 00:06:32 - INFO - __main__ - ***** Eval results ***** 09/24/2023 00:06:32 - INFO - __main__ - acc = 0.8452 09/24/2023 00:10:13 - INFO - __main__ - global_step = 8550, average loss = 0.0889340414837352 09/24/2023 00:13:53 - INFO - __main__ - global_step = 8600, average loss = 0.0960574367789377 09/24/2023 00:17:37 - INFO - __main__ - global_step = 8650, average loss = 0.07860265792332939 09/24/2023 00:21:20 - INFO - __main__ - global_step = 8700, average loss = 0.09233207383847912 09/24/2023 00:25:05 - INFO - __main__ - global_step = 8750, average loss = 0.09803196908305836 09/24/2023 00:28:44 - INFO - __main__ - global_step = 8800, average loss = 0.08913468146740343 09/24/2023 00:32:26 - INFO - __main__ - global_step = 8850, average loss = 0.0880054514182666 09/24/2023 00:36:11 - INFO - __main__ - global_step = 8900, average loss = 0.0839999437017832 09/24/2023 00:39:52 - INFO - __main__ - global_step = 8950, average loss = 0.10094311676693905 09/24/2023 00:43:32 - INFO - __main__ - global_step = 9000, average loss = 0.10011614485312748 09/24/2023 00:43:32 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 00:43:32 - INFO - __main__ - Num examples = 10000 09/24/2023 00:43:32 - INFO - __main__ - Batch size = 16 09/24/2023 00:47:27 - INFO - __main__ - ***** Eval results ***** 09/24/2023 00:47:27 - INFO - __main__ - acc = 0.8463 09/24/2023 00:51:10 - INFO - __main__ - global_step = 9050, average loss = 0.09407024829903093 09/24/2023 00:54:48 - INFO - __main__ - global_step = 9100, average loss = 0.09510339217069032 09/24/2023 00:58:27 - INFO - __main__ - global_step = 9150, average loss = 0.09413513723055075 09/24/2023 01:02:10 - INFO - __main__ - global_step = 9200, average loss = 0.08488880819528276 09/24/2023 01:05:47 - INFO - __main__ - global_step = 9250, average loss = 0.09847264970565447 
09/24/2023 01:09:28 - INFO - __main__ - global_step = 9300, average loss = 0.08640140883806452 09/24/2023 01:13:08 - INFO - __main__ - global_step = 9350, average loss = 0.07884123000112594 09/24/2023 01:16:54 - INFO - __main__ - global_step = 9400, average loss = 0.0831154512307694 09/24/2023 01:20:32 - INFO - __main__ - global_step = 9450, average loss = 0.09913980022422038 09/24/2023 01:24:11 - INFO - __main__ - global_step = 9500, average loss = 0.09805536182444484 09/24/2023 01:24:11 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 01:24:11 - INFO - __main__ - Num examples = 10000 09/24/2023 01:24:11 - INFO - __main__ - Batch size = 16 09/24/2023 01:28:07 - INFO - __main__ - ***** Eval results ***** 09/24/2023 01:28:07 - INFO - __main__ - acc = 0.8463 09/24/2023 01:31:55 - INFO - __main__ - global_step = 9550, average loss = 0.0912455873134968 09/24/2023 01:35:38 - INFO - __main__ - global_step = 9600, average loss = 0.10278063782119716 09/24/2023 01:39:12 - INFO - __main__ - global_step = 9650, average loss = 0.08788584528032516 09/24/2023 01:42:53 - INFO - __main__ - global_step = 9700, average loss = 0.08058010207216285 09/24/2023 01:46:34 - INFO - __main__ - global_step = 9750, average loss = 0.08765123128723644 09/24/2023 01:50:14 - INFO - __main__ - global_step = 9800, average loss = 0.09005017607181799 09/24/2023 01:54:03 - INFO - __main__ - global_step = 9850, average loss = 0.07892634223760979 09/24/2023 01:57:44 - INFO - __main__ - global_step = 9900, average loss = 0.07999062808303278 09/24/2023 02:01:26 - INFO - __main__ - global_step = 9950, average loss = 0.09494447313452838 09/24/2023 02:05:06 - INFO - __main__ - global_step = 10000, average loss = 0.0841888710015337 09/24/2023 02:05:06 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 02:05:06 - INFO - __main__ - Num examples = 10000 09/24/2023 02:05:06 - INFO - __main__ - Batch size = 16 09/24/2023 02:09:01 - INFO - __main__ - ***** Eval results ***** 09/24/2023 02:09:01 - INFO - __main__ - acc = 0.8471 09/24/2023 02:12:40 - INFO - __main__ - global_step = 10050, average loss = 0.08929907138342968 09/24/2023 02:16:20 - INFO - __main__ - global_step = 10100, average loss = 0.10172551687661326 09/24/2023 02:20:00 - INFO - __main__ - global_step = 10150, average loss = 0.09577305402533966 09/24/2023 02:23:46 - INFO - __main__ - global_step = 10200, average loss = 0.09480085656211486 09/24/2023 02:27:27 - INFO - __main__ - global_step = 10250, average loss = 0.07956519629078684 09/24/2023 02:31:05 - INFO - __main__ - global_step = 10300, average loss = 0.08291967767250753 09/24/2023 02:34:47 - INFO - __main__ - global_step = 10350, average loss = 0.09592102762369904 09/24/2023 02:38:29 - INFO - __main__ - global_step = 10400, average loss = 0.08570889301292482 09/24/2023 02:42:13 - INFO - __main__ - global_step = 10450, average loss = 0.07362440132081247 09/24/2023 02:45:58 - INFO - __main__ - global_step = 10500, average loss = 0.08574875552483718 09/24/2023 02:45:58 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 02:45:58 - INFO - __main__ - Num examples = 10000 09/24/2023 02:45:58 - INFO - __main__ - Batch size = 16 09/24/2023 02:49:53 - INFO - __main__ - ***** Eval results ***** 09/24/2023 02:49:53 - INFO - __main__ - acc = 0.8524 09/24/2023 02:50:20 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/24/2023 02:54:03 - INFO - __main__ - global_step = 10550, average loss = 0.08846153970320302 09/24/2023 
02:57:43 - INFO - __main__ - global_step = 10600, average loss = 0.08381684645668429 09/24/2023 03:01:26 - INFO - __main__ - global_step = 10650, average loss = 0.09288432469184045 09/24/2023 03:05:08 - INFO - __main__ - global_step = 10700, average loss = 0.08199916316298186 09/24/2023 03:08:56 - INFO - __main__ - global_step = 10750, average loss = 0.09068042659768252 09/24/2023 03:12:37 - INFO - __main__ - global_step = 10800, average loss = 0.08719110449641448 09/24/2023 03:16:20 - INFO - __main__ - global_step = 10850, average loss = 0.09036207084544003 09/24/2023 03:20:04 - INFO - __main__ - global_step = 10900, average loss = 0.095746248819637 09/24/2023 03:23:45 - INFO - __main__ - global_step = 10950, average loss = 0.1019882604497252 09/24/2023 03:27:25 - INFO - __main__ - global_step = 11000, average loss = 0.08660416512644588 09/24/2023 03:27:25 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 03:27:25 - INFO - __main__ - Num examples = 10000 09/24/2023 03:27:25 - INFO - __main__ - Batch size = 16 09/24/2023 03:31:21 - INFO - __main__ - ***** Eval results ***** 09/24/2023 03:31:21 - INFO - __main__ - acc = 0.8521 09/24/2023 03:35:00 - INFO - __main__ - global_step = 11050, average loss = 0.07959849048202158 09/24/2023 03:38:42 - INFO - __main__ - global_step = 11100, average loss = 0.08480279741248524 09/24/2023 03:42:25 - INFO - __main__ - global_step = 11150, average loss = 0.07940411141982623 09/24/2023 03:46:06 - INFO - __main__ - global_step = 11200, average loss = 0.08627346496621613 09/24/2023 03:49:48 - INFO - __main__ - global_step = 11250, average loss = 0.08515130840663915 09/24/2023 03:53:28 - INFO - __main__ - global_step = 11300, average loss = 0.08047833000106039 09/24/2023 03:57:07 - INFO - __main__ - global_step = 11350, average loss = 0.08884227124826338 09/24/2023 04:00:47 - INFO - __main__ - global_step = 11400, average loss = 0.09542614945773494 09/24/2023 04:04:26 - INFO - __main__ - global_step = 11450, average loss = 0.08332637125422479 09/24/2023 04:08:07 - INFO - __main__ - global_step = 11500, average loss = 0.09769482501476887 09/24/2023 04:08:07 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 04:08:07 - INFO - __main__ - Num examples = 10000 09/24/2023 04:08:07 - INFO - __main__ - Batch size = 16 09/24/2023 04:12:02 - INFO - __main__ - ***** Eval results ***** 09/24/2023 04:12:02 - INFO - __main__ - acc = 0.851 09/24/2023 04:15:51 - INFO - __main__ - global_step = 11550, average loss = 0.09137944790694746 09/24/2023 04:19:38 - INFO - __main__ - global_step = 11600, average loss = 0.07454582622590351 09/24/2023 04:23:20 - INFO - __main__ - global_step = 11650, average loss = 0.08284565404814202 09/24/2023 04:26:59 - INFO - __main__ - global_step = 11700, average loss = 0.0969824349215196 09/24/2023 04:30:41 - INFO - __main__ - global_step = 11750, average loss = 0.09389037321489013 09/24/2023 04:34:23 - INFO - __main__ - global_step = 11800, average loss = 0.08608788483528769 09/24/2023 04:38:05 - INFO - __main__ - global_step = 11850, average loss = 0.09322659247220144 09/24/2023 04:41:49 - INFO - __main__ - global_step = 11900, average loss = 0.09286965438863262 09/24/2023 04:45:31 - INFO - __main__ - global_step = 11950, average loss = 0.08214385434631367 09/24/2023 04:49:12 - INFO - __main__ - global_step = 12000, average loss = 0.09392224536069989 09/24/2023 04:49:12 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 04:49:12 - INFO - __main__ - Num examples = 10000 09/24/2023 04:49:12 - INFO - __main__ - 
Batch size = 16 09/24/2023 04:53:07 - INFO - __main__ - ***** Eval results ***** 09/24/2023 04:53:07 - INFO - __main__ - acc = 0.8514 09/24/2023 04:56:53 - INFO - __main__ - global_step = 12050, average loss = 0.08019034011129406 09/24/2023 05:00:34 - INFO - __main__ - global_step = 12100, average loss = 0.08210711618239656 09/24/2023 05:04:16 - INFO - __main__ - global_step = 12150, average loss = 0.08764273267355747 09/24/2023 05:08:02 - INFO - __main__ - global_step = 12200, average loss = 0.08758470895321807 09/24/2023 05:11:48 - INFO - __main__ - global_step = 12250, average loss = 0.07766548367973883 09/24/2023 05:15:27 - INFO - __main__ - global_step = 12300, average loss = 0.08148344823415755 09/24/2023 05:19:08 - INFO - __main__ - global_step = 12350, average loss = 0.08814196670609817 09/24/2023 05:22:50 - INFO - __main__ - global_step = 12400, average loss = 0.08936668847491092 09/24/2023 05:26:29 - INFO - __main__ - global_step = 12450, average loss = 0.08240065188347216 09/24/2023 05:30:12 - INFO - __main__ - global_step = 12500, average loss = 0.08683115135392655 09/24/2023 05:30:12 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 05:30:12 - INFO - __main__ - Num examples = 10000 09/24/2023 05:30:12 - INFO - __main__ - Batch size = 16 09/24/2023 05:34:07 - INFO - __main__ - ***** Eval results ***** 09/24/2023 05:34:07 - INFO - __main__ - acc = 0.8515 09/24/2023 05:37:53 - INFO - __main__ - global_step = 12550, average loss = 0.08871277472944712 09/24/2023 05:41:34 - INFO - __main__ - global_step = 12600, average loss = 0.08797626828309149 09/24/2023 05:45:11 - INFO - __main__ - global_step = 12650, average loss = 0.10095825259459616 09/24/2023 05:48:58 - INFO - __main__ - global_step = 12700, average loss = 0.07953012495926487 09/24/2023 05:52:41 - INFO - __main__ - global_step = 12750, average loss = 0.08843418272979761 09/24/2023 05:56:19 - INFO - __main__ - global_step = 12800, average loss = 0.07413991435227217 09/24/2023 05:59:59 - INFO - __main__ - global_step = 12850, average loss = 0.07519575585451094 09/24/2023 06:03:48 - INFO - __main__ - global_step = 12900, average loss = 0.08996981896292709 09/24/2023 06:07:28 - INFO - __main__ - global_step = 12950, average loss = 0.08996171029284597 09/24/2023 06:11:11 - INFO - __main__ - global_step = 13000, average loss = 0.08077499923689174 09/24/2023 06:11:11 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 06:11:11 - INFO - __main__ - Num examples = 10000 09/24/2023 06:11:11 - INFO - __main__ - Batch size = 16 09/24/2023 06:15:06 - INFO - __main__ - ***** Eval results ***** 09/24/2023 06:15:06 - INFO - __main__ - acc = 0.8527 09/24/2023 06:15:33 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/24/2023 06:19:13 - INFO - __main__ - global_step = 13050, average loss = 0.08447560470420284 09/24/2023 06:22:54 - INFO - __main__ - global_step = 13100, average loss = 0.08299598100831646 09/24/2023 06:26:32 - INFO - __main__ - global_step = 13150, average loss = 0.08393764879734135 09/24/2023 06:30:08 - INFO - __main__ - global_step = 13200, average loss = 0.09848508099505125 09/24/2023 06:33:47 - INFO - __main__ - global_step = 13250, average loss = 0.09162080157435412 09/24/2023 06:37:28 - INFO - __main__ - global_step = 13300, average loss = 0.0914362099875143 09/24/2023 06:41:09 - INFO - __main__ - global_step = 13350, average loss = 0.07781068138462616 09/24/2023 06:44:55 - INFO - __main__ - global_step = 13400, average 
loss = 0.08868030074576382 09/24/2023 06:48:36 - INFO - __main__ - global_step = 13450, average loss = 0.08357623873533157 09/24/2023 06:52:18 - INFO - __main__ - global_step = 13500, average loss = 0.08828085365807055 09/24/2023 06:52:18 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 06:52:18 - INFO - __main__ - Num examples = 10000 09/24/2023 06:52:18 - INFO - __main__ - Batch size = 16 09/24/2023 06:56:14 - INFO - __main__ - ***** Eval results ***** 09/24/2023 06:56:14 - INFO - __main__ - acc = 0.8499 09/24/2023 06:59:57 - INFO - __main__ - global_step = 13550, average loss = 0.08140521681067185 09/24/2023 07:03:37 - INFO - __main__ - global_step = 13600, average loss = 0.08341409597109305 09/24/2023 07:07:17 - INFO - __main__ - global_step = 13650, average loss = 0.08142950747031136 09/24/2023 07:10:56 - INFO - __main__ - global_step = 13700, average loss = 0.09089667504686076 09/24/2023 07:14:45 - INFO - __main__ - global_step = 13750, average loss = 0.07177684095106088 09/24/2023 07:18:24 - INFO - __main__ - global_step = 13800, average loss = 0.08592368463818274 09/24/2023 07:22:01 - INFO - __main__ - global_step = 13850, average loss = 0.08120634569131653 09/24/2023 07:25:48 - INFO - __main__ - global_step = 13900, average loss = 0.08909589071197843 09/24/2023 07:29:30 - INFO - __main__ - global_step = 13950, average loss = 0.08629100337015189 09/24/2023 07:33:10 - INFO - __main__ - global_step = 14000, average loss = 0.07722124511306902 09/24/2023 07:33:10 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 07:33:10 - INFO - __main__ - Num examples = 10000 09/24/2023 07:33:10 - INFO - __main__ - Batch size = 16 09/24/2023 07:37:05 - INFO - __main__ - ***** Eval results ***** 09/24/2023 07:37:05 - INFO - __main__ - acc = 0.8533 09/24/2023 07:37:32 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/24/2023 07:41:11 - INFO - __main__ - global_step = 14050, average loss = 0.08182521525057382 09/24/2023 07:44:48 - INFO - __main__ - global_step = 14100, average loss = 0.0902410151962249 09/24/2023 07:48:28 - INFO - __main__ - global_step = 14150, average loss = 0.07409664937826164 09/24/2023 07:52:12 - INFO - __main__ - global_step = 14200, average loss = 0.08879891355274594 09/24/2023 07:55:53 - INFO - __main__ - global_step = 14250, average loss = 0.09268313445325475 09/24/2023 07:59:30 - INFO - __main__ - global_step = 14300, average loss = 0.08798344542199629 09/24/2023 08:03:13 - INFO - __main__ - global_step = 14350, average loss = 0.09607475698139752 09/24/2023 08:06:59 - INFO - __main__ - global_step = 14400, average loss = 0.07222031111843535 09/24/2023 08:10:40 - INFO - __main__ - global_step = 14450, average loss = 0.07480319764195884 09/24/2023 08:14:19 - INFO - __main__ - global_step = 14500, average loss = 0.0838716509303049 09/24/2023 08:14:19 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 08:14:19 - INFO - __main__ - Num examples = 10000 09/24/2023 08:14:19 - INFO - __main__ - Batch size = 16 09/24/2023 08:18:16 - INFO - __main__ - ***** Eval results ***** 09/24/2023 08:18:16 - INFO - __main__ - acc = 0.8542 09/24/2023 08:18:42 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/24/2023 08:22:18 - INFO - __main__ - global_step = 14550, average loss = 0.08034001361316769 09/24/2023 08:25:55 - INFO - __main__ - global_step = 14600, average loss = 0.07689567271547276 09/24/2023 
08:29:37 - INFO - __main__ - global_step = 14650, average loss = 0.09093381941405823 09/24/2023 08:33:25 - INFO - __main__ - global_step = 14700, average loss = 0.07569706412876258 09/24/2023 08:37:04 - INFO - __main__ - global_step = 14750, average loss = 0.07479940189456101 09/24/2023 08:40:47 - INFO - __main__ - global_step = 14800, average loss = 0.08522207450543647 09/24/2023 08:44:34 - INFO - __main__ - global_step = 14850, average loss = 0.0889268495763099 09/24/2023 08:48:16 - INFO - __main__ - global_step = 14900, average loss = 0.08616152721479012 09/24/2023 08:51:56 - INFO - __main__ - global_step = 14950, average loss = 0.07867321850848384 09/24/2023 08:55:39 - INFO - __main__ - global_step = 15000, average loss = 0.08426695556714549 09/24/2023 08:55:39 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 08:55:39 - INFO - __main__ - Num examples = 10000 09/24/2023 08:55:39 - INFO - __main__ - Batch size = 16 09/24/2023 08:59:34 - INFO - __main__ - ***** Eval results ***** 09/24/2023 08:59:34 - INFO - __main__ - acc = 0.8542 09/24/2023 09:03:12 - INFO - __main__ - global_step = 15050, average loss = 0.07868185437655484 09/24/2023 09:07:00 - INFO - __main__ - global_step = 15100, average loss = 0.08520105790423259 09/24/2023 09:10:42 - INFO - __main__ - global_step = 15150, average loss = 0.09536004922925713 09/24/2023 09:14:19 - INFO - __main__ - global_step = 15200, average loss = 0.08502999547665241 09/24/2023 09:17:58 - INFO - __main__ - global_step = 15250, average loss = 0.08957034896484402 09/24/2023 09:21:34 - INFO - __main__ - global_step = 15300, average loss = 0.07968287494033575 09/24/2023 09:25:14 - INFO - __main__ - global_step = 15350, average loss = 0.08545487473544199 09/24/2023 09:28:55 - INFO - __main__ - global_step = 15400, average loss = 0.08528959889241378 09/24/2023 09:32:38 - INFO - __main__ - global_step = 15450, average loss = 0.08095955706679887 09/24/2023 09:36:19 - INFO - __main__ - global_step = 15500, average loss = 0.08725373520917856 09/24/2023 09:36:19 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 09:36:19 - INFO - __main__ - Num examples = 10000 09/24/2023 09:36:19 - INFO - __main__ - Batch size = 16 09/24/2023 09:40:15 - INFO - __main__ - ***** Eval results ***** 09/24/2023 09:40:15 - INFO - __main__ - acc = 0.8545 09/24/2023 09:40:42 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6 09/24/2023 09:44:22 - INFO - __main__ - global_step = 15550, average loss = 0.0843266883040269 09/24/2023 09:48:03 - INFO - __main__ - global_step = 15600, average loss = 0.07855528741223679 09/24/2023 09:51:47 - INFO - __main__ - global_step = 15650, average loss = 0.09478737017554523 09/24/2023 09:55:32 - INFO - __main__ - global_step = 15700, average loss = 0.08910313490487169 09/24/2023 09:59:16 - INFO - __main__ - global_step = 15750, average loss = 0.07736712342710234 09/24/2023 10:02:53 - INFO - __main__ - global_step = 15800, average loss = 0.08501649839432503 09/24/2023 10:06:37 - INFO - __main__ - global_step = 15850, average loss = 0.08495221398276044 09/24/2023 10:10:23 - INFO - __main__ - global_step = 15900, average loss = 0.08510145512744202 09/24/2023 10:14:07 - INFO - __main__ - global_step = 15950, average loss = 0.08335533107921947 09/24/2023 10:17:49 - INFO - __main__ - global_step = 16000, average loss = 0.09103241352764599 09/24/2023 10:17:49 - INFO - __main__ - ***** Running evaluation ***** 09/24/2023 10:17:49 - INFO - __main__ - Num 
examples = 10000
09/24/2023 10:17:49 - INFO - __main__ - Batch size = 16
09/24/2023 10:21:45 - INFO - __main__ - ***** Eval results *****
09/24/2023 10:21:45 - INFO - __main__ - acc = 0.8549
09/24/2023 10:22:12 - INFO - __main__ - Saving model checkpoint to output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 10:25:53 - INFO - __main__ - global_step = 16050, average loss = 0.0808029190406296
09/24/2023 10:29:33 - INFO - __main__ - global_step = 16100, average loss = 0.0950222506766113
09/24/2023 10:33:15 - INFO - __main__ - global_step = 16150, average loss = 0.08560644885961664
09/24/2023 10:36:53 - INFO - __main__ - global_step = 16200, average loss = 0.07925290400889935
09/24/2023 10:40:34 - INFO - __main__ - global_step = 16250, average loss = 0.08252620983123052
09/24/2023 10:44:15 - INFO - __main__ - global_step = 16300, average loss = 0.08747977073326182
09/24/2023 10:47:55 - INFO - __main__ - global_step = 16350, average loss = 0.08805208059333382
09/24/2023 10:51:41 - INFO - __main__ - global_step = 16400, average loss = 0.07935831163018064
09/24/2023 10:55:23 - INFO - __main__ - global_step = 16450, average loss = 0.0807358610859228
09/24/2023 10:59:03 - INFO - __main__ - global_step = 16500, average loss = 0.0775301494665473
09/24/2023 10:59:03 - INFO - __main__ - ***** Running evaluation *****
09/24/2023 10:59:03 - INFO - __main__ - Num examples = 10000
09/24/2023 10:59:03 - INFO - __main__ - Batch size = 16
09/24/2023 11:02:59 - INFO - __main__ - ***** Eval results *****
09/24/2023 11:02:59 - INFO - __main__ - acc = 0.8532
09/24/2023 11:06:39 - INFO - __main__ - global_step = 16550, average loss = 0.06899339191091712
09/24/2023 11:10:25 - INFO - __main__ - global_step = 16600, average loss = 0.08612027997849508
09/24/2023 11:14:10 - INFO - __main__ - global_step = 16650, average loss = 0.08232147437905951
09/24/2023 11:17:50 - INFO - __main__ - global_step = 16700, average loss = 0.08530993062430753
09/24/2023 11:18:50 - INFO - __main__ - ***** Running evaluation *****
09/24/2023 11:18:50 - INFO - __main__ - Num examples = 10000
09/24/2023 11:18:50 - INFO - __main__ - Batch size = 16
09/24/2023 11:22:45 - INFO - __main__ - ***** Eval results *****
09/24/2023 11:22:45 - INFO - __main__ - acc = 0.8533
09/24/2023 11:22:45 - INFO - __main__ - global_step = 16713, average loss = 0.11041826268834619
09/24/2023 11:23:18 - INFO - __main__ - ***** Running evaluation *****
09/24/2023 11:23:18 - INFO - __main__ - Num examples = 10000
09/24/2023 11:23:18 - INFO - __main__ - Batch size = 16
09/24/2023 11:27:13 - INFO - __main__ - ***** Eval results *****
09/24/2023 11:27:13 - INFO - __main__ - acc = 0.8549
09/24/2023 11:27:16 - INFO - evaluate_DeBERTa - Namespace(dataset_file='../../../data/mcqa/eval/socialiqa_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='socialiqa', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:27:16 - INFO - evaluate_DeBERTa - Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:34:38 - INFO - evaluate_DeBERTa - Namespace(dataset_file='../../../data/mcqa/eval/winogrande_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='winogrande', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:34:38 - INFO - evaluate_DeBERTa - Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:37:05 - INFO - evaluate_DeBERTa - Namespace(dataset_file='../../../data/mcqa/eval/piqa_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='piqa', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:37:05 - INFO - evaluate_DeBERTa - Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:43:59 - INFO - evaluate_DeBERTa - Namespace(dataset_file='../../../data/mcqa/eval/commonsenseqa_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='commonsenseqa', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:43:59 - INFO - evaluate_DeBERTa - Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:49:43 - INFO - evaluate_DeBERTa - Namespace(dataset_file='../../../data/mcqa/eval/anli_dev.jsonl', lm='output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6', out_dir='./eval_results/deberta-v3-large_2i_atm_half_sample_name_5e-6', device=0, reader='anli', overwrite_output_dir=False, cache_dir=None)
09/24/2023 11:49:43 - INFO - evaluate_DeBERTa - Initializing output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6
09/24/2023 11:54:31 - INFO - __main__ - ***** Running evaluation *****
09/24/2023 11:54:31 - INFO - __main__ - Num examples = 120
09/24/2023 11:54:31 - INFO - __main__ - Batch size = 16
09/24/2023 11:54:47 - INFO - __main__ - ***** Eval results *****
09/24/2023 11:54:47 - INFO - __main__ - acc = 0.525
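The evaluate_DeBERTa runs above score the fine-tuned checkpoint zero-shot on the SocialIQA, WinoGrande, PIQA, CommonsenseQA, and aNLI dev sets. The snippet below is only an illustrative sketch of how a masked LM is typically used for such multiple-choice scoring (rank the answer candidates by pseudo-log-likelihood and keep the best one); it is not the repository's evaluate_DeBERTa code, and the helper names are made up.

    # Illustrative only: not the evaluate_DeBERTa script's actual interface.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    MODEL_DIR = "output/Output_ATOMIC-pseudo-wWC/deberta-v3-large_2i_atm_half_sample_name_5e-6"

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_DIR).to(device).eval()

    @torch.no_grad()
    def pseudo_log_likelihood(text: str) -> float:
        """Sum of log-probabilities of each token when it is masked out in turn."""
        ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
        total = 0.0
        for i in range(1, ids.size(1) - 1):  # skip the special tokens at both ends
            masked = ids.clone()
            masked[0, i] = tokenizer.mask_token_id
            logits = model(input_ids=masked).logits
            total += torch.log_softmax(logits[0, i], dim=-1)[ids[0, i]].item()
        return total

    def pick_answer(context: str, choices: list) -> int:
        """Return the index of the answer choice with the highest masked-LM score."""
        scores = [pseudo_log_likelihood(f"{context} {choice}") for choice in choices]
        return max(range(len(choices)), key=scores.__getitem__)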