mcse-coco-bert-base-uncased / eval_results.log
2021-10-01 09:01:40,065 : ***** Transfer task : STS12 *****
2021-10-01 09:01:42,261 : MSRpar : pearson = 0.6331, spearman = 0.6265, align_loss = 0.2011, uniform_loss = -2.5322
2021-10-01 09:01:43,203 : MSRvid : pearson = 0.8752, spearman = 0.8747, align_loss = 0.2308, uniform_loss = -2.3171
2021-10-01 09:01:44,077 : SMTeuroparl : pearson = 0.5281, spearman = 0.6156, align_loss = 0.2567, uniform_loss = -1.7214
2021-10-01 09:01:45,299 : surprise.OnWN : pearson = 0.7492, spearman = 0.7087, align_loss = 0.2956, uniform_loss = -2.4748
2021-10-01 09:01:45,905 : surprise.SMTnews : pearson = 0.7292, spearman = 0.6400, align_loss = 0.2303, uniform_loss = -1.8483
2021-10-01 09:01:45,914 : ALL : Pearson = 0.8025, Spearman = 0.7235, align_loss = 0.2428, uniform_loss = -2.2394
2021-10-01 09:01:45,914 : ALL (weighted average) : Pearson = 0.7164, Spearman = 0.7064, align_loss = 0.2430, uniform_loss = -2.2589
2021-10-01 09:01:45,914 : ALL (average) : Pearson = 0.7030, Spearman = 0.6931, align_loss = 0.2429, uniform_loss = -2.1788
2021-10-01 09:01:45,921 : ***** Transfer task : STS13 (-SMT) *****
2021-10-01 09:01:46,481 : FNWN : pearson = 0.6060, spearman = 0.6148, align_loss = 0.3780, uniform_loss = -2.2212
2021-10-01 09:01:47,511 : headlines : pearson = 0.7903, spearman = 0.7947, align_loss = 0.2445, uniform_loss = -2.4584
2021-10-01 09:01:48,105 : OnWN : pearson = 0.8326, spearman = 0.8225, align_loss = 0.3193, uniform_loss = -2.2413
2021-10-01 09:01:48,108 : ALL : Pearson = 0.8012, Spearman = 0.8073, align_loss = 0.2938, uniform_loss = -2.3384
2021-10-01 09:01:48,108 : ALL (weighted average) : Pearson = 0.7829, Spearman = 0.7824, align_loss = 0.2893, uniform_loss = -2.3473
2021-10-01 09:01:48,108 : ALL (average) : Pearson = 0.7430, Spearman = 0.7440, align_loss = 0.3139, uniform_loss = -2.3070
2021-10-01 09:01:48,109 : ***** Transfer task : STS14 *****
2021-10-01 09:01:48,965 : deft-forum : pearson = 0.5599, spearman = 0.5496, align_loss = 0.3066, uniform_loss = -2.4918
2021-10-01 09:01:49,697 : deft-news : pearson = 0.8154, spearman = 0.7922, align_loss = 0.1642, uniform_loss = -2.2330
2021-10-01 09:01:50,853 : headlines : pearson = 0.7922, spearman = 0.7868, align_loss = 0.2423, uniform_loss = -2.4423
2021-10-01 09:01:51,903 : images : pearson = 0.8722, spearman = 0.8332, align_loss = 0.2738, uniform_loss = -2.6206
2021-10-01 09:01:52,938 : OnWN : pearson = 0.8628, spearman = 0.8518, align_loss = 0.3249, uniform_loss = -2.2993
2021-10-01 09:01:54,453 : tweet-news : pearson = 0.7894, spearman = 0.7150, align_loss = 0.3931, uniform_loss = -2.4055
2021-10-01 09:01:54,459 : ALL : Pearson = 0.7934, Spearman = 0.7567, align_loss = 0.2943, uniform_loss = -2.4281
2021-10-01 09:01:54,459 : ALL (weighted average) : Pearson = 0.7957, Spearman = 0.7667, align_loss = 0.2967, uniform_loss = -2.4312
2021-10-01 09:01:54,459 : ALL (average) : Pearson = 0.7820, Spearman = 0.7548, align_loss = 0.2841, uniform_loss = -2.4154
2021-10-01 09:01:54,469 : ***** Transfer task : STS15 *****
2021-10-01 09:01:55,305 : answers-forums : pearson = 0.7622, spearman = 0.7691, align_loss = 0.4751, uniform_loss = -2.5207
2021-10-01 09:01:56,116 : answers-students : pearson = 0.7274, spearman = 0.7348, align_loss = 0.2984, uniform_loss = -1.6962
2021-10-01 09:01:56,833 : belief : pearson = 0.8280, spearman = 0.8501, align_loss = 0.3987, uniform_loss = -2.4359
2021-10-01 09:01:58,174 : headlines : pearson = 0.8215, spearman = 0.8271, align_loss = 0.2414, uniform_loss = -2.4547
2021-10-01 09:01:59,504 : images : pearson = 0.8820, spearman = 0.8893, align_loss = 0.2480, uniform_loss = -2.2802
2021-10-01 09:01:59,510 : ALL : Pearson = 0.8265, Spearman = 0.8337, align_loss = 0.3062, uniform_loss = -2.2274
2021-10-01 09:01:59,510 : ALL (weighted average) : Pearson = 0.8065, Spearman = 0.8152, align_loss = 0.3062, uniform_loss = -2.2274
2021-10-01 09:01:59,510 : ALL (average) : Pearson = 0.8042, Spearman = 0.8141, align_loss = 0.3323, uniform_loss = -2.2776
2021-10-01 09:01:59,515 : ***** Transfer task : STS16 *****
2021-10-01 09:01:59,998 : answer-answer : pearson = 0.6969, spearman = 0.6955, align_loss = 0.3543, uniform_loss = -2.0589
2021-10-01 09:02:00,259 : headlines : pearson = 0.8023, spearman = 0.8239, align_loss = 0.2242, uniform_loss = -2.4768
2021-10-01 09:02:00,577 : plagiarism : pearson = 0.8537, spearman = 0.8668, align_loss = 0.1743, uniform_loss = -2.0453
2021-10-01 09:02:01,089 : postediting : pearson = 0.8560, spearman = 0.8795, align_loss = 0.1283, uniform_loss = -2.4398
2021-10-01 09:02:01,342 : question-question : pearson = 0.6924, spearman = 0.6868, align_loss = 0.2773, uniform_loss = -2.2263
2021-10-01 09:02:01,345 : ALL : Pearson = 0.7716, Spearman = 0.7809, align_loss = 0.2317, uniform_loss = -2.2494
2021-10-01 09:02:01,345 : ALL (weighted average) : Pearson = 0.7814, Spearman = 0.7920, align_loss = 0.2320, uniform_loss = -2.2519
2021-10-01 09:02:01,345 : ALL (average) : Pearson = 0.7803, Spearman = 0.7905, align_loss = 0.2317, uniform_loss = -2.2494
2021-10-01 09:02:01,349 : ***** Transfer task : STSBenchmark*****
2021-10-01 09:02:11,474 : train : pearson = 0.8156, spearman = 0.7934, align_loss = 0.2513, uniform_loss = -2.4757
2021-10-01 09:02:14,290 : dev : pearson = 0.8488, spearman = 0.8472, align_loss = 0.2706, uniform_loss = -2.5047
2021-10-01 09:02:16,780 : test : pearson = 0.7966, spearman = 0.7883, align_loss = 0.2571, uniform_loss = -2.4174
2021-10-01 09:02:16,787 : ALL : Pearson = 0.8198, Spearman = 0.8047, align_loss = 0.2556, uniform_loss = -2.4714
2021-10-01 09:02:16,787 : ALL (weighted average) : Pearson = 0.8183, Spearman = 0.8019, align_loss = 0.2556, uniform_loss = -2.4714
2021-10-01 09:02:16,787 : ALL (average) : Pearson = 0.8203, Spearman = 0.8096, align_loss = 0.2597, uniform_loss = -2.4659
2021-10-01 09:02:16,796 : ***** Transfer task : SICKRelatedness*****
2021-10-01 09:02:23,651 : train : pearson = 0.7910, spearman = 0.7024, align_loss = 0.2231, uniform_loss = -2.3434
2021-10-01 09:02:24,561 : dev : pearson = 0.7941, spearman = 0.7294, align_loss = 0.2196, uniform_loss = -2.5349
2021-10-01 09:02:31,960 : test : pearson = 0.7900, spearman = 0.6979, align_loss = 0.2213, uniform_loss = -2.3409
2021-10-01 09:02:31,967 : ALL : Pearson = 0.7907, Spearman = 0.7016, align_loss = 0.2220, uniform_loss = -2.3518
2021-10-01 09:02:31,967 : ALL (weighted average) : Pearson = 0.7906, Spearman = 0.7015, align_loss = 0.2220, uniform_loss = -2.3518
2021-10-01 09:02:31,967 : ALL (average) : Pearson = 0.7917, Spearman = 0.7099, align_loss = 0.2213, uniform_loss = -2.4064
2021-10-01 09:02:31,968 : ------ test ------
2021-10-01 09:02:31,969 : +--------+--------+--------+--------+--------+--------------+-----------------+--------+
| STS12  | STS13  | STS14  | STS15  | STS16  | STSBenchmark | SICKRelatedness |  Avg.  |
+--------+--------+--------+--------+--------+--------------+-----------------+--------+
| 72.35  | 80.73  | 75.67  | 83.37  | 78.09  |    78.83     |      69.79      | 76.98  |
| 0.243  | 0.294  | 0.294  | 0.306  | 0.232  |    0.257     |      0.221      | 0.264  |
| -2.239 | -2.338 | -2.428 | -2.227 | -2.249 |    -2.417    |     -2.341      | -2.320 |
+--------+--------+--------+--------+--------+--------------+-----------------+--------+
2021-10-01 09:02:31,971 : +------+------+------+------+------+------+------+------+
|  MR  |  CR  | SUBJ | MPQA | SST2 | TREC | MRPC | Avg. |
+------+------+------+------+------+------+------+------+
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
+------+------+------+------+------+------+------+------+
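Note on the "------ test ------" table above: its three unlabeled rows line up with Spearman x 100, align_loss, and uniform_loss from the per-task lines. The STS12 to STS16 columns match the "ALL" lines, while the STSBenchmark and SICKRelatedness columns match the "test" lines. The following is a minimal sketch, not the evaluation script itself, that recomputes the first row and its Avg. column from the values logged in this run. The variable names are illustrative assumptions.

```python
# Spearman values copied from the 2021-10-01 log lines above.
# STS12-STS16 come from the "ALL" line of each task;
# STSBenchmark and SICKRelatedness come from the "test" line.
spearman = {
    "STS12": 0.7235,
    "STS13": 0.8073,
    "STS14": 0.7567,
    "STS15": 0.8337,
    "STS16": 0.7809,
    "STSBenchmark": 0.7883,
    "SICKRelatedness": 0.6979,
}

# The table reports each score as a percentage with two decimals.
row = {task: round(value * 100, 2) for task, value in spearman.items()}

# Avg. is taken here as the unweighted mean over the seven tasks.
avg = round(sum(row.values()) / len(row), 2)

print(row)
print("Avg.", avg)
```

Under these assumptions the unweighted mean comes out at 76.98, which agrees with the Avg. cell of the first row.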
2021-10-03 08:46:37,510 : ***** Transfer task : STS12 *****
2021-10-03 08:46:40,901 : MSRpar : pearson = 0.5880, spearman = 0.5984, align_loss = 0.2252, uniform_loss = -2.6787
2021-10-03 08:46:42,191 : MSRvid : pearson = 0.8921, spearman = 0.8908, align_loss = 0.2316, uniform_loss = -2.5007
2021-10-03 08:46:43,343 : SMTeuroparl : pearson = 0.5285, spearman = 0.6106, align_loss = 0.2731, uniform_loss = -1.7537
2021-10-03 08:46:45,415 : surprise.OnWN : pearson = 0.7406, spearman = 0.6921, align_loss = 0.3048, uniform_loss = -2.5574
2021-10-03 08:46:46,571 : surprise.SMTnews : pearson = 0.7277, spearman = 0.6356, align_loss = 0.2380, uniform_loss = -1.9060
2021-10-03 08:46:46,574 : ALL : Pearson = 0.8076, Spearman = 0.7234, align_loss = 0.2544, uniform_loss = -2.3484
2021-10-03 08:46:46,574 : ALL (weighted average) : Pearson = 0.7074, Spearman = 0.6982, align_loss = 0.2547, uniform_loss = -2.3707
2021-10-03 08:46:46,574 : ALL (average) : Pearson = 0.6954, Spearman = 0.6855, align_loss = 0.2546, uniform_loss = -2.2793
2021-10-03 08:46:46,578 : ***** Transfer task : STS13 (-SMT) *****
2021-10-03 08:46:47,638 : FNWN : pearson = 0.6040, spearman = 0.6189, align_loss = 0.4101, uniform_loss = -2.3160
2021-10-03 08:46:49,129 : headlines : pearson = 0.7866, spearman = 0.7911, align_loss = 0.2323, uniform_loss = -2.5041
2021-10-03 08:46:50,342 : OnWN : pearson = 0.8134, spearman = 0.8051, align_loss = 0.3154, uniform_loss = -2.2660
2021-10-03 08:46:50,346 : ALL : Pearson = 0.7865, Spearman = 0.7944, align_loss = 0.2916, uniform_loss = -2.3836
2021-10-03 08:46:50,346 : ALL (weighted average) : Pearson = 0.7736, Spearman = 0.7746, align_loss = 0.2858, uniform_loss = -2.3914
2021-10-03 08:46:50,346 : ALL (average) : Pearson = 0.7347, Spearman = 0.7383, align_loss = 0.3193, uniform_loss = -2.3621
2021-10-03 08:46:50,348 : ***** Transfer task : STS14 *****
2021-10-03 08:46:51,678 : deft-forum : pearson = 0.5178, spearman = 0.5013, align_loss = 0.3319, uniform_loss = -2.5933
2021-10-03 08:46:52,875 : deft-news : pearson = 0.8103, spearman = 0.7737, align_loss = 0.1740, uniform_loss = -2.3745
2021-10-03 08:46:54,579 : headlines : pearson = 0.7742, spearman = 0.7526, align_loss = 0.2295, uniform_loss = -2.4763
2021-10-03 08:46:56,190 : images : pearson = 0.8803, spearman = 0.8357, align_loss = 0.2804, uniform_loss = -2.8383
2021-10-03 08:46:57,857 : OnWN : pearson = 0.8478, spearman = 0.8432, align_loss = 0.3234, uniform_loss = -2.3374
2021-10-03 08:46:59,957 : tweet-news : pearson = 0.7761, spearman = 0.6955, align_loss = 0.4371, uniform_loss = -2.5998
2021-10-03 08:46:59,962 : ALL : Pearson = 0.7708, Spearman = 0.7288, align_loss = 0.3055, uniform_loss = -2.5486
2021-10-03 08:46:59,962 : ALL (weighted average) : Pearson = 0.7826, Spearman = 0.7475, align_loss = 0.3078, uniform_loss = -2.5515
2021-10-03 08:46:59,962 : ALL (average) : Pearson = 0.7677, Spearman = 0.7337, align_loss = 0.2960, uniform_loss = -2.5366
2021-10-03 08:46:59,968 : ***** Transfer task : STS15 *****
2021-10-03 08:47:01,635 : answers-forums : pearson = 0.7260, spearman = 0.7319, align_loss = 0.4913, uniform_loss = -2.6655
2021-10-03 08:47:03,236 : answers-students : pearson = 0.7329, spearman = 0.7356, align_loss = 0.3290, uniform_loss = -1.7544
2021-10-03 08:47:04,764 : belief : pearson = 0.8161, spearman = 0.8396, align_loss = 0.4395, uniform_loss = -2.5462
2021-10-03 08:47:06,566 : headlines : pearson = 0.8060, spearman = 0.8116, align_loss = 0.2341, uniform_loss = -2.4907
2021-10-03 08:47:08,375 : images : pearson = 0.9027, spearman = 0.9108, align_loss = 0.2459, uniform_loss = -2.4940
2021-10-03 08:47:08,380 : ALL : Pearson = 0.8219, Spearman = 0.8295, align_loss = 0.3186, uniform_loss = -2.3362
2021-10-03 08:47:08,380 : ALL (weighted average) : Pearson = 0.8031, Spearman = 0.8109, align_loss = 0.3186, uniform_loss = -2.3362
2021-10-03 08:47:08,380 : ALL (average) : Pearson = 0.7967, Spearman = 0.8059, align_loss = 0.3480, uniform_loss = -2.3902
2021-10-03 08:47:08,384 : ***** Transfer task : STS16 *****
2021-10-03 08:47:09,587 : answer-answer : pearson = 0.7146, spearman = 0.7066, align_loss = 0.3366, uniform_loss = -2.1176
2021-10-03 08:47:10,209 : headlines : pearson = 0.7777, spearman = 0.7936, align_loss = 0.2124, uniform_loss = -2.5313
2021-10-03 08:47:10,964 : plagiarism : pearson = 0.8479, spearman = 0.8575, align_loss = 0.1975, uniform_loss = -2.0827
2021-10-03 08:47:12,216 : postediting : pearson = 0.8545, spearman = 0.8739, align_loss = 0.1343, uniform_loss = -2.5950
2021-10-03 08:47:12,780 : question-question : pearson = 0.7229, spearman = 0.7325, align_loss = 0.2821, uniform_loss = -2.4114
2021-10-03 08:47:12,783 : ALL : Pearson = 0.7810, Spearman = 0.7898, align_loss = 0.2326, uniform_loss = -2.3476
2021-10-03 08:47:12,783 : ALL (weighted average) : Pearson = 0.7839, Spearman = 0.7931, align_loss = 0.2323, uniform_loss = -2.3477
2021-10-03 08:47:12,783 : ALL (average) : Pearson = 0.7835, Spearman = 0.7928, align_loss = 0.2326, uniform_loss = -2.3476
2021-10-03 08:47:12,786 : ***** Transfer task : STSBenchmark*****
2021-10-03 08:47:32,536 : train : pearson = 0.8064, spearman = 0.7806, align_loss = 0.2562, uniform_loss = -2.5998
2021-10-03 08:47:38,110 : dev : pearson = 0.8418, spearman = 0.8427, align_loss = 0.2780, uniform_loss = -2.6537
2021-10-03 08:47:42,861 : test : pearson = 0.7969, spearman = 0.7901, align_loss = 0.2528, uniform_loss = -2.5614
2021-10-03 08:47:42,870 : ALL : Pearson = 0.8125, Spearman = 0.7966, align_loss = 0.2595, uniform_loss = -2.6031
2021-10-03 08:47:42,870 : ALL (weighted average) : Pearson = 0.8110, Spearman = 0.7929, align_loss = 0.2594, uniform_loss = -2.6030
2021-10-03 08:47:42,870 : ALL (average) : Pearson = 0.8150, Spearman = 0.8045, align_loss = 0.2624, uniform_loss = -2.6050
2021-10-03 08:47:42,878 : ***** Transfer task : SICKRelatedness*****
2021-10-03 08:47:55,348 : train : pearson = 0.8237, spearman = 0.7467, align_loss = 0.2326, uniform_loss = -2.4715
2021-10-03 08:47:56,876 : dev : pearson = 0.8242, spearman = 0.7745, align_loss = 0.2351, uniform_loss = -2.6674
2021-10-03 08:48:10,119 : test : pearson = 0.8150, spearman = 0.7396, align_loss = 0.2319, uniform_loss = -2.4592
2021-10-03 08:48:10,131 : ALL : Pearson = 0.8195, Spearman = 0.7444, align_loss = 0.2324, uniform_loss = -2.4753
2021-10-03 08:48:10,131 : ALL (weighted average) : Pearson = 0.8194, Spearman = 0.7446, align_loss = 0.2324, uniform_loss = -2.4753
2021-10-03 08:48:10,131 : ALL (average) : Pearson = 0.8210, Spearman = 0.7536, align_loss = 0.2332, uniform_loss = -2.5327
2021-10-03 08:48:10,132 : ------ test ------
2021-10-03 08:48:10,133 : +--------+--------+--------+--------+--------+--------------+-----------------+--------+
| STS12  | STS13  | STS14  | STS15  | STS16  | STSBenchmark | SICKRelatedness |  Avg.  |
+--------+--------+--------+--------+--------+--------------+-----------------+--------+
| 72.34  | 79.44  | 72.88  | 82.95  | 78.98  |    79.01     |      73.96      | 77.08  |
| 0.254  | 0.292  | 0.306  | 0.319  | 0.233  |    0.253     |      0.232      | 0.270  |
| -2.348 | -2.384 | -2.549 | -2.336 | -2.348 |    -2.561    |     -2.459      | -2.426 |
+--------+--------+--------+--------+--------+--------------+-----------------+--------+
2021-10-03 08:48:10,135 : +------+------+------+------+------+------+------+------+
|  MR  |  CR  | SUBJ | MPQA | SST2 | TREC | MRPC | Avg. |
+------+------+------+------+------+------+------+------+
| 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
+------+------+------+------+------+------+------+------+
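To compare the two runs programmatically, the result lines in this log can be parsed with a regular expression. The sketch below is one possible approach, assuming the line layout seen above (timestamp, name, then the four metrics). The function name `parse_result_line` and the pattern are hypothetical helpers and are not part of the evaluation code.

```python
import re

# Matches per-dataset and aggregate result lines such as
# "2021-10-03 08:46:40,901 : MSRpar : pearson = 0.5880, spearman = 0.5984, ..."
# Aggregate lines capitalize "Pearson"/"Spearman", so both cases are accepted.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) : "
    r"(?P<name>.+?) : "
    r"[Pp]earson = (?P<pearson>-?\d+\.\d+), "
    r"[Ss]pearman = (?P<spearman>-?\d+\.\d+), "
    r"align_loss = (?P<align>-?\d+\.\d+), "
    r"uniform_loss = (?P<uniform>-?\d+\.\d+)$"
)


def parse_result_line(line):
    """Return a dict for a result line, or None for task headers and table rows."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": m["ts"],
        "name": m["name"],
        "pearson": float(m["pearson"]),
        "spearman": float(m["spearman"]),
        "align_loss": float(m["align"]),
        "uniform_loss": float(m["uniform"]),
    }


# Three lines copied from the 2021-10-03 run; the first is a task header.
sample = [
    "2021-10-03 08:46:37,510 : ***** Transfer task : STS12 *****",
    "2021-10-03 08:46:40,901 : MSRpar : pearson = 0.5880, spearman = 0.5984, "
    "align_loss = 0.2252, uniform_loss = -2.6787",
    "2021-10-03 08:46:46,574 : ALL (weighted average) : Pearson = 0.7074, "
    "Spearman = 0.6982, align_loss = 0.2547, uniform_loss = -2.3707",
]
parsed = [r for r in map(parse_result_line, sample) if r is not None]
for record in parsed:
    print(record["name"], record["spearman"], record["uniform_loss"])
```

Task headers and table rows do not match the pattern and are skipped, so the same function can be mapped over every line of the file.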