azbert-base / job-25031358-tail.out
Commit f5b3bba by w32zhong: "update model"
MASK[1] top candidates: ['function', 'functional', 'functions']
With Euler's formula, it simplifies to $int$ $subscript$ $0$ [MASK] [MASK] $frac$ ${$ $1$ $plus$ $x$ $supscript$ ${$ $somenum$ $}$ $}$ ${$ $1$ $plus$ $x$ $}$ $d$ $x$
MASK[0] top candidates: ['$supscript$', '$subscript$', '$x$']
MASK[1] top candidates: ['$1$', '$0$', '$x$']
$vert$ $x$ $vert$ $supscript$ $4$ is maximized if [MASK] [MASK] if [MASK] $x$ [MASK] is maximized.
MASK[0] top candidates: ['and', ',', '&']
MASK[1] top candidates: ['only', 'not', 'also']
MASK[2] top candidates: ['$vert$', '$($', '${$']
MASK[3] top candidates: ['$vert$', 'itself', '$)$']
$($ $a$ $comma$ $b$ $comma$ $x$ $comma$ $n$ $)$ $equal$ [MASK] $a$ $comma$ $a$ $plus$ $1$ $comma$ $n$ $comma$ $n$ [MASK] is a counterexample (for any $n$ [MASK] $2$).
MASK[0] top candidates: ['$($', '$\\{$', '$[$']
MASK[1] top candidates: ['$)$', '$\\}$', '$]$']
MASK[2] top candidates: ['$geq$', '$ge$', '$gt$']
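Each `MASK[i] top candidates` line lists the three highest-scoring vocabulary tokens for the i-th [MASK] in the sentence printed above it. Below is a minimal sketch of that selection step. It assumes the per-position scores are already available as plain Python lists. The vocabulary, the scores, and the helpers `top_candidates` and `report` are hypothetical, because the job's evaluation code is not part of this log.

```python
import heapq
from typing import List, Sequence


def top_candidates(scores: Sequence[float], vocab: Sequence[str], k: int = 3) -> List[str]:
    """Return the k vocabulary tokens with the highest scores, best first."""
    # Rank token indices by their score and keep the k largest.
    best = heapq.nlargest(k, range(len(vocab)), key=lambda i: scores[i])
    return [vocab[i] for i in best]


def report(mask_scores: Sequence[Sequence[float]], vocab: Sequence[str], k: int = 3) -> List[str]:
    """Format one 'MASK[i] top candidates: [...]' line per masked position."""
    return [
        f"MASK[{i}] top candidates: {top_candidates(row, vocab, k)}"
        for i, row in enumerate(mask_scores)
    ]


if __name__ == "__main__":
    # Hypothetical toy vocabulary and made-up scores for two masked positions.
    toy_vocab = ["$($", "$)$", "$[$", "$]$", "$geq$", "$ge$", "$gt$"]
    toy_scores = [
        [9.1, 0.2, 7.5, 0.1, 0.0, 0.0, 0.0],
        [0.3, 8.8, 0.1, 6.9, 0.0, 0.0, 0.0],
    ]
    for out_line in report(toy_scores, toy_vocab):
        print(out_line)
```

With a real masked-language model, the scores would be the prediction head's logits at each [MASK] position. In that case `torch.topk` would normally replace the heap, and the formatting step would stay the same.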
Ep#7/40, shard#4/6, save@518%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 8%, In[8, 512], loss=1.42: 18%|█▊ | 1211/6910 [25:10<1:56:38, 1.23s/batch]
Ep#7/40, shard#4/6, save@519%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 298], loss=1.23: 18%|█▊ | 1212/6910 [25:12<2:01:43, 1.28s/batch]
Ep#7/40, shard#4/6, save@520%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 428], loss=1.11: 18%|█▊ | 1213/6910 [25:13<1:59:04, 1.25s/batch]
Ep#7/40, shard#4/6, save@521%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 74%, In[8, 383], loss=1.23: 18%|█▊ | 1214/6910 [25:14<1:54:26, 1.21s/batch]
Ep#7/40, shard#4/6, save@522%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 92%, In[8, 466], loss=0.97: 18%|█▊ | 1215/6910 [25:16<1:58:49, 1.25s/batch]
Ep#7/40, shard#4/6, save@523%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 512], loss=1.0: 18%|█▊ | 1216/6910 [25:17<2:02:49, 1.29s/batch]
Ep#7/40, shard#4/6, save@524%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 89%, In[8, 402], loss=1.32: 18%|█▊ | 1217/6910 [25:18<1:57:45, 1.24s/batch]
Ep#7/40, shard#4/6, save@525%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 62%, In[8, 216], loss=1.36: 18%|█▊ | 1218/6910 [25:19<1:54:26, 1.21s/batch]
Ep#7/40, shard#4/6, save@526%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 57%, In[8, 428], loss=1.61: 18%|█▊ | 1219/6910 [25:20<1:53:25, 1.20s/batch]
Ep#7/40, shard#4/6, save@527%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 399], loss=1.57: 18%|█▊ | 1220/6910 [25:21<1:46:16, 1.12s/batch]
Ep#7/40, shard#4/6, save@528%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 303], loss=1.45: 18%|█▊ | 1221/6910 [25:22<1:48:28, 1.14s/batch]
Ep#7/40, shard#4/6, save@529%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 71%, In[8, 394], loss=1.2: 18%|█▊ | 1222/6910 [25:24<1:45:47, 1.12s/batch]
Ep#7/40, shard#4/6, save@530%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 95%, In[8, 316], loss=1.11: 18%|█▊ | 1223/6910 [25:25<1:44:08, 1.10s/batch]
Ep#7/40, shard#4/6, save@531%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 385], loss=0.79: 18%|█▊ | 1224/6910 [25:26<1:51:34, 1.18s/batch]
Ep#7/40, shard#4/6, save@532%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 292], loss=1.61: 18%|█▊ | 1225/6910 [25:28<2:09:58, 1.37s/batch]
Ep#7/40, shard#4/6, save@533%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 56%, In[8, 278], loss=1.31: 18%|█▊ | 1226/6910 [25:29<2:16:20, 1.44s/batch]
Ep#7/40, shard#4/6, save@534%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 332], loss=1.25: 18%|█▊ | 1227/6910 [25:31<2:11:20, 1.39s/batch]
Ep#7/40, shard#4/6, save@535%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 85%, In[8, 237], loss=0.94: 18%|█▊ | 1228/6910 [25:32<2:08:33, 1.36s/batch]
Ep#7/40, shard#4/6, save@536%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 63%, In[8, 332], loss=1.1: 18%|█▊ | 1229/6910 [25:33<2:06:53, 1.34s/batch]
Ep#7/40, shard#4/6, save@537%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 313], loss=0.77: 18%|█▊ | 1230/6910 [25:35<2:04:53, 1.32s/batch]
Ep#7/40, shard#4/6, save@538%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 89%, In[8, 309], loss=1.66: 18%|█▊ | 1231/6910 [25:35<1:54:15, 1.21s/batch]
Ep#7/40, shard#4/6, save@539%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 435], loss=1.31: 18%|█▊ | 1232/6910 [25:37<1:58:55, 1.26s/batch]
Ep#7/40, shard#4/6, save@540%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 307], loss=1.12: 18%|█▊ | 1233/6910 [25:38<1:56:24, 1.23s/batch]
Ep#7/40, shard#4/6, save@541%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 51%, In[8, 332], loss=1.34: 18%|█▊ | 1234/6910 [25:40<2:25:28, 1.54s/batch]
Ep#7/40, shard#4/6, save@542%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 55%, In[8, 303], loss=0.8: 18%|█▊ | 1235/6910 [25:41<2:15:03, 1.43s/batch]
Ep#7/40, shard#4/6, save@543%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 89%, In[8, 301], loss=1.01: 18%|█▊ | 1236/6910 [25:43<2:28:26, 1.57s/batch]
Ep#7/40, shard#4/6, save@544%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 84%, In[8, 340], loss=1.14: 18%|█▊ | 1237/6910 [25:44<2:16:57, 1.45s/batch]
Ep#7/40, shard#4/6, save@545%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 512], loss=1.12: 18%|█▊ | 1238/6910 [25:46<2:16:09, 1.44s/batch]
Ep#7/40, shard#4/6, save@546%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 273], loss=1.79: 18%|█▊ | 1239/6910 [25:47<2:04:02, 1.31s/batch]
Ep#7/40, shard#4/6, save@547%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 386], loss=1.41: 18%|█▊ | 1240/6910 [25:48<2:04:34, 1.32s/batch]
Ep#7/40, shard#4/6, save@548%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 376], loss=1.3: 18%|█▊ | 1241/6910 [25:49<2:00:58, 1.28s/batch]
Ep#7/40, shard#4/6, save@549%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 95%, In[8, 386], loss=0.78: 18%|█▊ | 1242/6910 [25:51<1:55:41, 1.22s/batch]
Ep#7/40, shard#4/6, save@550%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 356], loss=1.39: 18%|█▊ | 1243/6910 [25:52<2:05:14, 1.33s/batch]
Ep#7/40, shard#4/6, save@551%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 313], loss=1.24: 18%|█▊ | 1244/6910 [25:53<2:00:38, 1.28s/batch]
Ep#7/40, shard#4/6, save@552%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 308], loss=1.86: 18%|█▊ | 1245/6910 [25:54<1:53:36, 1.20s/batch]
Ep#7/40, shard#4/6, save@553%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 80%, In[8, 298], loss=1.03: 18%|█▊ | 1246/6910 [25:56<2:03:43, 1.31s/batch]
Ep#7/40, shard#4/6, save@554%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 490], loss=1.41: 18%|█▊ | 1247/6910 [25:57<2:01:28, 1.29s/batch]
Ep#7/40, shard#4/6, save@555%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 431], loss=1.49: 18%|█▊ | 1248/6910 [25:58<1:54:49, 1.22s/batch]
Ep#7/40, shard#4/6, save@556%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 292], loss=0.9: 18%|█▊ | 1249/6910 [25:59<1:53:14, 1.20s/batch]
Ep#7/40, shard#4/6, save@557%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 399], loss=1.89: 18%|█▊ | 1250/6910 [26:00<1:52:39, 1.19s/batch]
Ep#7/40, shard#4/6, save@558%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 52%, In[8, 263], loss=1.24: 18%|█▊ | 1251/6910 [26:02<1:48:13, 1.15s/batch]
Ep#7/40, shard#4/6, save@559%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 348], loss=0.9: 18%|█▊ | 1252/6910 [26:03<1:47:28, 1.14s/batch]
Ep#7/40, shard#4/6, save@560%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 83%, In[8, 318], loss=1.26: 18%|█▊ | 1253/6910 [26:04<1:47:07, 1.14s/batch]
Ep#7/40, shard#4/6, save@561%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 492], loss=1.08: 18%|█▊ | 1254/6910 [26:05<1:45:45, 1.12s/batch]
Ep#7/40, shard#4/6, save@562%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 98%, In[8, 472], loss=1.3: 18%|█▊ | 1255/6910 [26:06<1:50:29, 1.17s/batch]
Ep#7/40, shard#4/6, save@563%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 84%, In[8, 351], loss=1.53: 18%|█▊ | 1256/6910 [26:07<1:44:46, 1.11s/batch]
Ep#7/40, shard#4/6, save@564%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 86%, In[8, 433], loss=0.96: 18%|█▊ | 1257/6910 [26:08<1:41:52, 1.08s/batch]
Ep#7/40, shard#4/6, save@565%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 447], loss=1.09: 18%|█▊ | 1258/6910 [26:09<1:47:08, 1.14s/batch]
Ep#7/40, shard#4/6, save@566%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 79%, In[8, 353], loss=1.02: 18%|█▊ | 1259/6910 [26:11<1:54:28, 1.22s/batch]
Ep#7/40, shard#4/6, save@567%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 375], loss=1.25: 18%|█▊ | 1260/6910 [26:15<3:11:46, 2.04s/batch]
Ep#7/40, shard#4/6, save@568%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 94%, In[8, 258], loss=0.8: 18%|█▊ | 1261/6910 [26:16<2:49:53, 1.80s/batch]
Ep#7/40, shard#4/6, save@569%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 57%, In[8, 258], loss=0.96: 18%|█▊ | 1262/6910 [26:17<2:30:19, 1.60s/batch]
Ep#7/40, shard#4/6, save@570%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 85%, In[8, 337], loss=0.71: 18%|█▊ | 1263/6910 [26:19<2:24:09, 1.53s/batch]
Ep#7/40, shard#4/6, save@571%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 93%, In[8, 380], loss=1.06: 18%|█▊ | 1264/6910 [26:20<2:14:16, 1.43s/batch]
Ep#7/40, shard#4/6, save@572%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 92%, In[8, 359], loss=1.04: 18%|█▊ | 1265/6910 [26:21<2:07:04, 1.35s/batch]
Ep#7/40, shard#4/6, save@573%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 280], loss=1.21: 18%|█▊ | 1266/6910 [26:22<1:59:06, 1.27s/batch]
Ep#7/40, shard#4/6, save@574%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 382], loss=0.91: 18%|█▊ | 1267/6910 [26:23<1:54:57, 1.22s/batch]
Ep#7/40, shard#4/6, save@575%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 318], loss=2.04: 18%|█▊ | 1268/6910 [26:25<2:12:58, 1.41s/batch]
Ep#7/40, shard#4/6, save@576%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 365], loss=0.96: 18%|█▊ | 1269/6910 [26:26<2:06:32, 1.35s/batch]
Ep#7/40, shard#4/6, save@577%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 57%, In[8, 294], loss=1.03: 18%|█▊ | 1270/6910 [26:27<1:58:37, 1.26s/batch]
Ep#7/40, shard#4/6, save@578%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 351], loss=1.15: 18%|█▊ | 1271/6910 [26:28<1:55:15, 1.23s/batch]
Ep#7/40, shard#4/6, save@579%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 65%, In[8, 288], loss=0.73: 18%|█▊ | 1272/6910 [26:30<1:54:39, 1.22s/batch]
Ep#7/40, shard#4/6, save@580%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 370], loss=1.39: 18%|█▊ | 1273/6910 [26:30<1:46:23, 1.13s/batch]
Ep#7/40, shard#4/6, save@581%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 286], loss=0.94: 18%|█▊ | 1274/6910 [26:31<1:43:58, 1.11s/batch]
Ep#7/40, shard#4/6, save@582%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 316], loss=1.0: 18%|█▊ | 1275/6910 [26:33<1:45:09, 1.12s/batch]
Ep#7/40, shard#4/6, save@583%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 394], loss=1.08: 18%|█▊ | 1276/6910 [26:34<1:42:58, 1.10s/batch]
Ep#7/40, shard#4/6, save@584%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 80%, In[8, 229], loss=1.81: 18%|█▊ | 1277/6910 [26:35<1:51:15, 1.19s/batch]
Ep#7/40, shard#4/6, save@585%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 311], loss=1.05: 18%|█▊ | 1278/6910 [26:36<1:46:58, 1.14s/batch]
Ep#7/40, shard#4/6, save@586%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 52%, In[8, 211], loss=1.7: 19%|█▊ | 1279/6910 [26:39<2:28:04, 1.58s/batch]
Ep#7/40, shard#4/6, save@587%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 52%, In[8, 426], loss=1.57: 19%|█▊ | 1280/6910 [26:40<2:16:29, 1.45s/batch]
Ep#7/40, shard#4/6, save@588%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 401], loss=1.5: 19%|█▊ | 1281/6910 [26:41<2:07:18, 1.36s/batch]
Ep#7/40, shard#4/6, save@589%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 99%, In[8, 375], loss=1.49: 19%|█▊ | 1282/6910 [26:42<2:01:16, 1.29s/batch]
Ep#7/40, shard#4/6, save@590%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 60%, In[8, 429], loss=1.27: 19%|█▊ | 1283/6910 [26:43<1:55:28, 1.23s/batch]
Ep#7/40, shard#4/6, save@591%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 362], loss=1.21: 19%|█▊ | 1284/6910 [26:44<1:52:20, 1.20s/batch]
Ep#7/40, shard#4/6, save@592%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 98%, In[8, 367], loss=1.58: 19%|█▊ | 1285/6910 [26:46<1:59:20, 1.27s/batch]
Ep#7/40, shard#4/6, save@593%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 99%, In[8, 349], loss=0.66: 19%|█▊ | 1286/6910 [26:47<1:52:27, 1.20s/batch]
Ep#7/40, shard#4/6, save@594%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 94%, In[8, 490], loss=1.05: 19%|█▊ | 1287/6910 [26:48<1:46:32, 1.14s/batch]
Ep#7/40, shard#4/6, save@595%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 386], loss=1.38: 19%|█▊ | 1288/6910 [26:49<1:53:34, 1.21s/batch]
Ep#7/40, shard#4/6, save@596%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 292], loss=1.32: 19%|█▊ | 1289/6910 [26:50<1:46:53, 1.14s/batch]
Ep#7/40, shard#4/6, save@597%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 98%, In[8, 459], loss=1.1: 19%|█▊ | 1290/6910 [26:51<1:44:26, 1.11s/batch]
Ep#7/40, shard#4/6, save@598%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 393], loss=0.96: 19%|█▊ | 1291/6910 [26:52<1:46:16, 1.13s/batch]
Ep#7/40, shard#4/6, save@599%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 460], loss=0.96: 19%|█▊ | 1292/6910 [26:55<2:28:07, 1.58s/batch]
Ep#7/40, shard#4/6, save@600%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 99%, In[8, 234], loss=1.77: 19%|█▊ | 1293/6910 [26:57<2:25:21, 1.55s/batch]
Ep#7/40, shard#4/6, save@601%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 266], loss=1.12: 19%|█▊ | 1294/6910 [26:58<2:11:32, 1.41s/batch]
Ep#7/40, shard#4/6, save@602%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 92%, In[8, 512], loss=1.01: 19%|█▊ | 1295/6910 [26:59<2:07:35, 1.36s/batch]
Ep#7/40, shard#4/6, save@603%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 357], loss=1.48: 19%|█▉ | 1296/6910 [27:00<1:59:41, 1.28s/batch]
Ep#7/40, shard#4/6, save@604%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 91%, In[8, 362], loss=1.01: 19%|█▉ | 1297/6910 [27:01<2:03:44, 1.32s/batch]
Ep#7/40, shard#4/6, save@605%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 51%, In[8, 271], loss=1.11: 19%|█▉ | 1298/6910 [27:02<1:57:38, 1.26s/batch]
Ep#7/40, shard#4/6, save@606%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 395], loss=0.77: 19%|█▉ | 1299/6910 [27:04<1:55:00, 1.23s/batch]
Ep#7/40, shard#4/6, save@607%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 267], loss=1.27: 19%|█▉ | 1300/6910 [27:05<1:51:39, 1.19s/batch]
Ep#7/40, shard#4/6, save@608%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 396], loss=1.2: 19%|█▉ | 1301/6910 [27:06<1:57:11, 1.25s/batch]
Ep#7/40, shard#4/6, save@609%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 54%, In[8, 385], loss=1.16: 19%|█▉ | 1302/6910 [27:07<1:51:06, 1.19s/batch]
Ep#7/40, shard#4/6, save@610%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 85%, In[8, 231], loss=1.08: 19%|█▉ | 1303/6910 [27:11<3:12:31, 2.06s/batch]
Ep#7/40, shard#4/6, save@611%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 76%, In[8, 263], loss=1.2: 19%|█▉ | 1304/6910 [27:12<2:47:24, 1.79s/batch]
Ep#7/40, shard#4/6, save@612%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 63%, In[8, 399], loss=0.78: 19%|█▉ | 1305/6910 [27:13<2:24:38, 1.55s/batch]
Ep#7/40, shard#4/6, save@613%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 367], loss=1.39: 19%|█▉ | 1306/6910 [27:15<2:16:38, 1.46s/batch]
Ep#7/40, shard#4/6, save@614%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 66%, In[8, 289], loss=1.62: 19%|█▉ | 1307/6910 [27:16<2:10:41, 1.40s/batch]
Ep#7/40, shard#4/6, save@615%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 280], loss=0.85: 19%|█▉ | 1308/6910 [27:17<2:01:49, 1.30s/batch]
Ep#7/40, shard#4/6, save@616%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 336], loss=1.03: 19%|█▉ | 1309/6910 [27:18<1:58:01, 1.26s/batch]
Ep#7/40, shard#4/6, save@617%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 312], loss=1.19: 19%|█▉ | 1310/6910 [27:19<1:54:52, 1.23s/batch]
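Each progress update packs several fields into one line: epoch, shard, a `save@A%B` counter, node and GPU counts, a GPU percentage, the input shape, the batch loss, and tqdm's step, elapsed, remaining, and rate fields. Below is a minimal sketch for extracting those fields with a regular expression. The field names are assumptions. For example, it reads `Ep#7/40` as epoch 7 of 40, `In[8, 512]` as batch size by sequence length, and the GPU figure as utilization. The `save@A%B` counter is kept as two raw integers without interpretation. `parse_progress`, `iter_progress`, `to_seconds`, and `estimate_remaining` are hypothetical helpers, not part of the training code.

```python
import re

# One progress update as printed in this log. Group names are descriptive
# guesses about what each printed field means.
PROGRESS_RE = re.compile(
    r"Ep#(?P<epoch>\d+)/(?P<epochs>\d+), "
    r"shard#(?P<shard>\d+)/(?P<shards>\d+), "
    r"save@(?P<save_a>\d+)%(?P<save_b>\d+), "
    r"(?P<nodes>\d+) nodes, (?P<gpus>\d+) x (?P<gpu_name>[^:]+): (?P<gpu_pct>\d+)%, "
    r"In\[(?P<batch>\d+), (?P<seq_len>\d+)\], "
    r"loss=(?P<loss>[\d.]+): .*?"  # skip the percentage and bar glyphs
    r"(?P<step>\d+)/(?P<total>\d+) "
    r"\[(?P<elapsed>[\d:]+)<(?P<remaining>[\d:]+), (?P<rate>[\d.]+)s/batch\]"
)


def to_seconds(stamp: str) -> int:
    """Convert a tqdm-style '[h:]mm:ss' stamp to seconds."""
    total = 0
    for part in stamp.split(":"):
        total = total * 60 + int(part)
    return total


def parse_progress(line: str) -> dict:
    """Return the fields of the first progress update in `line` as typed values."""
    match = PROGRESS_RE.search(line)
    if match is None:
        raise ValueError("no progress update found")
    fields = {}
    for key, value in match.groupdict().items():
        if key in ("elapsed", "remaining"):
            fields[key] = to_seconds(value)  # seconds
        elif key in ("loss", "rate"):
            fields[key] = float(value)  # rate is seconds per batch
        elif key == "gpu_name":
            fields[key] = value
        else:
            fields[key] = int(value)
    return fields


def iter_progress(text: str):
    """Yield parsed fields for every update, even when several share one line."""
    for match in PROGRESS_RE.finditer(text):
        yield parse_progress(match.group(0))


def estimate_remaining(fields: dict) -> float:
    """Rough ETA check: batches left times the displayed seconds per batch."""
    return (fields["total"] - fields["step"]) * fields["rate"]
```

Take the last update above as an example: step 1310 of 6910 at 1.23 s/batch. That leaves 5600 batches, and 5600 × 1.23 s ≈ 6888 s. This is within a few seconds of the logged 1:54:52 (6892 s). The small gap most likely comes from tqdm rounding the rate it displays.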
She needs to [MASK] that [MASK] gets only ten minutes.
MASK[0] top candidates: ['know', 'show', 'prove']
MASK[1] top candidates: ['she', 'he', 'it']
Determine the [MASK] of $f$ $($ $x$ [MASK] $equal$ $x$ $plus$ $root$ ${$ $4$ $minus$ $x$ $supscript$ $2$ $}$ without [MASK]
MASK[0] top candidates: ['value', 'derivative', 'integral']
MASK[1] top candidates: ['$)$', '$]$', '$plus$']
MASK[2] top candidates: ['substitution', '.', 'loss']
$sin$ $x$ $equal$ [MASK] $minus$ $frac$ ${$ $x$ $supscript$ $3$ $}$ ${$ $3$ $fact$ $}$ $plus$ $frac$ ${$ $x$ $supscript$ $5$ $}$ ${$ [MASK] $fact$ $}$ $minus$ $frac$ ${$ $x$ $supscript$ $7$ $}$ ${$ $7$ $fact$ $}$ $dots$
MASK[0] top candidates: ['$1$', '$x$', '$float$']
MASK[1] top candidates: ['$7$', '$5$', '$6$']
Maybe, going to natural [MASK] could help. I mean: $log$ $somenum$ [MASK] $frac$ $)$?
MASK[0] top candidates: ['numbers', 'number', '##s']
MASK[1] top candidates: ['$($', '$minus$', '$plus$']
$F$ $($ $y$ $minus$ $x$ $)$$u$ $($ $x$ [MASK] $y$ $)$ $equal$ $minus$ $ln$ $vert$ $F$ $($ $y$ $minus$ $x$ $)$ $minus$ $x$ $supscript$ $2$ $vert$ where $F$ is any differentiable [MASK].
MASK[0] top candidates: ['$minus$', '$plus$', '$comma$']
MASK[1] top candidates: ['function', 'functional', 'functions']
With Euler's formula, it simplifies to $int$ $subscript$ $0$ [MASK] [MASK] $frac$ ${$ $1$ $plus$ $x$ $supscript$ ${$ $somenum$ $}$ $}$ ${$ $1$ $plus$ $x$ $}$ $d$ $x$
MASK[0] top candidates: ['$supscript$', '$subscript$', '$x$']
MASK[1] top candidates: ['$1$', '$0$', '$x$']
$vert$ $x$ $vert$ $supscript$ $4$ is maximized if [MASK] [MASK] if [MASK] $x$ [MASK] is maximized.
MASK[0] top candidates: ['and', ',', '&']
MASK[1] top candidates: ['only', 'not', 'also']
MASK[2] top candidates: ['$vert$', '$($', '${$']
MASK[3] top candidates: ['$vert$', 'itself', '$)$']
$($ $a$ $comma$ $b$ $comma$ $x$ $comma$ $n$ $)$ $equal$ [MASK] $a$ $comma$ $a$ $plus$ $1$ $comma$ $n$ $comma$ $n$ [MASK] is a counterexample (for any $n$ [MASK] $2$).
MASK[0] top candidates: ['$($', '$\\{$', '$[$']
MASK[1] top candidates: ['$)$', '$\\}$', '$]$']
MASK[2] top candidates: ['$geq$', '$ge$', '$gt$']
Ep#7/40, shard#4/6, save@618%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 8%, In[8, 422], loss=0.99: 19%|█▉ | 1310/6910 [27:21<1:54:52, 1.23s/batch]
Ep#7/40, shard#4/6, save@618%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 8%, In[8, 422], loss=0.99: 19%|█▉ | 1311/6910 [27:21<1:53:34, 1.22s/batch]
Ep#7/40, shard#4/6, save@619%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 278], loss=1.26: 19%|█▉ | 1311/6910 [27:22<1:53:34, 1.22s/batch]
Ep#7/40, shard#4/6, save@619%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 278], loss=1.26: 19%|█▉ | 1312/6910 [27:22<1:47:38, 1.15s/batch]
Ep#7/40, shard#4/6, save@620%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 51%, In[8, 394], loss=0.94: 19%|█▉ | 1312/6910 [27:24<1:47:38, 1.15s/batch]
Ep#7/40, shard#4/6, save@620%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 51%, In[8, 394], loss=0.94: 19%|█▉ | 1313/6910 [27:24<2:36:49, 1.68s/batch]
Ep#7/40, shard#4/6, save@621%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 390], loss=1.2: 19%|█▉ | 1313/6910 [27:26<2:36:49, 1.68s/batch]
Ep#7/40, shard#4/6, save@621%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 390], loss=1.2: 19%|█▉ | 1314/6910 [27:26<2:22:13, 1.52s/batch]
Ep#7/40, shard#4/6, save@622%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 78%, In[8, 332], loss=0.74: 19%|█▉ | 1314/6910 [27:28<2:22:13, 1.52s/batch]
Ep#7/40, shard#4/6, save@622%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 78%, In[8, 332], loss=0.74: 19%|█▉ | 1315/6910 [27:28<2:46:36, 1.79s/batch]
Ep#7/40, shard#4/6, save@623%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 62%, In[8, 283], loss=1.11: 19%|█▉ | 1315/6910 [27:29<2:46:36, 1.79s/batch]
Ep#7/40, shard#4/6, save@623%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 62%, In[8, 283], loss=1.11: 19%|█▉ | 1316/6910 [27:29<2:38:06, 1.70s/batch]
Ep#7/40, shard#4/6, save@624%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 83%, In[8, 339], loss=1.11: 19%|█▉ | 1316/6910 [27:31<2:38:06, 1.70s/batch]
Ep#7/40, shard#4/6, save@624%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 83%, In[8, 339], loss=1.11: 19%|█▉ | 1317/6910 [27:31<2:29:01, 1.60s/batch]
Ep#7/40, shard#4/6, save@625%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 265], loss=1.29: 19%|█▉ | 1317/6910 [27:32<2:29:01, 1.60s/batch]
Ep#7/40, shard#4/6, save@625%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 265], loss=1.29: 19%|█▉ | 1318/6910 [27:32<2:09:59, 1.39s/batch]
Ep#7/40, shard#4/6, save@626%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 289], loss=0.89: 19%|█▉ | 1318/6910 [27:33<2:09:59, 1.39s/batch]
Ep#7/40, shard#4/6, save@626%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 289], loss=0.89: 19%|█▉ | 1319/6910 [27:33<2:08:12, 1.38s/batch]
Ep#7/40, shard#4/6, save@627%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 59%, In[8, 439], loss=1.33: 19%|█▉ | 1319/6910 [27:34<2:08:12, 1.38s/batch]
Ep#7/40, shard#4/6, save@627%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 59%, In[8, 439], loss=1.33: 19%|█▉ | 1320/6910 [27:34<2:04:01, 1.33s/batch]
Ep#7/40, shard#4/6, save@628%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 73%, In[8, 338], loss=1.05: 19%|█▉ | 1320/6910 [27:36<2:04:01, 1.33s/batch]
Ep#7/40, shard#4/6, save@628%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 73%, In[8, 338], loss=1.05: 19%|█▉ | 1321/6910 [27:36<2:14:04, 1.44s/batch]
Ep#7/40, shard#4/6, save@629%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 442], loss=1.04: 19%|█▉ | 1321/6910 [27:37<2:14:04, 1.44s/batch]
Ep#7/40, shard#4/6, save@629%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 442], loss=1.04: 19%|█▉ | 1322/6910 [27:37<2:04:37, 1.34s/batch]
Ep#7/40, shard#4/6, save@630%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 352], loss=1.31: 19%|█▉ | 1322/6910 [27:38<2:04:37, 1.34s/batch]
Ep#7/40, shard#4/6, save@630%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 352], loss=1.31: 19%|█▉ | 1323/6910 [27:38<1:53:36, 1.22s/batch]
Ep#7/40, shard#4/6, save@631%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 68%, In[8, 421], loss=0.93: 19%|█▉ | 1323/6910 [27:39<1:53:36, 1.22s/batch]
Ep#7/40, shard#4/6, save@631%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 68%, In[8, 421], loss=0.93: 19%|█▉ | 1324/6910 [27:39<1:50:53, 1.19s/batch]
Ep#7/40, shard#4/6, save@632%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 69%, In[8, 374], loss=0.9: 19%|█▉ | 1324/6910 [27:41<1:50:53, 1.19s/batch]
Ep#7/40, shard#4/6, save@632%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 69%, In[8, 374], loss=0.9: 19%|█▉ | 1325/6910 [27:41<2:13:42, 1.44s/batch]
Ep#7/40, shard#4/6, save@633%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 70%, In[8, 376], loss=0.83: 19%|█▉ | 1325/6910 [27:42<2:13:42, 1.44s/batch]
Ep#7/40, shard#4/6, save@633%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 70%, In[8, 376], loss=0.83: 19%|█▉ | 1326/6910 [27:42<2:01:56, 1.31s/batch]
Ep#7/40, shard#4/6, save@634%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 403], loss=1.3: 19%|█▉ | 1326/6910 [27:43<2:01:56, 1.31s/batch]
Ep#7/40, shard#4/6, save@634%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 403], loss=1.3: 19%|█▉ | 1327/6910 [27:43<1:55:23, 1.24s/batch]
Ep#7/40, shard#4/6, save@635%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 91%, In[8, 394], loss=0.94: 19%|█▉ | 1327/6910 [27:44<1:55:23, 1.24s/batch]
Ep#7/40, shard#4/6, save@635%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 91%, In[8, 394], loss=0.94: 19%|█▉ | 1328/6910 [27:44<1:52:14, 1.21s/batch]
Ep#7/40, shard#4/6, save@636%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 251], loss=0.71: 19%|█▉ | 1328/6910 [27:45<1:52:14, 1.21s/batch]
Ep#7/40, shard#4/6, save@636%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 251], loss=0.71: 19%|█▉ | 1329/6910 [27:45<1:48:07, 1.16s/batch]
Ep#7/40, shard#4/6, save@637%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 422], loss=0.99: 19%|█▉ | 1329/6910 [27:47<1:48:07, 1.16s/batch]
Ep#7/40, shard#4/6, save@637%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 422], loss=0.99: 19%|█▉ | 1330/6910 [27:47<2:03:02, 1.32s/batch]
Ep#7/40, shard#4/6, save@638%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 92%, In[8, 377], loss=0.79: 19%|█▉ | 1330/6910 [27:48<2:03:02, 1.32s/batch]
Ep#7/40, shard#4/6, save@638%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 92%, In[8, 377], loss=0.79: 19%|█▉ | 1331/6910 [27:48<1:52:08, 1.21s/batch]
Ep#7/40, shard#4/6, save@639%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 81%, In[8, 377], loss=1.64: 19%|█▉ | 1331/6910 [27:50<1:52:08, 1.21s/batch]
Ep#7/40, shard#4/6, save@639%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 81%, In[8, 377], loss=1.64: 19%|█▉ | 1332/6910 [27:50<2:23:24, 1.54s/batch]
Ep#7/40, shard#4/6, save@640%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 291], loss=0.94: 19%|█▉ | 1332/6910 [27:52<2:23:24, 1.54s/batch]
Ep#7/40, shard#4/6, save@640%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 291], loss=0.94: 19%|█▉ | 1333/6910 [27:52<2:14:55, 1.45s/batch]
Ep#7/40, shard#4/6, save@641%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 399], loss=1.08: 19%|█▉ | 1333/6910 [27:53<2:14:55, 1.45s/batch]
Ep#7/40, shard#4/6, save@641%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 399], loss=1.08: 19%|█▉ | 1334/6910 [27:53<2:07:43, 1.37s/batch]
Ep#7/40, shard#4/6, save@642%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 70%, In[8, 338], loss=1.03: 19%|█▉ | 1334/6910 [27:54<2:07:43, 1.37s/batch]
Ep#7/40, shard#4/6, save@642%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 70%, In[8, 338], loss=1.03: 19%|█▉ | 1335/6910 [27:54<2:03:40, 1.33s/batch]
Ep#7/40, shard#4/6, save@643%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 315], loss=1.23: 19%|█▉ | 1335/6910 [27:55<2:03:40, 1.33s/batch]
Ep#7/40, shard#4/6, save@643%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 315], loss=1.23: 19%|█▉ | 1336/6910 [27:55<1:58:01, 1.27s/batch]
Ep#7/40, shard#4/6, save@644%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 65%, In[8, 358], loss=0.91: 19%|█▉ | 1336/6910 [27:56<1:58:01, 1.27s/batch]
Ep#7/40, shard#4/6, save@644%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 65%, In[8, 358], loss=0.91: 19%|█▉ | 1337/6910 [27:56<1:51:37, 1.20s/batch]
Ep#7/40, shard#4/6, save@645%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 77%, In[8, 252], loss=1.36: 19%|█▉ | 1337/6910 [27:57<1:51:37, 1.20s/batch]
Ep#7/40, shard#4/6, save@645%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 77%, In[8, 252], loss=1.36: 19%|█▉ | 1338/6910 [27:57<1:51:17, 1.20s/batch]
Ep#7/40, shard#4/6, save@646%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 92%, In[8, 504], loss=1.34: 19%|█▉ | 1338/6910 [27:58<1:51:17, 1.20s/batch]
Ep#7/40, shard#4/6, save@646%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 92%, In[8, 504], loss=1.34: 19%|█▉ | 1339/6910 [27:58<1:46:34, 1.15s/batch]
Ep#7/40, shard#4/6, save@647%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 246], loss=1.23: 19%|█▉ | 1339/6910 [28:00<1:46:34, 1.15s/batch]
Ep#7/40, shard#4/6, save@647%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 246], loss=1.23: 19%|█▉ | 1340/6910 [28:00<1:42:57, 1.11s/batch]
Ep#7/40, shard#4/6, save@648%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 79%, In[8, 432], loss=1.41: 19%|█▉ | 1340/6910 [28:01<1:42:57, 1.11s/batch]
Ep#7/40, shard#4/6, save@648%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 79%, In[8, 432], loss=1.41: 19%|█▉ | 1341/6910 [28:01<1:43:36, 1.12s/batch]
Ep#7/40, shard#4/6, save@649%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 317], loss=0.9: 19%|█▉ | 1341/6910 [28:02<1:43:36, 1.12s/batch]
Ep#7/40, shard#4/6, save@649%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 317], loss=0.9: 19%|█▉ | 1342/6910 [28:02<1:46:40, 1.15s/batch]
Ep#7/40, shard#4/6, save@650%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 81%, In[8, 265], loss=0.92: 19%|█▉ | 1342/6910 [28:03<1:46:40, 1.15s/batch]
Ep#7/40, shard#4/6, save@650%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 81%, In[8, 265], loss=0.92: 19%|█▉ | 1343/6910 [28:03<1:41:12, 1.09s/batch]
Ep#7/40, shard#4/6, save@651%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 56%, In[8, 454], loss=1.29: 19%|█▉ | 1343/6910 [28:04<1:41:12, 1.09s/batch]
Ep#7/40, shard#4/6, save@651%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 56%, In[8, 454], loss=1.29: 19%|█▉ | 1344/6910 [28:04<1:46:58, 1.15s/batch]
Ep#7/40, shard#4/6, save@652%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 88%, In[8, 411], loss=1.42: 19%|█▉ | 1344/6910 [28:05<1:46:58, 1.15s/batch]
Ep#7/40, shard#4/6, save@652%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 88%, In[8, 411], loss=1.42: 19%|█▉ | 1345/6910 [28:05<1:44:09, 1.12s/batch]
Ep#7/40, shard#4/6, save@653%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 75%, In[8, 300], loss=1.34: 19%|█▉ | 1345/6910 [28:06<1:44:09, 1.12s/batch]
Ep#7/40, shard#4/6, save@653%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 75%, In[8, 300], loss=1.34: 19%|█▉ | 1346/6910 [28:06<1:49:28, 1.18s/batch]
Ep#7/40, shard#4/6, save@654%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 501], loss=1.15: 19%|█▉ | 1346/6910 [28:08<1:49:28, 1.18s/batch]
Ep#7/40, shard#4/6, save@654%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 501], loss=1.15: 19%|█▉ | 1347/6910 [28:08<1:47:39, 1.16s/batch]
Ep#7/40, shard#4/6, save@655%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 83%, In[8, 286], loss=1.26: 19%|█▉ | 1347/6910 [28:10<1:47:39, 1.16s/batch]
Ep#7/40, shard#4/6, save@655%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 83%, In[8, 286], loss=1.26: 20%|█▉ | 1348/6910 [28:10<2:13:18, 1.44s/batch]
Ep#7/40, shard#4/6, save@656%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 342], loss=1.48: 20%|█▉ | 1348/6910 [28:11<2:13:18, 1.44s/batch]
Ep#7/40, shard#4/6, save@656%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 342], loss=1.48: 20%|█▉ | 1349/6910 [28:11<2:02:53, 1.33s/batch]
Ep#7/40, shard#4/6, save@657%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 512], loss=0.95: 20%|█▉ | 1349/6910 [28:12<2:02:53, 1.33s/batch]
Ep#7/40, shard#4/6, save@657%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 512], loss=0.95: 20%|█▉ | 1350/6910 [28:12<1:57:30, 1.27s/batch]
Ep#7/40, shard#4/6, save@658%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 274], loss=0.93: 20%|█▉ | 1350/6910 [28:13<1:57:30, 1.27s/batch]
Ep#7/40, shard#4/6, save@658%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 274], loss=0.93: 20%|█▉ | 1351/6910 [28:13<1:50:42, 1.19s/batch]
Ep#7/40, shard#4/6, save@659%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 77%, In[8, 361], loss=1.08: 20%|█▉ | 1351/6910 [28:14<1:50:42, 1.19s/batch]
Ep#7/40, shard#4/6, save@659%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 77%, In[8, 361], loss=1.08: 20%|█▉ | 1352/6910 [28:14<1:53:33, 1.23s/batch]
Ep#7/40, shard#4/6, save@660%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 398], loss=1.44: 20%|█▉ | 1352/6910 [28:15<1:53:33, 1.23s/batch]
Ep#7/40, shard#4/6, save@660%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 398], loss=1.44: 20%|█▉ | 1353/6910 [28:15<1:45:02, 1.13s/batch]
Ep#7/40, shard#4/6, save@661%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 250], loss=1.41: 20%|█▉ | 1353/6910 [28:16<1:45:02, 1.13s/batch]
Ep#7/40, shard#4/6, save@661%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 250], loss=1.41: 20%|█▉ | 1354/6910 [28:16<1:42:40, 1.11s/batch]
Ep#7/40, shard#4/6, save@662%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 210], loss=1.24: 20%|█▉ | 1354/6910 [28:17<1:42:40, 1.11s/batch]
Ep#7/40, shard#4/6, save@662%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 210], loss=1.24: 20%|█▉ | 1355/6910 [28:17<1:39:51, 1.08s/batch]
Ep#7/40, shard#4/6, save@663%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 62%, In[8, 229], loss=0.91: 20%|█▉ | 1355/6910 [28:18<1:39:51, 1.08s/batch]
Ep#7/40, shard#4/6, save@663%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 62%, In[8, 229], loss=0.91: 20%|█▉ | 1356/6910 [28:18<1:38:51, 1.07s/batch]
Ep#7/40, shard#4/6, save@664%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 56%, In[8, 267], loss=1.44: 20%|█▉ | 1356/6910 [28:19<1:38:51, 1.07s/batch]
Ep#7/40, shard#4/6, save@664%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 56%, In[8, 267], loss=1.44: 20%|█▉ | 1357/6910 [28:19<1:41:26, 1.10s/batch]
Ep#7/40, shard#4/6, save@665%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 360], loss=1.39: 20%|█▉ | 1357/6910 [28:20<1:41:26, 1.10s/batch]
Ep#7/40, shard#4/6, save@665%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 360], loss=1.39: 20%|█▉ | 1358/6910 [28:20<1:41:27, 1.10s/batch]
Ep#7/40, shard#4/6, save@666%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 446], loss=0.78: 20%|█▉ | 1358/6910 [28:22<1:41:27, 1.10s/batch]
Ep#7/40, shard#4/6, save@666%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 446], loss=0.78: 20%|█▉ | 1359/6910 [28:22<1:49:16, 1.18s/batch]
Ep#7/40, shard#4/6, save@667%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 412], loss=1.05: 20%|█▉ | 1359/6910 [28:23<1:49:16, 1.18s/batch]
Ep#7/40, shard#4/6, save@667%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 412], loss=1.05: 20%|█▉ | 1360/6910 [28:23<1:48:59, 1.18s/batch]
Ep#7/40, shard#4/6, save@668%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 94%, In[8, 426], loss=1.32: 20%|█▉ | 1360/6910 [28:24<1:48:59, 1.18s/batch]
Ep#7/40, shard#4/6, save@668%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 94%, In[8, 426], loss=1.32: 20%|█▉ | 1361/6910 [28:24<1:47:08, 1.16s/batch]
Ep#7/40, shard#4/6, save@669%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 303], loss=1.61: 20%|█▉ | 1361/6910 [28:25<1:47:08, 1.16s/batch]
Ep#7/40, shard#4/6, save@669%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 303], loss=1.61: 20%|█▉ | 1362/6910 [28:25<1:49:06, 1.18s/batch]
Ep#7/40, shard#4/6, save@670%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 84%, In[8, 512], loss=1.33: 20%|█▉ | 1362/6910 [28:27<1:49:06, 1.18s/batch]
Ep#7/40, shard#4/6, save@670%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 84%, In[8, 512], loss=1.33: 20%|█▉ | 1363/6910 [28:27<1:51:12, 1.20s/batch]
Ep#7/40, shard#4/6, save@671%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 67%, In[8, 476], loss=1.43: 20%|█▉ | 1363/6910 [28:28<1:51:12, 1.20s/batch]
Ep#7/40, shard#4/6, save@671%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 67%, In[8, 476], loss=1.43: 20%|█▉ | 1364/6910 [28:28<1:49:33, 1.19s/batch]
Ep#7/40, shard#4/6, save@672%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 82%, In[8, 290], loss=1.34: 20%|█▉ | 1364/6910 [28:29<1:49:33, 1.19s/batch]
Ep#7/40, shard#4/6, save@672%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 82%, In[8, 290], loss=1.34: 20%|█▉ | 1365/6910 [28:29<1:45:06, 1.14s/batch]
Ep#7/40, shard#4/6, save@673%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 452], loss=0.99: 20%|█▉ | 1365/6910 [28:30<1:45:06, 1.14s/batch]
Ep#7/40, shard#4/6, save@673%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 452], loss=0.99: 20%|█▉ | 1366/6910 [28:30<1:56:48, 1.26s/batch]
Ep#7/40, shard#4/6, save@674%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 292], loss=1.11: 20%|█▉ | 1366/6910 [28:32<1:56:48, 1.26s/batch]
Ep#7/40, shard#4/6, save@674%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 292], loss=1.11: 20%|█▉ | 1367/6910 [28:32<1:55:00, 1.24s/batch]
Ep#7/40, shard#4/6, save@675%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 90%, In[8, 304], loss=1.03: 20%|█▉ | 1367/6910 [28:32<1:55:00, 1.24s/batch]
Ep#7/40, shard#4/6, save@675%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 90%, In[8, 304], loss=1.03: 20%|█▉ | 1368/6910 [28:33<1:46:23, 1.15s/batch]
Ep#7/40, shard#4/6, save@676%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 266], loss=0.98: 20%|█▉ | 1368/6910 [28:34<1:46:23, 1.15s/batch]
Ep#7/40, shard#4/6, save@676%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 266], loss=0.98: 20%|█▉ | 1369/6910 [28:34<1:49:30, 1.19s/batch]
Ep#7/40, shard#4/6, save@677%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 84%, In[8, 363], loss=0.85: 20%|█▉ | 1369/6910 [28:35<1:49:30, 1.19s/batch]
Ep#7/40, shard#4/6, save@677%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 84%, In[8, 363], loss=0.85: 20%|█▉ | 1370/6910 [28:35<1:55:41, 1.25s/batch]
Ep#7/40, shard#4/6, save@678%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 408], loss=1.19: 20%|█▉ | 1370/6910 [28:36<1:55:41, 1.25s/batch]
Ep#7/40, shard#4/6, save@678%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 408], loss=1.19: 20%|█▉ | 1371/6910 [28:36<1:54:35, 1.24s/batch]
Ep#7/40, shard#4/6, save@679%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 78%, In[8, 356], loss=1.09: 20%|█▉ | 1371/6910 [28:38<1:54:35, 1.24s/batch]
Ep#7/40, shard#4/6, save@679%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 78%, In[8, 356], loss=1.09: 20%|█▉ | 1372/6910 [28:38<2:00:26, 1.30s/batch]
Ep#7/40, shard#4/6, save@680%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 305], loss=1.64: 20%|█▉ | 1372/6910 [28:39<2:00:26, 1.30s/batch]
Ep#7/40, shard#4/6, save@680%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 305], loss=1.64: 20%|█▉ | 1373/6910 [28:39<1:54:30, 1.24s/batch]
Ep#7/40, shard#4/6, save@681%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 360], loss=0.97: 20%|█▉ | 1373/6910 [28:42<1:54:30, 1.24s/batch]
Ep#7/40, shard#4/6, save@681%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 360], loss=0.97: 20%|█▉ | 1374/6910 [28:42<2:37:09, 1.70s/batch]
Ep#7/40, shard#4/6, save@682%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 368], loss=1.09: 20%|█▉ | 1374/6910 [28:43<2:37:09, 1.70s/batch]
Ep#7/40, shard#4/6, save@682%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 368], loss=1.09: 20%|█▉ | 1375/6910 [28:43<2:24:02, 1.56s/batch]
Ep#7/40, shard#4/6, save@683%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 340], loss=1.6: 20%|█▉ | 1375/6910 [28:44<2:24:02, 1.56s/batch]
Ep#7/40, shard#4/6, save@683%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 340], loss=1.6: 20%|█▉ | 1376/6910 [28:44<2:07:35, 1.38s/batch]
Ep#7/40, shard#4/6, save@684%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 68%, In[8, 337], loss=1.21: 20%|█▉ | 1376/6910 [28:45<2:07:35, 1.38s/batch]
Ep#7/40, shard#4/6, save@684%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 68%, In[8, 337], loss=1.21: 20%|█▉ | 
1377/6910 [28:45<2:04:36, 1.35s/batch] Ep#7/40, shard#4/6, save@685%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 52%, In[8, 250], loss=1.41: 20%|β–ˆβ–‰ | 1377/6910 [28:47<2:04:36, 1.35s/batch] Ep#7/40, shard#4/6, save@685%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 52%, In[8, 250], loss=1.41: 20%|β–ˆβ–‰ | 1378/6910 [28:47<2:17:22, 1.49s/batch] Ep#7/40, shard#4/6, save@686%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 433], loss=1.57: 20%|β–ˆβ–‰ | 1378/6910 [28:49<2:17:22, 1.49s/batch] Ep#7/40, shard#4/6, save@686%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 433], loss=1.57: 20%|β–ˆβ–‰ | 1379/6910 [28:49<2:26:15, 1.59s/batch] Ep#7/40, shard#4/6, save@687%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 79%, In[8, 306], loss=1.08: 20%|β–ˆβ–‰ | 1379/6910 [28:51<2:26:15, 1.59s/batch] Ep#7/40, shard#4/6, save@687%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 79%, In[8, 306], loss=1.08: 20%|β–ˆβ–‰ | 1380/6910 [28:51<2:35:07, 1.68s/batch] Ep#7/40, shard#4/6, save@688%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 53%, In[8, 502], loss=1.37: 20%|β–ˆβ–‰ | 1380/6910 [28:52<2:35:07, 1.68s/batch] Ep#7/40, shard#4/6, save@688%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 53%, In[8, 502], loss=1.37: 20%|β–ˆβ–‰ | 1381/6910 [28:52<2:19:52, 1.52s/batch] Ep#7/40, shard#4/6, save@689%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 93%, In[8, 370], loss=1.13: 20%|β–ˆβ–‰ | 1381/6910 [28:53<2:19:52, 1.52s/batch] Ep#7/40, shard#4/6, save@689%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 93%, In[8, 370], loss=1.13: 20%|β–ˆβ–ˆ | 1382/6910 [28:53<2:07:29, 1.38s/batch] Ep#7/40, shard#4/6, save@690%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 512], loss=1.24: 20%|β–ˆβ–ˆ | 1382/6910 [28:55<2:07:29, 1.38s/batch]Saving model "6-3-1382" ...
Ep#7/40, shard#4/6, save@690%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 512], loss=1.24: 20%|β–ˆβ–ˆ | 1383/6910 [28:56<2:55:39, 1.91s/batch] Ep#7/40, shard#4/6, save@691%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 205], loss=1.35: 20%|β–ˆβ–ˆ | 1383/6910 [28:57<2:55:39, 1.91s/batch] Ep#7/40, shard#4/6, save@691%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 205], loss=1.35: 20%|β–ˆβ–ˆ | 1384/6910 [28:57<2:29:09, 1.62s/batch] Ep#7/40, shard#4/6, save@0%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 95%, In[8, 409], loss=1.35: 20%|β–ˆβ–ˆ | 1384/6910 [28:58<2:29:09, 1.62s/batch] Ep#7/40, shard#4/6, save@0%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 95%, In[8, 409], loss=1.35: 20%|β–ˆβ–ˆ | 1385/6910 [28:58<2:15:20, 1.47s/batch] Ep#7/40, shard#4/6, save@1%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 429], loss=1.43: 20%|β–ˆβ–ˆ | 1385/6910 [28:59<2:15:20, 1.47s/batch] Ep#7/40, shard#4/6, save@1%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 429], loss=1.43: 20%|β–ˆβ–ˆ | 1386/6910 [28:59<2:02:34, 1.33s/batch] Ep#7/40, shard#4/6, save@2%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 70%, In[8, 262], loss=1.18: 20%|β–ˆβ–ˆ | 1386/6910 [29:00<2:02:34, 1.33s/batch] Ep#7/40, shard#4/6, save@2%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 70%, In[8, 262], loss=1.18: 20%|β–ˆβ–ˆ | 1387/6910 [29:00<1:56:40, 1.27s/batch] Ep#7/40, shard#4/6, save@3%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 98%, In[8, 377], loss=1.4: 20%|β–ˆβ–ˆ | 1387/6910 [29:01<1:56:40, 1.27s/batch] Ep#7/40, shard#4/6, save@3%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 98%, In[8, 377], loss=1.4: 20%|β–ˆβ–ˆ | 1388/6910 [29:01<1:55:24, 1.25s/batch] Ep#7/40, shard#4/6, save@4%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 458], loss=1.02: 20%|β–ˆβ–ˆ | 1388/6910 [29:03<1:55:24, 1.25s/batch] Ep#7/40, shard#4/6, save@4%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 458], loss=1.02: 20%|β–ˆβ–ˆ | 1389/6910 [29:03<1:50:19, 1.20s/batch] Ep#7/40, shard#4/6, save@5%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 
381], loss=1.36: 20%|β–ˆβ–ˆ | 1389/6910 [29:04<1:50:19, 1.20s/batch] Ep#7/40, shard#4/6, save@5%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 381], loss=1.36: 20%|β–ˆβ–ˆ | 1390/6910 [29:04<1:47:40, 1.17s/batch] Ep#7/40, shard#4/6, save@6%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 75%, In[8, 431], loss=1.52: 20%|β–ˆβ–ˆ | 1390/6910 [29:05<1:47:40, 1.17s/batch] Ep#7/40, shard#4/6, save@6%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 75%, In[8, 431], loss=1.52: 20%|β–ˆβ–ˆ | 1391/6910 [29:05<1:55:38, 1.26s/batch] Ep#7/40, shard#4/6, save@7%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 423], loss=1.2: 20%|β–ˆβ–ˆ | 1391/6910 [29:06<1:55:38, 1.26s/batch] Ep#7/40, shard#4/6, save@7%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 423], loss=1.2: 20%|β–ˆβ–ˆ | 1392/6910 [29:06<1:47:25, 1.17s/batch] Ep#7/40, shard#4/6, save@8%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 99%, In[8, 367], loss=1.09: 20%|β–ˆβ–ˆ | 1392/6910 [29:07<1:47:25, 1.17s/batch] Ep#7/40, shard#4/6, save@8%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 99%, In[8, 367], loss=1.09: 20%|β–ˆβ–ˆ | 1393/6910 [29:07<1:50:10, 1.20s/batch] Ep#7/40, shard#4/6, save@9%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 281], loss=0.96: 20%|β–ˆβ–ˆ | 1393/6910 [29:09<1:50:10, 1.20s/batch] Ep#7/40, shard#4/6, save@9%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 281], loss=0.96: 20%|β–ˆβ–ˆ | 1394/6910 [29:09<1:50:30, 1.20s/batch] Ep#7/40, shard#4/6, save@10%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 365], loss=1.37: 20%|β–ˆβ–ˆ | 1394/6910 [29:10<1:50:30, 1.20s/batch] Ep#7/40, shard#4/6, save@10%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 365], loss=1.37: 20%|β–ˆβ–ˆ | 1395/6910 [29:10<1:58:34, 1.29s/batch] Ep#7/40, shard#4/6, save@11%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 502], loss=1.29: 20%|β–ˆβ–ˆ | 1395/6910 [29:11<1:58:34, 1.29s/batch] Ep#7/40, shard#4/6, save@11%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 502], loss=1.29: 20%|β–ˆβ–ˆ | 1396/6910 [29:11<1:51:43, 1.22s/batch] Ep#7/40, 
shard#4/6, save@12%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 257], loss=1.84: 20%|β–ˆβ–ˆ | 1396/6910 [29:12<1:51:43, 1.22s/batch] Ep#7/40, shard#4/6, save@12%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 257], loss=1.84: 20%|β–ˆβ–ˆ | 1397/6910 [29:12<1:46:08, 1.16s/batch] Ep#7/40, shard#4/6, save@13%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 428], loss=1.53: 20%|β–ˆβ–ˆ | 1397/6910 [29:13<1:46:08, 1.16s/batch] Ep#7/40, shard#4/6, save@13%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 428], loss=1.53: 20%|β–ˆβ–ˆ | 1398/6910 [29:13<1:40:38, 1.10s/batch] Ep#7/40, shard#4/6, save@14%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 59%, In[8, 273], loss=1.26: 20%|β–ˆβ–ˆ | 1398/6910 [29:14<1:40:38, 1.10s/batch] Ep#7/40, shard#4/6, save@14%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 59%, In[8, 273], loss=1.26: 20%|β–ˆβ–ˆ | 1399/6910 [29:14<1:48:58, 1.19s/batch] Ep#7/40, shard#4/6, save@15%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 77%, In[8, 512], loss=0.66: 20%|β–ˆβ–ˆ | 1399/6910 [29:16<1:48:58, 1.19s/batch] Ep#7/40, shard#4/6, save@15%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 77%, In[8, 512], loss=0.66: 20%|β–ˆβ–ˆ | 1400/6910 [29:16<1:49:21, 1.19s/batch] Ep#7/40, shard#4/6, save@16%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 65%, In[8, 375], loss=0.97: 20%|β–ˆβ–ˆ | 1400/6910 [29:21<1:49:21, 1.19s/batch] Ep#7/40, shard#4/6, save@16%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 65%, In[8, 375], loss=0.97: 20%|β–ˆβ–ˆ | 1401/6910 [29:21<3:39:05, 2.39s/batch] Ep#7/40, shard#4/6, save@17%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 402], loss=1.02: 20%|β–ˆβ–ˆ | 1401/6910 [29:22<3:39:05, 2.39s/batch] Ep#7/40, shard#4/6, save@17%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 402], loss=1.02: 20%|β–ˆβ–ˆ | 1402/6910 [29:22<3:03:15, 2.00s/batch] Ep#7/40, shard#4/6, save@18%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 238], loss=1.07: 20%|β–ˆβ–ˆ | 1402/6910 [29:23<3:03:15, 2.00s/batch] Ep#7/40, shard#4/6, save@18%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 
238], loss=1.07: 20%|β–ˆβ–ˆ | 1403/6910 [29:23<2:44:52, 1.80s/batch] Ep#7/40, shard#4/6, save@19%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 99%, In[8, 394], loss=0.67: 20%|β–ˆβ–ˆ | 1403/6910 [29:24<2:44:52, 1.80s/batch] Ep#7/40, shard#4/6, save@19%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 99%, In[8, 394], loss=0.67: 20%|β–ˆβ–ˆ | 1404/6910 [29:24<2:28:00, 1.61s/batch] Ep#7/40, shard#4/6, save@20%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 379], loss=1.46: 20%|β–ˆβ–ˆ | 1404/6910 [29:26<2:28:00, 1.61s/batch] Ep#7/40, shard#4/6, save@20%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 379], loss=1.46: 20%|β–ˆβ–ˆ | 1405/6910 [29:26<2:17:21, 1.50s/batch] Ep#7/40, shard#4/6, save@21%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 304], loss=0.99: 20%|β–ˆβ–ˆ | 1405/6910 [29:27<2:17:21, 1.50s/batch] Ep#7/40, shard#4/6, save@21%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 97%, In[8, 304], loss=0.99: 20%|β–ˆβ–ˆ | 1406/6910 [29:27<2:20:47, 1.53s/batch] Ep#7/40, shard#4/6, save@22%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 55%, In[8, 301], loss=1.27: 20%|β–ˆβ–ˆ | 1406/6910 [29:28<2:20:47, 1.53s/batch] Ep#7/40, shard#4/6, save@22%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 55%, In[8, 301], loss=1.27: 20%|β–ˆβ–ˆ | 1407/6910 [29:28<2:08:44, 1.40s/batch] Ep#7/40, shard#4/6, save@23%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 85%, In[8, 512], loss=1.09: 20%|β–ˆβ–ˆ | 1407/6910 [29:30<2:08:44, 1.40s/batch] Ep#7/40, shard#4/6, save@23%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 85%, In[8, 512], loss=1.09: 20%|β–ˆβ–ˆ | 1408/6910 [29:30<2:03:42, 1.35s/batch] Ep#7/40, shard#4/6, save@24%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 484], loss=1.79: 20%|β–ˆβ–ˆ | 1408/6910 [29:31<2:03:42, 1.35s/batch] Ep#7/40, shard#4/6, save@24%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 484], loss=1.79: 20%|β–ˆβ–ˆ | 1409/6910 [29:31<1:52:36, 1.23s/batch] Ep#7/40, shard#4/6, save@25%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 442], loss=0.75: 20%|β–ˆβ–ˆ | 1409/6910 [29:32<1:52:36, 1.23s/batch] 
Ep#7/40, shard#4/6, save@25%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 442], loss=0.75: 20%|β–ˆβ–ˆ | 1410/6910 [29:32<1:49:38, 1.20s/batch]
She needs to [MASK] that [MASK] gets only ten minutes.
MASK[0] top candidates: ['know', 'show', 'prove']
MASK[1] top candidates: ['she', 'he', 'it']
Determine the [MASK] of $f$ $($ $x$ [MASK] $equal$ $x$ $plus$ $root$ ${$ $4$ $minus$ $x$ $supscript$ $2$ $}$ without [MASK]
MASK[0] top candidates: ['value', 'derivative', 'integral']
MASK[1] top candidates: ['$)$', '$]$', '$plus$']
MASK[2] top candidates: ['substitution', '.', 'loss']
$sin$ $x$ $equal$ [MASK] $minus$ $frac$ ${$ $x$ $supscript$ $3$ $}$ ${$ $3$ $fact$ $}$ $plus$ $frac$ ${$ $x$ $supscript$ $5$ $}$ ${$ [MASK] $fact$ $}$ $minus$ $frac$ ${$ $x$ $supscript$ $7$ $}$ ${$ $7$ $fact$ $}$ $dots$
MASK[0] top candidates: ['$1$', '$float$', '$x$']
MASK[1] top candidates: ['$7$', '$5$', '$6$']
Maybe, going to natural [MASK] could help. I mean: $log$ $somenum$ [MASK] $frac$ $)$?
MASK[0] top candidates: ['numbers', 'number', '##s']
MASK[1] top candidates: ['$($', '$minus$', '$plus$']
$F$ $($ $y$ $minus$ $x$ $)$$u$ $($ $x$ [MASK] $y$ $)$ $equal$ $minus$ $ln$ $vert$ $F$ $($ $y$ $minus$ $x$ $)$ $minus$ $x$ $supscript$ $2$ $vert$ where $F$ is any differentiable [MASK].
MASK[0] top candidates: ['$minus$', '$plus$', '$comma$']
MASK[1] top candidates: ['function', 'functional', 'functions']
With Euler's formula, it simplifies to $int$ $subscript$ $0$ [MASK] [MASK] $frac$ ${$ $1$ $plus$ $x$ $supscript$ ${$ $somenum$ $}$ $}$ ${$ $1$ $plus$ $x$ $}$ $d$ $x$
MASK[0] top candidates: ['$supscript$', '$subscript$', '$x$']
MASK[1] top candidates: ['$1$', '$0$', '$x$']
$vert$ $x$ $vert$ $supscript$ $4$ is maximized if [MASK] [MASK] if [MASK] $x$ [MASK] is maximized.
MASK[0] top candidates: ['and', ',', '&']
MASK[1] top candidates: ['only', 'not', 'also']
MASK[2] top candidates: ['$vert$', '$($', '${$']
MASK[3] top candidates: ['$vert$', 'itself', '$)$']
$($ $a$ $comma$ $b$ $comma$ $x$ $comma$ $n$ $)$ $equal$ [MASK] $a$ $comma$ $a$ $plus$ $1$ $comma$ $n$ $comma$ $n$ [MASK] is a counterexample (for any $n$ [MASK] $2$).
MASK[0] top candidates: ['$($', '$\\{$', '$[$']
MASK[1] top candidates: ['$)$', '$\\}$', '$]$']
MASK[2] top candidates: ['$geq$', '$ge$', '$gt$']
Ep#7/40, shard#4/6, save@26%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 8%, In[8, 251], loss=1.61: 20%|██ | 1410/6910 [29:33<1:49:38, 1.20s/batch]
Ep#7/40, shard#4/6, save@26%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 8%, In[8, 251], loss=1.61: 20%|██ | 1411/6910 [29:33<1:53:14, 1.24s/batch]
Ep#7/40, shard#4/6, save@27%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 54%, In[8, 512], loss=0.98: 20%|██ | 1411/6910 [29:34<1:53:14, 1.24s/batch]
Ep#7/40, shard#4/6, save@27%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 54%, In[8, 512], loss=0.98: 20%|██ | 1412/6910 [29:34<1:53:17, 1.24s/batch]
Ep#7/40, shard#4/6, save@28%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 378], loss=1.18: 20%|██ | 1412/6910 [29:35<1:53:17, 1.24s/batch]
Ep#7/40, shard#4/6, save@28%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 378], loss=1.18: 20%|██ | 1413/6910 [29:35<1:46:42, 1.16s/batch]
Ep#7/40, shard#4/6, save@29%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 480], loss=0.92: 20%|██ | 1413/6910 [29:37<1:46:42, 1.16s/batch]
Ep#7/40, shard#4/6, save@29%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 480], loss=0.92: 20%|██ | 1414/6910 [29:37<1:59:50, 1.31s/batch]
Ep#7/40, shard#4/6, save@30%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 67%, In[8, 302], loss=1.05: 20%|██ | 1414/6910 [29:38<1:59:50, 1.31s/batch]
Ep#7/40, shard#4/6, save@30%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 67%, In[8, 302], loss=1.05: 20%|██ | 1415/6910 [29:38<1:48:39, 1.19s/batch]
Ep#7/40, shard#4/6, save@31%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 287], loss=0.82: 20%|██ | 1415/6910 [29:39<1:48:39, 1.19s/batch]
Ep#7/40, shard#4/6, save@31%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 96%, In[8, 287], loss=0.82: 20%|██ | 1416/6910 [29:39<1:54:05, 1.25s/batch]
Ep#7/40, shard#4/6, save@32%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 63%, In[8, 370], loss=1.11: 20%|██ | 1416/6910 [29:40<1:54:05, 1.25s/batch]
Ep#7/40, shard#4/6, save@32%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 63%, In[8, 370], loss=1.11: 21%|██ | 1417/6910 [29:40<1:55:02, 1.26s/batch]
Ep#7/40, shard#4/6, save@33%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 69%, In[8, 395], loss=1.32: 21%|██ | 1417/6910 [29:41<1:55:02, 1.26s/batch]
Ep#7/40, shard#4/6, save@33%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 69%, In[8, 395], loss=1.32: 21%|██ | 1418/6910 [29:41<1:48:20, 1.18s/batch]
Ep#7/40, shard#4/6, save@34%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 69%, In[8, 210], loss=0.85: 21%|██ | 1418/6910 [29:43<1:48:20, 1.18s/batch]
Ep#7/40, shard#4/6, save@34%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 69%, In[8, 210], loss=0.85: 21%|██ | 1419/6910 [29:43<1:48:03, 1.18s/batch]
Ep#7/40, shard#4/6, save@35%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 73%, In[8, 275], loss=2.12: 21%|██ | 1419/6910 [29:44<1:48:03, 1.18s/batch]
Ep#7/40, shard#4/6, save@35%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 73%, In[8, 275], loss=2.12: 21%|██ | 1420/6910 [29:44<1:52:24, 1.23s/batch]
Ep#7/40, shard#4/6, save@36%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 75%, In[8, 296], loss=0.8: 21%|██ | 1420/6910 [29:45<1:52:24, 1.23s/batch]
Ep#7/40, shard#4/6, save@36%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 75%, In[8, 296], loss=0.8: 21%|██ | 1421/6910 [29:45<2:00:06, 1.31s/batch]
Ep#7/40, shard#4/6, save@37%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 321], loss=1.85: 21%|██ | 1421/6910 [29:47<2:00:06, 1.31s/batch]
Ep#7/40, shard#4/6, save@37%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 321], loss=1.85: 21%|██ | 1422/6910 [29:47<1:58:06, 1.29s/batch]
Ep#7/40, shard#4/6, save@38%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 88%, In[8, 469], loss=1.09: 21%|██ | 1422/6910 [29:48<1:58:06, 1.29s/batch]
Ep#7/40, shard#4/6, save@38%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 88%, In[8, 469], loss=1.09: 21%|██ | 1423/6910 [29:48<1:53:39, 1.24s/batch]
Ep#7/40, shard#4/6, save@39%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 239], loss=1.48: 21%|██ | 1423/6910 [29:49<1:53:39, 1.24s/batch]
Ep#7/40, shard#4/6, save@39%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 239], loss=1.48: 21%|██ | 1424/6910 [29:49<1:48:32, 1.19s/batch]
Ep#7/40, shard#4/6, save@40%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 420], loss=0.96: 21%|██ | 1424/6910 [29:50<1:48:32, 1.19s/batch]
Ep#7/40, shard#4/6, save@40%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 100%, In[8, 420], loss=0.96: 21%|██ | 1425/6910 [29:50<1:56:41, 1.28s/batch]
Ep#7/40, shard#4/6, save@41%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 94%, In[8, 331], loss=0.97: 21%|██ | 1425/6910 [29:53<1:56:41, 1.28s/batch]
Ep#7/40, shard#4/6, save@41%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 94%, In[8, 331], loss=0.97: 21%|██ | 1426/6910 [29:53<2:20:50, 1.54s/batch]
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 25031358.0 ON blg4302 CANCELLED AT 2021-09-16T20:36:02 DUE TO TIME LIMIT ***
slurmstepd: error: *** JOB 25031358 ON blg4302 CANCELLED AT 2021-09-16T20:36:02 DUE TO TIME LIMIT ***
Ep#7/40, shard#4/6, save@41%691, 4 nodes, 2 x Tesla V100-SXM2-16GB: 94%, In[8, 331], loss=0.97: 21%|██ | 1426/6910 [29:53<1:54:55, 1.26s/batch]