1-800-BAD-CODE committed
Commit c96c930
Parent: 38ea57a

Update README.md

Files changed (1):
  1. README.md +240 -19
README.md CHANGED
@@ -173,8 +173,30 @@ This is also a base-sized model with many languages and many tasks, so capacity
173
 
174
  # Evaluation
175
  In these metrics, keep in mind that
176
- 1. That data is noisy
177
- 2. Sentence boundaries and true-casing is conditioned on predicted punctuation
178
 
179
  <details>
180
  <summary>English</summary>
@@ -182,10 +204,10 @@ In these metrics, keep in mind that
182
  ```
183
  punct_post test report:
184
  label precision recall f1 support
185
- <NULL> (label_id: 0) 98.71 98.69 98.70 107750
186
- . (label_id: 1) 87.82 88.89 88.36 6005
187
- , (label_id: 2) 67.90 67.24 67.57 3571
188
- ? (label_id: 3) 80.51 78.19 79.33 486
189
  ? (label_id: 4) 0.00 0.00 0.00 0
190
  , (label_id: 5) 0.00 0.00 0.00 0
191
  。 (label_id: 6) 0.00 0.00 0.00 0
@@ -199,26 +221,225 @@ punct_post test report:
199
  ፣ (label_id: 14) 0.00 0.00 0.00 0
200
  ፧ (label_id: 15) 0.00 0.00 0.00 0
201
  -------------------
202
- micro avg 97.15 97.15 97.15 117812
203
- macro avg 83.74 83.25 83.49 117812
204
- weighted avg 97.15 97.15 97.15 117812
205
 
206
  cap test report:
207
  label precision recall f1 support
208
- LOWER (label_id: 0) 99.62 99.49 99.56 362399
209
- UPPER (label_id: 1) 89.11 91.75 90.41 16506
210
  -------------------
211
- micro avg 99.15 99.15 99.15 378905
212
- macro avg 94.37 95.62 94.98 378905
213
- weighted avg 99.17 99.15 99.16 378905
214
 
215
  seg test report:
216
  label precision recall f1 support
217
- NOSTOP (label_id: 0) 99.29 99.43 99.36 111466
218
- FULLSTOP (label_id: 1) 89.69 87.49 88.58 6346
219
  -------------------
220
- micro avg 98.78 98.78 98.78 117812
221
- macro avg 94.49 93.46 93.97 117812
222
- weighted avg 98.77 98.78 98.78 117812
223
  ```
224
  </details>
173
 
174
  # Evaluation
175
  In these metrics, keep in mind that
176
+ 1. The data is noisy
177
+ 2. Sentence boundaries and true-casing are conditioned on predicted punctuation, which is the most difficult task and is sometimes incorrect.
178
+ When conditioning on reference punctuation, true-casing and sentence boundary detection (SBD) are practically 100% for most languages.
179
+ 3. Punctuation can be subjective, e.g.,
180
+
181
+ `Hola mundo, ¿cómo estás?`
182
+
183
+ or
184
+
185
+ `Hola mundo. ¿Cómo estás?`
186
+
187
+ In longer, more practical sentences these ambiguities abound and affect all three metrics (see the sketch below).
188
+
189
+
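To make points 2 and 3 concrete, here is a toy sketch of how a single, arguably acceptable punctuation disagreement cascades into the segmentation and true-casing metrics. It is purely illustrative: `apply_punct` is a hypothetical helper, not the model's inference code, real decoding operates on subword tokens, and the inverted `¿` is omitted for brevity.

```python
def apply_punct(tokens, punct_after):
    """Toy post-processor: append predicted punctuation, then derive
    sentence boundaries and capitalization from the full stops."""
    out, new_sentence = [], True
    for tok, p in zip(tokens, punct_after):
        tok = tok.capitalize() if new_sentence else tok
        out.append(tok + p)
        new_sentence = p in {".", "?"}  # boundary only where a full stop was predicted
    return " ".join(out)

tokens = ["hola", "mundo", "cómo", "estás"]
print(apply_punct(tokens, ["", ",", "", "?"]))  # Hola mundo, cómo estás?
print(apply_punct(tokens, ["", ".", "", "?"]))  # Hola mundo. Cómo estás?
# A single "," vs "." disagreement flips one segmentation label (NOSTOP vs
# FULLSTOP after "mundo") and one casing label ("cómo" vs "Cómo"), even though
# both outputs are reasonable.
```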
190
+ ## Selected Language Evaluation Reports
191
+ Each test example was generated using the following procedure:
192
+
193
+ 1. Concatenate 5 random sentences
194
+ 2. Lower-case the concatenated sentence
195
+ 3. Remove all punctuation
196
+
197
+ The data is a held-out portion of News Crawl, which has been deduplicated.
198
+ 2,000 lines of data per language were used, generating 2,000 unique examples of 5 sentences each.
199
+ The last 4 sentences of each example were randomly sampled from the 2,000 lines and may be duplicated.
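A minimal sketch of this generation procedure, assuming a plain-text file of held-out, deduplicated News Crawl sentences (the file name, function name, and the exact punctuation set stripped in step 3 are illustrative, not the code used to build these test sets):

```python
import random
import re

# Punctuation to strip; this set mirrors the label inventory in the reports
# below, but the actual preprocessing may differ.
PUNCT = r"[.,?¿。、・।؟،;።፣፧]"

def make_examples(sentences, n_examples=2000, per_example=5, seed=0):
    rng = random.Random(seed)
    examples = []
    for i in range(n_examples):
        # 1. Concatenate 5 sentences: the i-th line plus 4 sampled (with
        #    replacement, so repeats are possible) from the same 2,000 lines.
        picked = [sentences[i]] + rng.choices(sentences, k=per_example - 1)
        text = " ".join(picked)
        # 2. Lower-case and 3. remove punctuation.
        examples.append(re.sub(PUNCT, "", text.lower()))
    return examples

# Hypothetical usage:
# sentences = open("newscrawl_heldout.txt", encoding="utf-8").read().splitlines()[:2000]
# examples = make_examples(sentences)
```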
200
 
201
  <details>
202
  <summary>English</summary>
 
204
  ```
205
  punct_post test report:
206
  label precision recall f1 support
207
+ <NULL> (label_id: 0) 98.71 98.66 98.68 156605
208
+ . (label_id: 1) 87.72 88.85 88.28 8752
209
+ , (label_id: 2) 68.06 67.81 67.93 5216
210
+ ? (label_id: 3) 79.38 77.20 78.27 693
211
  ? (label_id: 4) 0.00 0.00 0.00 0
212
  , (label_id: 5) 0.00 0.00 0.00 0
213
  。 (label_id: 6) 0.00 0.00 0.00 0
 
221
  ፣ (label_id: 14) 0.00 0.00 0.00 0
222
  ፧ (label_id: 15) 0.00 0.00 0.00 0
223
  -------------------
224
+ micro avg 97.13 97.13 97.13 171266
225
+ macro avg 83.46 83.13 83.29 171266
226
+ weighted avg 97.13 97.13 97.13 171266
227
 
228
  cap test report:
229
  label precision recall f1 support
230
+ LOWER (label_id: 0) 99.63 99.49 99.56 526612
231
+ UPPER (label_id: 1) 89.19 91.84 90.50 24161
232
  -------------------
233
+ micro avg 99.15 99.15 99.15 550773
234
+ macro avg 94.41 95.66 95.03 550773
235
+ weighted avg 99.17 99.15 99.16 550773
236
 
237
  seg test report:
238
  label precision recall f1 support
239
+ NOSTOP (label_id: 0) 99.37 99.42 99.39 162044
240
+ FULLSTOP (label_id: 1) 89.75 88.84 89.29 9222
241
  -------------------
242
+ micro avg 98.85 98.85 98.85 171266
243
+ macro avg 94.56 94.13 94.34 171266
244
+ weighted avg 98.85 98.85 98.85 171266
245
+ ```
246
+ </details>
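For reference, the averages in these reports follow the usual definitions; the macro average appears to cover only labels with nonzero support, which reproduces the reported numbers. A minimal sketch with plain dicts (standard formulas, not the NeMo reporting code). For single-label token classification the micro average equals token accuracy, which is why micro precision, recall, and F1 coincide above.

```python
def macro_avg(scores, support):
    """Unweighted mean over labels that actually occur (support > 0)."""
    seen = [lbl for lbl, n in support.items() if n > 0]
    return sum(scores[lbl] for lbl in seen) / len(seen)

def weighted_avg(scores, support):
    """Support-weighted mean of per-label scores."""
    total = sum(support.values())
    return sum(scores[lbl] * n for lbl, n in support.items()) / total

# English punct_post F1 from the report above:
f1 = {"<NULL>": 98.68, ".": 88.28, ",": 67.93, "?": 78.27}
support = {"<NULL>": 156605, ".": 8752, ",": 5216, "?": 693}
print(round(macro_avg(f1, support), 2))     # 83.29
print(round(weighted_avg(f1, support), 2))  # ~97.13, as reported
```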
247
+
248
+
249
+ <details>
250
+ <summary>Spanish</summary>
251
+
252
  ```
253
+ punct_pre test report:
254
+ label precision recall f1 support
255
+ <NULL> (label_id: 0) 99.94 99.92 99.93 185535
256
+ ¿ (label_id: 1) 55.01 64.86 59.53 296
257
+ -------------------
258
+ micro avg 99.86 99.86 99.86 185831
259
+ macro avg 77.48 82.39 79.73 185831
260
+ weighted avg 99.87 99.86 99.87 185831
261
+
262
+ punct_post test report:
263
+ label precision recall f1 support
264
+ <NULL> (label_id: 0) 98.74 98.86 98.80 170282
265
+ . (label_id: 1) 90.07 89.58 89.82 9959
266
+ , (label_id: 2) 68.33 67.00 67.66 5300
267
+ ? (label_id: 3) 70.25 58.62 63.91 290
268
+ ? (label_id: 4) 0.00 0.00 0.00 0
269
+ , (label_id: 5) 0.00 0.00 0.00 0
270
+ 。 (label_id: 6) 0.00 0.00 0.00 0
271
+ 、 (label_id: 7) 0.00 0.00 0.00 0
272
+ ・ (label_id: 8) 0.00 0.00 0.00 0
273
+ । (label_id: 9) 0.00 0.00 0.00 0
274
+ ؟ (label_id: 10) 0.00 0.00 0.00 0
275
+ ، (label_id: 11) 0.00 0.00 0.00 0
276
+ ; (label_id: 12) 0.00 0.00 0.00 0
277
+ ። (label_id: 13) 0.00 0.00 0.00 0
278
+ ፣ (label_id: 14) 0.00 0.00 0.00 0
279
+ ፧ (label_id: 15) 0.00 0.00 0.00 0
280
+ -------------------
281
+ micro avg 97.39 97.39 97.39 185831
282
+ macro avg 81.84 78.51 80.05 185831
283
+ weighted avg 97.36 97.39 97.37 185831
284
+
285
+ cap test report:
286
+ label precision recall f1 support
287
+ LOWER (label_id: 0) 99.62 99.60 99.61 555041
288
+ UPPER (label_id: 1) 90.60 91.06 90.83 23538
289
+ -------------------
290
+ micro avg 99.25 99.25 99.25 578579
291
+ macro avg 95.11 95.33 95.22 578579
292
+ weighted avg 99.25 99.25 99.25 578579
293
+
294
+ seg test report:
295
+ label precision recall f1 support
296
+ NOSTOP (label_id: 0) 99.44 99.54 99.49 175908
297
+ FULLSTOP (label_id: 1) 91.68 89.98 90.82 9923
298
+ -------------------
299
+ micro avg 99.03 99.03 99.03 185831
300
+ macro avg 95.56 94.76 95.16 185831
301
+ weighted avg 99.02 99.03 99.02 185831
302
+ ```
303
  </details>
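The Spanish report includes a `punct_pre` section for marks that attach before a token (the only non-null pre-token label in this report is `¿`), while `punct_post` covers marks that attach after a token. A toy illustration of the two target sequences for the earlier example (the layout is illustrative, not the model's actual target format):

```python
# "hola mundo, ¿cómo estás?"  ->  model input: "hola mundo cómo estás"
tokens     = ["hola", "mundo", "cómo", "estás"]
punct_pre  = ["",     "",      "¿",    ""]      # mark preceding each token
punct_post = ["",     ",",     "",     "?"]     # mark following each token
# True-casing and sentence boundaries are separate prediction targets,
# reported in the cap and seg sections above.
```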
304
+
305
+ <details>
306
+ <summary>Chinese</summary>
307
+
308
+ ```
309
+ punct_post test report:
310
+ label precision recall f1 support
311
+ <NULL> (label_id: 0) 98.82 97.34 98.07 147920
312
+ . (label_id: 1) 0.00 0.00 0.00 0
313
+ , (label_id: 2) 0.00 0.00 0.00 0
314
+ ? (label_id: 3) 0.00 0.00 0.00 0
315
+ ? (label_id: 4) 85.77 80.71 83.16 560
316
+ , (label_id: 5) 59.88 78.02 67.75 6901
317
+ 。 (label_id: 6) 92.50 93.92 93.20 10988
318
+ 、 (label_id: 7) 0.00 0.00 0.00 0
319
+ ・ (label_id: 8) 0.00 0.00 0.00 0
320
+ । (label_id: 9) 0.00 0.00 0.00 0
321
+ ؟ (label_id: 10) 0.00 0.00 0.00 0
322
+ ، (label_id: 11) 0.00 0.00 0.00 0
323
+ ; (label_id: 12) 0.00 0.00 0.00 0
324
+ ። (label_id: 13) 0.00 0.00 0.00 0
325
+ ፣ (label_id: 14) 0.00 0.00 0.00 0
326
+ ፧ (label_id: 15) 0.00 0.00 0.00 0
327
+ -------------------
328
+ micro avg 96.25 96.25 96.25 166369
329
+ macro avg 84.24 87.50 85.55 166369
330
+ weighted avg 96.75 96.25 96.45 166369
331
+
332
+ cap test report:
333
+ label precision recall f1 support
334
+ LOWER (label_id: 0) 97.07 92.39 94.67 394
335
+ UPPER (label_id: 1) 70.59 86.75 77.84 83
336
+ -------------------
337
+ micro avg 91.40 91.40 91.40 477
338
+ macro avg 83.83 89.57 86.25 477
339
+ weighted avg 92.46 91.40 91.74 477
340
+
341
+ seg test report:
342
+ label precision recall f1 support
343
+ NOSTOP (label_id: 0) 99.58 99.53 99.56 156369
344
+ FULLSTOP (label_id: 1) 92.77 93.50 93.13 10000
345
+ -------------------
346
+ micro avg 99.17 99.17 99.17 166369
347
+ macro avg 96.18 96.52 96.35 166369
348
+ weighted avg 99.17 99.17 99.17 166369
349
+ ```
350
+ </details>
351
+
352
+
353
+ <details>
354
+ <summary>Hindi</summary>
355
+
356
+ ```
357
+ punct_post test report:
358
+ label precision recall f1 support
359
+ <NULL> (label_id: 0) 99.58 99.59 99.59 176743
360
+ . (label_id: 1) 0.00 0.00 0.00 0
361
+ , (label_id: 2) 68.32 65.23 66.74 1815
362
+ ? (label_id: 3) 60.27 44.90 51.46 98
363
+ ? (label_id: 4) 0.00 0.00 0.00 0
364
+ , (label_id: 5) 0.00 0.00 0.00 0
365
+ 。 (label_id: 6) 0.00 0.00 0.00 0
366
+ 、 (label_id: 7) 0.00 0.00 0.00 0
367
+ ・ (label_id: 8) 0.00 0.00 0.00 0
368
+ । (label_id: 9) 96.45 97.43 96.94 10136
369
+ ؟ (label_id: 10) 0.00 0.00 0.00 0
370
+ ، (label_id: 11) 0.00 0.00 0.00 0
371
+ ; (label_id: 12) 0.00 0.00 0.00 0
372
+ ። (label_id: 13) 0.00 0.00 0.00 0
373
+ ፣ (label_id: 14) 0.00 0.00 0.00 0
374
+ ፧ (label_id: 15) 0.00 0.00 0.00 0
375
+ -------------------
376
+ micro avg 99.11 99.11 99.11 188792
377
+ macro avg 81.16 76.79 78.68 188792
378
+ weighted avg 99.10 99.11 99.10 188792
379
+
380
+ cap test report:
381
+ label precision recall f1 support
382
+ LOWER (label_id: 0) 98.25 95.06 96.63 708
383
+ UPPER (label_id: 1) 89.46 96.12 92.67 309
384
+ -------------------
385
+ micro avg 95.38 95.38 95.38 1017
386
+ macro avg 93.85 95.59 94.65 1017
387
+ weighted avg 95.58 95.38 95.42 1017
388
+
389
+ seg test report:
390
+ label precision recall f1 support
391
+ NOSTOP (label_id: 0) 99.87 99.85 99.86 178892
392
+ FULLSTOP (label_id: 1) 97.38 97.58 97.48 9900
393
+ -------------------
394
+ micro avg 99.74 99.74 99.74 188792
395
+ macro avg 98.62 98.72 98.67 188792
396
+ weighted avg 99.74 99.74 99.74 188792
397
+ ```
398
+ </details>
399
+
400
+ <details>
401
+ <summary>Amharic</summary>
402
+
403
+ ```
404
+ punct_post test report:
405
+ label precision recall f1 support
406
+ <NULL> (label_id: 0) 99.58 99.42 99.50 236298
407
+ . (label_id: 1) 0.00 0.00 0.00 0
408
+ , (label_id: 2) 0.00 0.00 0.00 0
409
+ ? (label_id: 3) 0.00 0.00 0.00 0
410
+ ? (label_id: 4) 0.00 0.00 0.00 0
411
+ , (label_id: 5) 0.00 0.00 0.00 0
412
+ 。 (label_id: 6) 0.00 0.00 0.00 0
413
+ 、 (label_id: 7) 0.00 0.00 0.00 0
414
+ ・ (label_id: 8) 0.00 0.00 0.00 0
415
+ । (label_id: 9) 0.00 0.00 0.00 0
416
+ ؟ (label_id: 10) 0.00 0.00 0.00 0
417
+ ، (label_id: 11) 0.00 0.00 0.00 0
418
+ ; (label_id: 12) 0.00 0.00 0.00 0
419
+ ። (label_id: 13) 89.79 95.24 92.44 9169
420
+ ፣ (label_id: 14) 66.85 56.58 61.29 1504
421
+ ፧ (label_id: 15) 67.67 83.72 74.84 215
422
+ -------------------
423
+ micro avg 98.99 98.99 98.99 247186
424
+ macro avg 80.97 83.74 82.02 247186
425
+ weighted avg 98.99 98.99 98.98 247186
426
+
427
+ cap test report:
428
+ label precision recall f1 support
429
+ LOWER (label_id: 0) 96.65 99.78 98.19 1360
430
+ UPPER (label_id: 1) 98.90 85.13 91.50 316
431
+ -------------------
432
+ micro avg 97.02 97.02 97.02 1676
433
+ macro avg 97.77 92.45 94.84 1676
434
+ weighted avg 97.08 97.02 96.93 1676
435
+
436
+ seg test report:
437
+ label precision recall f1 support
438
+ NOSTOP (label_id: 0) 99.85 99.74 99.80 239845
439
+ FULLSTOP (label_id: 1) 91.72 95.25 93.45 7341
440
+ -------------------
441
+ micro avg 99.60 99.60 99.60 247186
442
+ macro avg 95.79 97.49 96.62 247186
443
+ weighted avg 99.61 99.60 99.61 247186
444
+ ```
445
+ </details>