yeshpanovrustem committed on
Commit
93f7f50
1 Parent(s): 12b7fef

Update README.md

  1. README.md +29 -0
README.md CHANGED
@@ -50,3 +50,32 @@ As a result, the number of sentences, tokens, and named entities (NEs) in the cl
  | **EID** | 260 (0.29%) | 27 (0.24%) | 34 (0.31%) | 321 (0.29%) |
  | **FID** | 9 (0.01%) | 1 (0.01%) | 2 (0.02%) | 12 (0.01%) |
  | **Total** | **88,540 (80.00%)** | **11,067 (10.00%)** | **11,068 (10.00%)** | **110,675 (100%)** |
+ ### Distribution of NEs across sets
+ | **NE Class** | **Train** | **Valid** | **Test** | **Total** |
+ | :---: | :---: | :---: | :---: | :---: |
+ | **ADAGE** | 153 (0.14%) | 19 (0.14%) | 17 (0.13%) | 189 (0.14%) |
+ | **ART** | 1,533 (1.44%) | 155 (1.18%) | 161 (1.23%) | 1,849 (1.40%) |
+ | **CARDINAL** | 23,135 (21.80%) | 2,878 (21.82%) | 2,789 (21.34%) | 28,802 (21.75%) |
+ | **CONTACT** | 159 (0.15%) | 18 (0.14%) | 20 (0.15%) | 197 (0.15%) |
+ | **DATE** | 20,006 (18.85%) | 2,603 (19.74%) | 2,584 (19.77%) | 25,193 (19.03%) |
+ | **DISEASE** | 1,022 (0.96%) | 121 (0.92%) | 119 (0.91%) | 1,262 (0.95%) |
+ | **EVENT** | 1,331 (1.25%) | 154 (1.17%) | 154 (1.18%) | 1,639 (1.24%) |
+ | **FACILITY** | 1,723 (1.62%) | 178 (1.35%) | 197 (1.51%) | 2,098 (1.58%) |
+ | **GPE** | 13,625 (12.84%) | 1,656 (12.56%) | 1,691 (12.94%) | 16,972 (12.82%) |
+ | **LANGUAGE** | 350 (0.33%) | 47 (0.36%) | 41 (0.31%) | 438 (0.33%) |
+ | **LAW** | 419 (0.39%) | 56 (0.42%) | 55 (0.42%) | 530 (0.40%) |
+ | **LOCATION** | 1,736 (1.64%) | 210 (1.59%) | 208 (1.59%) | 2,154 (1.63%) |
+ | **MISCELLANEOUS** | 191 (0.18%) | 26 (0.20%) | 26 (0.20%) | 243 (0.18%) |
+ | **MONEY** | 3,652 (3.44%) | 455 (3.45%) | 427 (3.27%) | 4,534 (3.42%) |
+ | **NON_HUMAN** | 6 (0.01%) | 1 (0.01%) | 1 (0.01%) | 8 (0.01%) |
+ | **NORP** | 2,929 (2.76%) | 374 (2.84%) | 368 (2.82%) | 3,671 (2.77%) |
+ | **ORDINAL** | 3,054 (2.88%) | 385 (2.92%) | 382 (2.92%) | 3,821 (2.89%) |
+ | **ORGANISATION** | 5,956 (5.61%) | 753 (5.71%) | 718 (5.49%) | 7,427 (5.61%) |
+ | **PERCENTAGE** | 3,357 (3.16%) | 437 (3.31%) | 462 (3.53%) | 4,256 (3.21%) |
+ | **PERSON** | 9,817 (9.25%) | 1,175 (8.91%) | 1,151 (8.81%) | 12,143 (9.17%) |
+ | **POSITION** | 4,844 (4.56%) | 587 (4.45%) | 597 (4.57%) | 6,028 (4.55%) |
+ | **PRODUCT** | 586 (0.55%) | 73 (0.55%) | 75 (0.57%) | 734 (0.55%) |
+ | **PROJECT** | 1,681 (1.58%) | 209 (1.58%) | 206 (1.58%) | 2,096 (1.58%) |
+ | **QUANTITY** | 3,063 (2.89%) | 411 (3.12%) | 403 (3.08%) | 3,877 (2.93%) |
+ | **TIME** | 1,820 (1.71%) | 208 (1.58%) | 220 (1.68%) | 2,248 (1.70%) |
+ | **Total** | **106,148 (100%)** | **13,189 (100%)** | **13,072 (100%)** | **132,409 (100%)** |
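
The percentages in the NE table above appear to be each class's share of its column total (for example, 191 of the 106,148 training NEs is 0.18%). Below is a minimal Python sketch of that arithmetic, assuming this reading is correct. The names `counts`, `split_totals`, `share`, and `format_row` are illustrative only and are not part of the dataset's tooling.

```python
# Counts copied from three rows of the table above: (Train, Valid, Test).
counts = {
    "CARDINAL": (23_135, 2_878, 2_789),
    "MISCELLANEOUS": (191, 26, 26),
    "NON_HUMAN": (6, 1, 1),
}

# Column totals copied from the table's Total row: (Train, Valid, Test).
split_totals = (106_148, 13_189, 13_072)


def share(count: int, total: int) -> str:
    """Format a count as 'N (P%)', with P rounded to two decimals."""
    return f"{count:,} ({100 * count / total:.2f}%)"


def format_row(name: str, row: tuple) -> str:
    """Build one markdown table row; percentages are shares of each column total."""
    cells = [share(c, t) for c, t in zip(row, split_totals)]
    # The last cell is the class total as a share of all NEs.
    cells.append(share(sum(row), sum(split_totals)))
    return f"| **{name}** | " + " | ".join(cells) + " |"


if __name__ == "__main__":
    for name, row in counts.items():
        print(format_row(name, row))
```

Formatting with a fixed `.2f` precision keeps every cell at two decimals, so values such as 0.20% and 21.80% do not lose their trailing zero.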