Include error counts in Popular Paper section
README.md
CHANGED
@@ -82,53 +82,59 @@ Considering the following input annotated sentences:
The output for different modes and error_formats is:
```python
>>> faireval.compute(predictions=y_pred, references=y_true, mode='fair', error_format='count')
+{"PER": {"precision": 1.0,"recall": 0.5,"f1": 0.6666,
+"trad_prec": 0.5,"trad_rec": 0.5,"trad_f1": 0.5,
+"TP": 1,"FP": 0.0,"FN": 1.0,"LE": 0.0,"BE": 0.0,"LBE": 0.0},
+"INT": {"precision": 0.0,"recall": 0.0,"f1": 0.0,
+"trad_prec": 0.0,"trad_rec": 0.0,"trad_f1": 0.0,
+"TP": 0,"FP": 0.0,"FN": 0.0,"LE": 0.0,"BE": 1.0,"LBE": 1.0},
+"OUT": {"precision": 0.6666,"recall": 0.6666,"f1": 0.666,
+"trad_prec": 0.5,"trad_rec": 0.5,"trad_f1": 0.5,
+"TP": 1,"FP": 0.0,"FN": 0.0,"LE": 1.0,"BE": 0.0,"LBE": 0.0},
+"overall_precision": 0.5714,
+"overall_recall": 0.4444,
+"overall_f1": 0.5,
+"overall_trad_prec": 0.4,
+"overall_trad_rec": 0.3333,
+"overall_trad_f1": 0.3636,
+"TP": 2,
+"FP": 0.0,
+"FN": 1.0,
+"LE": 1.0,
+"BE": 1.0,
+"LBE": 1.0}
```

```python
>>> faireval.compute(predictions=y_pred, references=y_true, mode='traditional', error_format='count')
+{"PER": {"precision": 0.5,"recall": 0.5,"f1": 0.5,
+"TP": 1,"FP": 1.0,"FN": 1.0},
+"INT": {"precision": 0.0,"recall": 0.0,"f1": 0.0,
+"TP": 0,"FP": 1.0,"FN": 2.0},
+"OUT": {"precision": 0.5,"recall": 0.5,"f1": 0.5,
+"TP": 1,"FP": 1.0,"FN": 1.0},
+"overall_precision": 0.4,
+"overall_recall": 0.3333,
+"overall_f1": 0.3636,
+"TP": 2,
+"FP": 3.0,
+"FN": 4.0}
```

```python
>>> faireval.compute(predictions=y_pred, references=y_true, mode='traditional', error_format='error_ratio')
+{"PER": {"precision": 0.5,"recall": 0.5,"f1": 0.5,
+"TP": 1,"FP": 0.1428,"FN": 0.1428},
+"INT": {"precision": 0.0,"recall": 0.0,"f1": 0.0,
+"TP": 0,"FP": 0.14285714285714285,"FN": 0.2857},
+"OUT": {"precision": 0.5,"recall": 0.5,"f1": 0.5,
+"TP": 1,"FP": 0.1428,"FN": 0.1428},
+"overall_precision": 0.4,
+"overall_recall": 0.3333,
+"overall_f1": 0.3636,
+"TP": 2,
+"FP": 0.4285,
+"FN": 0.5714}
```
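
The two `traditional` outputs above are related by simple arithmetic, which the following sketch recomputes from the printed counts. It is not FairEval's implementation: the helper name `precision_recall_f1` is hypothetical, the precision, recall and F1 formulas are the standard ones, and the assumption that `error_ratio` divides each error count by the overall number of errors (FP + FN across all labels) is inferred from the numbers shown rather than taken from the library.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Per-label counts copied from the mode='traditional', error_format='count' output.
counts = {
    "PER": {"TP": 1, "FP": 1, "FN": 1},
    "INT": {"TP": 0, "FP": 1, "FN": 2},
    "OUT": {"TP": 1, "FP": 1, "FN": 1},
}

# Overall counts are the sums over labels: TP=2, FP=3, FN=4.
overall = {key: sum(label[key] for label in counts.values()) for key in ("TP", "FP", "FN")}
precision, recall, f1 = precision_recall_f1(overall["TP"], overall["FP"], overall["FN"])

# Assumption: every error count is divided by the overall number of errors.
total_errors = overall["FP"] + overall["FN"]
per_label_ratio = {
    name: {key: label[key] / total_errors for key in ("FP", "FN")}
    for name, label in counts.items()
}
overall_ratio = {key: overall[key] / total_errors for key in ("FP", "FN")}

print(round(precision, 4), round(recall, 4), round(f1, 4))
print(overall_ratio)
print(per_label_ratio)
```

Under these assumptions the per-label ratios use the same denominator as the overall ratios, so the per-label FP and FN values add up to the overall FP and FN values.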

#### Values from Popular Papers
@@ -143,6 +149,46 @@ A basic [DistilBERT model](https://huggingface.co/docs/transformers/model_doc/di
| seqeval strict | 0.2222 | 0.3425 | 0.0413 | 0.3598 | 0.0 | 0.0408 | 0.0 |
| seqeval relaxed | 0.2803 | 0.4124 | 0.0412 | 0.4105 | 0.0 | 0.1985 | 0.0 |

+The traditional count of evaluation parameters would be:
+
+| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
+|----|---------|----------|-------|--------|---------------|-------------|---------|
+| TP | 211 | 53 | 4 | 140 | 0 | 14 | 0 |
+| FP | 353 | 42 | 42 | 174 | 1 | 70 | 0 |
+| FN | 730 | 144 | 144 | 228 | 116 | 43 | 114 |
+
+While the fair evaluation parameter count (`error_format='count'`) is:
+
+| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
+|-----|---------|----------|-------|--------|---------------|-------------|---------|
+| TP | 211 | 53 | 4 | 140 | 0 | 0 | 0 |
+| FP | 125 | 9 | 21 | 62 | 1 | 32 | 0 |
+| FN | 544 | 59 | 115 | 153 | 95 | 34 | 88 |
+| BE | 105 | 11 | 4 | 87 | 0 | 3 | 0 |
+| LE | 66 | 7 | 20 | 12 | 7 | 6 | 14 |
+| LBE | 57 | 10 | 6 | 9 | 15 | 2 | 15 |
+
+Thus, ratio of each fair error parameter with respect to the total number of errors (`error_format='error_ratio'`) is:
+
+| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
+|-----|---------|----------|--------|--------|---------------|-------------|---------|
+| FP | 13,94% | 1,00% | 2,34% | 6,91% | 0,11% | 3,57% | 0,00% |
+| FN | 60,65% | 6,58% | 12,82% | 17,06% | 10,59% | 3,79% | 9,81% |
+| BE | 11,71% | 1,23% | 0,45% | 9,70% | 0,00% | 0,33% | 0,00% |
+| LE | 7,36% | 0,78% | 2,23% | 1,34% | 0,78% | 0,67% | 1,56% |
+| LBE | 6,35% | 1,11% | 0,67% | 1,00% | 1,67% | 0,22% | 1,67% |
+
+And the ratio of each fair parameter with respect to the total number of entities (`error_format='entity_ratio'`) is:
+
+| | Overall | Location | Group | Person | Creative Work | Corporation | Product |
+|-----|---------|----------|--------|--------|---------------|-------------|---------|
+| TP | 19,04% | 4,78% | 0,36% | 12,64% | 0,00% | 0,00% | 0,00% |
+| FP | 11,28% | 0,81% | 1,90% | 5,60% | 0,09% | 2,89% | 0,00% |
+| FN | 49,10% | 5,32% | 10,38% | 13,81% | 8,57% | 3,07% | 7,94% |
+| BE | 9,48% | 0,99% | 0,36% | 7,85% | 0,00% | 0,27% | 0,00% |
+| LE | 5,96% | 0,63% | 1,81% | 1,08% | 0,63% | 0,54% | 1,26% |
+| LBE | 5,14% | 0,90% | 0,54% | 0,81% | 1,35% | 0,18% | 1,35% |
+
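
The percentages in the `error_ratio` and `entity_ratio` tables can be reproduced from the Overall column of the fair count table. The sketch below is only a check of that arithmetic, not code from FairEval; the helper `as_percentages` is hypothetical, and it assumes "total number of errors" means FP + FN + BE + LE + LBE and "total number of entities" means TP plus all errors, which is what the published numbers are consistent with.

```python
# Overall column of the fair count table (error_format='count').
fair_overall = {"TP": 211, "FP": 125, "FN": 544, "BE": 105, "LE": 66, "LBE": 57}

ERROR_KEYS = ("FP", "FN", "BE", "LE", "LBE")


def as_percentages(counts, keys):
    """Share of each selected count in the sum of the selected counts, in percent."""
    total = sum(counts[key] for key in keys)
    return {key: round(100 * counts[key] / total, 2) for key in keys}


# Assumed denominator for error_ratio: errors only (125 + 544 + 105 + 66 + 57 = 897).
error_pct = as_percentages(fair_overall, ERROR_KEYS)

# Assumed denominator for entity_ratio: TP plus all errors (211 + 897 = 1108).
entity_pct = as_percentages(fair_overall, ("TP",) + ERROR_KEYS)

print(error_pct)
print(entity_pct)
```

The per-label columns appear to use the same overall denominators, so each row of the ratio tables sums to its Overall value up to rounding.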
## Limitations and Bias
The metric is restricted to the input schemes admitted by seqeval. For example, the application does not support numerical
label inputs (odd for Beginning, even for Inside and zero for Outside).