Update corpus.py (#2)
- Update corpus.py (ea3c6860db669ae4c5b2ea183561c25ea2730492)
Co-authored-by: Eliana Pastor <elianap@users.noreply.huggingface.co>
corpus.py
CHANGED
@@ -118,12 +118,20 @@ def body():
 """
 **Legend**
 
+**Faithfulness**
 - **AOPC Comprehensiveness** (aopc_compr) measures *comprehensiveness*, i.e., if the explanation captures all the tokens needed to make the prediction. Higher is better.
 
 - **AOPC Sufficiency** (aopc_suff) measures *sufficiency*, i.e., if the relevant tokens in the explanation are sufficient to make the prediction. Lower is better.
 
 - **Leave-On-Out TAU Correlation** (taucorr_loo) measures the Kendall rank correlation coefficient τ between the explanation and leave-one-out importances. Closer to 1 is better.
 
+**Plausibility**
+- **AUPRC plausibility** (auprc_plau) is the area under the precision-recall curve (AUPRC) of the explanation and the rationale as ground truth. Higher is better.
+
+- **Intersection-Over-Union (IOU)** (token_iou_plau) is the size of the overlap of the most relevant tokens of the explanation and the human rationale divided by the size of their union. Higher is better.
+
+- **Token-level F1 score** (token_f1_plau) measures the F1 score among the most relevant tokens and the human rationale. Higher is better.
+
 See the paper for details.
 """
 )
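The faithfulness entries in the legend can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `toy_prob` is a hypothetical stand-in for a classifier, the explanation scores are invented, and the cut-offs `ks` are absolute token counts. The Space's own code may choose cut-offs differently and may use a different variant of Kendall's τ.

```python
import math

# Hypothetical word weights standing in for a real classifier (assumption).
WEIGHTS = {"great": 2.0, "fun": 1.0}

def toy_prob(tokens):
    """Positive-class probability from a logistic over summed word weights."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def top_k_indices(scores, k):
    """Indices of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def aopc(tokens, scores, predict, ks, keep_top):
    """Mean probability drop over the cut-offs in ks.

    keep_top=False removes the top-k tokens (comprehensiveness);
    keep_top=True keeps only the top-k tokens (sufficiency).
    """
    full = predict(tokens)
    drops = []
    for k in ks:
        top = top_k_indices(scores, k)
        kept = [t for i, t in enumerate(tokens) if (i in top) == keep_top]
        drops.append(full - predict(kept))
    return sum(drops) / len(drops)

def leave_one_out(tokens, predict):
    """Probability drop caused by deleting each token in turn."""
    full = predict(tokens)
    return [full - predict(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

tokens = ["great", "fun", "movie"]
explanation = [0.7, 0.3, 0.1]  # invented importance scores (assumption)

compr = aopc(tokens, explanation, toy_prob, ks=[1, 2], keep_top=False)
suff = aopc(tokens, explanation, toy_prob, ks=[1, 2], keep_top=True)
tau = kendall_tau(explanation, leave_one_out(tokens, toy_prob))
print(f"aopc_compr={compr:.3f} aopc_suff={suff:.3f} taucorr_loo={tau:.3f}")
```

In this toy setting the explanation ranks tokens in the same order as their leave-one-out effect, so comprehensiveness is high, sufficiency is low, and τ reaches its maximum, which matches the "higher", "lower", and "closer to 1" directions given in the legend.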
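The three plausibility metrics added by this commit can be sketched the same way. This is a minimal illustration under stated assumptions: AUPRC is approximated by step-wise average precision, "most relevant tokens" is taken to mean the top `k` scores with `k` chosen by hand, and the scores and rationale mask are invented. The actual metric code behind `auprc_plau`, `token_iou_plau`, and `token_f1_plau` may differ on each of these points.

```python
def average_precision(scores, rationale):
    """Step-wise area under the precision-recall curve (average precision)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if rationale[i]:
            hits += 1
            total += hits / rank  # precision at each recalled rationale token
    return total / hits if hits else 0.0

def top_k_set(scores, k):
    """Indices of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def token_iou(pred, gold):
    """Size of the overlap divided by the size of the union."""
    union = pred | gold
    return len(pred & gold) / len(union) if union else 0.0

def token_f1(pred, gold):
    """Harmonic mean of token-level precision and recall."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Invented example (assumption): four tokens, human rationale marks the first two.
scores = [0.7, 0.1, 0.3, 0.05]
rationale = [1, 1, 0, 0]
gold = {i for i, r in enumerate(rationale) if r}
pred = top_k_set(scores, k=2)  # k is a hypothetical choice

auprc = average_precision(scores, rationale)
iou = token_iou(pred, gold)
f1 = token_f1(pred, gold)
print(f"auprc_plau={auprc:.3f} token_iou_plau={iou:.3f} token_f1_plau={f1:.3f}")
```

The explanation ranks one rationale token first and the other third, so all three scores fall below 1; a perfect match with the human rationale would give 1 for each, consistent with "higher is better".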