g8a9 elianap committed on
Commit
bb4d707
1 Parent(s): ca954b1

Update corpus.py (#2)

- Update corpus.py (ea3c6860db669ae4c5b2ea183561c25ea2730492)


Co-authored-by: Eliana Pastor <elianap@users.noreply.huggingface.co>

Files changed (1)
  1. corpus.py +8 -0
corpus.py CHANGED
@@ -118,12 +118,20 @@ def body():
  """
  **Legend**
 
+ **Faithfulness**
  - **AOPC Comprehensiveness** (aopc_compr) measures *comprehensiveness*, i.e., if the explanation captures all the tokens needed to make the prediction. Higher is better.
 
  - **AOPC Sufficiency** (aopc_suff) measures *sufficiency*, i.e., if the relevant tokens in the explanation are sufficient to make the prediction. Lower is better.
 
  - **Leave-On-Out TAU Correlation** (taucorr_loo) measures the Kendall rank correlation coefficient τ between the explanation and leave-one-out importances. Closer to 1 is better.
 
+ **Plausibility**
+ - **AUPRC plausibility** (auprc_plau) is the area under the precision-recall curve (AUPRC) of the explanation and the rationale as ground truth. Higher is better.
+
+ - **Intersection-Over-Union (IOU)** (token_iou_plau) is the size of the overlap of the most relevant tokens of the explanation and the human rationale divided by the size of their union. Higher is better.
+
+ - **Token-level F1 score** (token_f1_plau) measures the F1 score among the most relevant tokens and the human rationale. Higher is better.
+
  See the paper for details.
  """
  )
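
The two token-overlap plausibility metrics added to the legend can be illustrated with a minimal sketch. This is not the code behind `token_iou_plau` or `token_f1_plau`: the function names are hypothetical, and taking the "most relevant tokens" to be the top-k tokens by explanation score is an assumption, since the legend does not say how they are selected.

```python
from typing import List, Set


def top_k_tokens(scores: List[float], k: int) -> Set[int]:
    """Indices of the k highest-scoring tokens (assumed selection rule)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def token_iou(selected: Set[int], rationale: Set[int]) -> float:
    """Size of the overlap divided by the size of the union."""
    union = selected | rationale
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(selected & rationale) / len(union)


def token_f1(selected: Set[int], rationale: Set[int]) -> float:
    """Harmonic mean of precision and recall over token indices."""
    overlap = len(selected & rationale)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(rationale)
    return 2 * precision * recall / (precision + recall)


# Toy example: five tokens, a human rationale marking tokens 1 and 2.
scores = [0.1, 0.7, 0.05, 0.6, 0.2]
rationale = {1, 2}
selected = top_k_tokens(scores, k=2)  # tokens 1 and 3

print(token_iou(selected, rationale))  # one shared token out of three
print(token_f1(selected, rationale))   # precision and recall are both 0.5
```

In this toy case the explanation and the rationale share one of three distinct tokens, so the IOU is 1/3 and the F1 score is 0.5. Both rise as the top-ranked tokens agree more closely with the human rationale, which matches the "Higher is better" reading in the legend.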