Spaces: umdclip/advcalibration
yysung53 committed on Sep 26

Commit 3595547 • 1 Parent(s): 646bf58

update instructions

Files changed (1)

src/about.py CHANGED

@@ -70,11 +70,8 @@ E.g. {'guess': 'Apple', 'confidence': 0.02}
 Reminder: If you are playing around with an extractive QA model already, HF QA models output the `score` already, so you only need to wrap the `score` to `confidence`.
-#### Customized retriever
-If you didn't submit anything for retriever, we will feed the `context` string with our pre-loaded context. However, we do provide the option for you to customize your retriever model with the dataset you wish to do retrieval. Please check the tutorial example for more details.
 ## Evaluation Metric
-In our Grounded QA task, we evaluate the QA model's reliability of their performance by measuring their calibration estimates where we consider the confidence of guess confidence values. To understand this concept better, we adopt the concept of "buzz" in Trivia Quiz, where buzz happens whenever the player is confident enough to predict the correct guess in the middle of a question. This also applies to our measurement of model calibration as we focus whether the model prediction probability matches its prediction accuracy. Our evaluation metric, `Average Expected Buzz`, quantifies the expected buzz confidence estimation.
+In our Adversarial Calibration QA task, we evaluate the QA model's reliability of their performance by measuring their calibration estimates where we consider the confidence of guess confidence values. To understand this concept better, we adopt the concept of "buzz" in Trivia Quiz, where buzz happens whenever the player is confident enough to predict the correct guess in the middle of a question. This also applies to our measurement of model calibration as we focus whether the model prediction probability matches its prediction accuracy. Our evaluation metric, `Average Expected Buzz`, quantifies the expected buzz confidence estimation.
 ## FAQ
 What if my system type is not specified here or not supported yet?
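
Note on the unchanged "Reminder" line: wrapping `score` as `confidence` could look roughly like the sketch below. It assumes the input is a dict shaped like a Hugging Face question-answering pipeline result (keys `score`, `start`, `end`, `answer`). The function name `to_submission` and the sample values are hypothetical and are not part of this commit or of the Space's code.

```python
from typing import Any, Dict


def to_submission(qa_output: Dict[str, Any]) -> Dict[str, Any]:
    """Map an extractive-QA result to the {'guess', 'confidence'} format.

    Assumption: qa_output has the keys 'answer' and 'score', as a
    Hugging Face question-answering pipeline result does.
    """
    return {
        "guess": qa_output["answer"],
        # Cast to a plain float so the value serializes cleanly.
        "confidence": float(qa_output["score"]),
    }


# Hand-written dict standing in for a pipeline result; values are illustrative.
example = {"score": 0.02, "start": 10, "end": 15, "answer": "Apple"}
print(to_submission(example))
```

The sketch only renames fields. It does not recalibrate the score, so whether the resulting confidence matches accuracy still depends on the underlying model.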