yysung53 committed
Commit 3595547
1 Parent(s): 646bf58

update instructions

Files changed (1):
  1. src/about.py +1 -4
src/about.py CHANGED
@@ -70,11 +70,8 @@ E.g. {'guess': 'Apple', 'confidence': 0.02}
 
 Reminder: If you are playing around with an extractive QA model already, HF QA models output the `score` already, so you only need to wrap the `score` to `confidence`.
 
-#### Customized retriever
-If you didn't submit anything for retriever, we will feed the `context` string with our pre-loaded context. However, we do provide the option for you to customize your retriever model with the dataset you wish to do retrieval. Please check the tutorial example for more details.
-
 ## Evaluation Metric
-In our Grounded QA task, we evaluate the QA model's reliability of their performance by measuring their calibration estimates where we consider the confidence of guess confidence values. To understand this concept better, we adopt the concept of "buzz" in Trivia Quiz, where buzz happens whenever the player is confident enough to predict the correct guess in the middle of a question. This also applies to our measurement of model calibration as we focus whether the model prediction probability matches its prediction accuracy. Our evaluation metric, `Average Expected Buzz`, quantifies the expected buzz confidence estimation.
+In our Adversarial Calibration QA task, we evaluate the QA model's reliability of their performance by measuring their calibration estimates where we consider the confidence of guess confidence values. To understand this concept better, we adopt the concept of "buzz" in Trivia Quiz, where buzz happens whenever the player is confident enough to predict the correct guess in the middle of a question. This also applies to our measurement of model calibration as we focus whether the model prediction probability matches its prediction accuracy. Our evaluation metric, `Average Expected Buzz`, quantifies the expected buzz confidence estimation.
 
 ## FAQ
 What if my system type is not specified here or not supported yet?
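
For context on the `score` to `confidence` reminder kept in the hunk above: a minimal sketch of that wrapping, assuming the `transformers` question-answering pipeline (the checkpoint name and the question/context strings below are illustrative, not taken from this repo).

```python
# Illustrative sketch only, not the leaderboard's reference code.
from transformers import pipeline

# Any extractive QA checkpoint works the same way; this one is just an example.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def predict(question: str, context: str) -> dict:
    # The HF QA pipeline returns {'score', 'start', 'end', 'answer'};
    # rename `answer` -> `guess` and `score` -> `confidence` for submission.
    out = qa(question=question, context=context)
    return {"guess": out["answer"], "confidence": out["score"]}

print(predict("Which company designed the iPhone?",
              "The iPhone is a line of smartphones designed by Apple Inc."))
# e.g. {'guess': 'Apple Inc.', 'confidence': 0.97}
```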
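
The diff above does not spell out how `Average Expected Buzz` is computed, so the snippet below is only a toy illustration of the "buzz" idea it describes (incremental question prefixes plus a hypothetical confidence threshold), not the leaderboard's scoring code.

```python
# Toy illustration of "buzzing" mid-question; the threshold and the per-prefix
# predictions are made-up assumptions, NOT the Average Expected Buzz metric.
prefix_predictions = [
    # (question prefix seen so far, model guess, model confidence)
    ("This company, founded in 1976,", "Microsoft", 0.10),
    ("This company, founded in 1976, designed the iPhone.", "Apple", 0.85),
]
gold_answer = "Apple"
threshold = 0.5  # hypothetical buzz threshold

for prefix, guess, confidence in prefix_predictions:
    if confidence >= threshold:  # confident enough to "buzz" before the question ends
        correct = guess == gold_answer
        print(f"Buzzed after {len(prefix)} characters: {guess} "
              f"({'correct' if correct else 'wrong'})")
        break
else:
    print("Never buzzed; the model would only answer at the end of the question.")
```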