Yu (Hope) Hou committed
Commit 224cb2c · 1 Parent(s): 6820a49
update FE display
src/about.py CHANGED (+3 / -9)
@@ -71,26 +71,20 @@ E.g. {'guess': 'Apple', 'confidence': 0.02}
 Reminder: If you are playing around with an extractive QA model already, HF QA models output the `score` already, so you only need to wrap that `score` into `confidence`.
 
 #### Customized retriever
-If you didn
+If you didn't submit anything for the retriever, we will fill the `context` string with our pre-loaded context. However, we do provide the option to customize your retriever model with whatever dataset you wish to retrieve from. Please check the tutorial example for more details.
 
 ## Evaluation Metric
-
+In our Grounded QA task, we evaluate how reliable a QA model is by measuring its calibration, i.e., whether the confidence attached to a guess matches how often that guess is actually correct. We adopt the concept of a "buzz" from trivia quizzes, where a buzz happens whenever the player is confident enough to give the correct guess in the middle of a question. Our evaluation metric, `Average Expected Buzz`, quantifies the expected buzz confidence estimate.
 
 ## FAQ
 What if my system type is not specified here or not supported yet?
 - Please make a private post to the instructors so we can check how to adapt the leaderboard for your purpose. Thanks!
 
-I don
+I don't understand where I could start to build a QA system for submission.
 - Please check our submission tutorials. From there, you can fine-tune or build anything on top of the base models.
 
 I want to use API-based QA systems for submission, like GPT-4. What should I do?
 - We don't support API-based models for now, but you could train your model with the GPT cache we provided: https://github.com/Pinafore/nlp-hw/tree/master/models.
-
-I want to test my model locally before submission. How could I do that?
-- In addition to the tutorial test, please also ensure your model can be loaded with the code below, so it passes the frontend check.
-```
-AutoConfig.from_pretrained(model_name, revision="main", trust_remote_code=True/False, token=ACCESS_TOKEN)
-```
 """
 
 EVALUATION_QUEUE_TEXT = """
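As a concrete illustration (not part of the commit) of the reminder in the diff above: with a standard Hugging Face extractive-QA pipeline, wrapping `score` into `confidence` is essentially a rename of the output fields. The checkpoint name below is only a placeholder.

```
from transformers import pipeline

# Any extractive QA checkpoint works here; this one is only a placeholder.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def answer(question: str, context: str) -> dict:
    """Wrap the pipeline's `score` into the `confidence` field the leaderboard expects."""
    out = qa(question=question, context=context)
    # Pipeline output looks like {'score': 0.97, 'start': ..., 'end': ..., 'answer': 'Apple Inc.'}
    return {"guess": out["answer"], "confidence": out["score"]}

print(answer("Which company designs the iPhone?",
             "The iPhone is a line of smartphones designed by Apple Inc."))
# -> {'guess': 'Apple Inc.', 'confidence': ...}
```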
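The exact retriever interface is specified in the tutorial example rather than here; purely as an illustration, a customized retriever only has to map a question to a `context` string over whatever dataset you pick. The corpus and names below are hypothetical.

```
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical mini-corpus standing in for the dataset you want to retrieve from.
passages = [
    "Apple Inc. designs the iPhone and the Mac.",
    "The capital of France is Paris.",
    "Isaac Newton formulated the laws of motion.",
]

vectorizer = TfidfVectorizer().fit(passages)
passage_vecs = vectorizer.transform(passages)

def retrieve(question: str) -> str:
    """Return the passage most similar to the question as the `context` string."""
    scores = linear_kernel(vectorizer.transform([question]), passage_vecs)[0]
    return passages[scores.argmax()]

print(retrieve("Who designs the iPhone?"))
```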
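The precise definition of `Average Expected Buzz` lives in the leaderboard's evaluation code; the toy sketch below only illustrates the calibration idea described in the diff (confident correct buzzes help, confident wrong ones hurt). The scoring rule and numbers are invented for illustration and are not the leaderboard's actual metric.

```
# Toy confidence-aware scoring; NOT the leaderboard's actual Average Expected Buzz.
predictions = [
    {"guess": "Apple", "confidence": 0.92, "answer": "Apple"},  # confident and correct
    {"guess": "Pear",  "confidence": 0.80, "answer": "Apple"},  # confident but wrong
    {"guess": "Apple", "confidence": 0.10, "answer": "Apple"},  # correct but unsure
]

def toy_expected_buzz(pred):
    """Reward buzzing (high confidence) on correct guesses, penalize it on wrong ones."""
    correct = pred["guess"] == pred["answer"]
    return pred["confidence"] * (1.0 if correct else -1.0)

avg = sum(toy_expected_buzz(p) for p in predictions) / len(predictions)
print(f"toy average expected buzz: {avg:.3f}")  # -> 0.073
```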