Yu (Hope) Hou committed
Commit 224cb2c · 1 Parent(s): 6820a49
update FE display
src/about.py CHANGED (+3 / -9)
@@ -71,26 +71,20 @@ E.g. {'guess': 'Apple', 'confidence': 0.02}
 Reminder: If you are playing around with an extractive QA model already, HF QA models output the `score` already, so you only need to wrap that `score` into `confidence`.
 
 #### Customized retriever
-If you didn
+If you didn't submit anything for the retriever, we will fill the `context` string with our pre-loaded context. However, we do provide the option to customize your retriever model with whatever dataset you wish to retrieve from. Please check the tutorial example for more details.
 
 ## Evaluation Metric
-
+In our Grounded QA task, we evaluate how reliable a QA model is by measuring its calibration, i.e., whether the confidence attached to a guess matches how often that guess is actually correct. We adopt the concept of a "buzz" from trivia quizzes, where a buzz happens whenever the player is confident enough to give the correct guess in the middle of a question. Our evaluation metric, `Average Expected Buzz`, quantifies the expected buzz confidence estimate.
 
 ## FAQ
 What if my system type is not specified here or not supported yet?
 - Please make a private post to the instructors so we can check how to adapt the leaderboard for your purpose. Thanks!
 
-I don
+I don't understand where I could start to build a QA system for submission.
 - Please check our submission tutorials. From there, you can fine-tune or build anything on top of the base models.
 
 I want to use API-based QA systems for submission, like GPT-4. What should I do?
 - We don't support API-based models for now, but you could train your model with the GPT cache we provided: https://github.com/Pinafore/nlp-hw/tree/master/models.
-
-I want to test my model locally before submission. How could I do that?
-- In addition to the tutorial test, please also ensure your model can be loaded with the code below, so it passes the frontend check.
-```
-AutoConfig.from_pretrained(model_name, revision="main", trust_remote_code=True/False, token=ACCESS_TOKEN)
-```
 """
 
 EVALUATION_QUEUE_TEXT = """
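As a concrete illustration (not part of the commit) of the reminder in the diff above: with a standard Hugging Face extractive-QA pipeline, wrapping `score` into `confidence` is essentially a rename of the output fields. The checkpoint name below is only a placeholder.

```
from transformers import pipeline

# Any extractive QA checkpoint works here; this one is only a placeholder.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def answer(question: str, context: str) -> dict:
    """Wrap the pipeline's `score` into the `confidence` field the leaderboard expects."""
    out = qa(question=question, context=context)
    # Pipeline output looks like {'score': 0.97, 'start': ..., 'end': ..., 'answer': 'Apple Inc.'}
    return {"guess": out["answer"], "confidence": out["score"]}

print(answer("Which company designs the iPhone?",
             "The iPhone is a line of smartphones designed by Apple Inc."))
# -> {'guess': 'Apple Inc.', 'confidence': ...}
```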
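The exact retriever interface is specified in the tutorial example rather than here; purely as an illustration, a customized retriever only has to map a question to a `context` string over whatever dataset you pick. The corpus and names below are hypothetical.

```
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical mini-corpus standing in for the dataset you want to retrieve from.
passages = [
    "Apple Inc. designs the iPhone and the Mac.",
    "The capital of France is Paris.",
    "Isaac Newton formulated the laws of motion.",
]

vectorizer = TfidfVectorizer().fit(passages)
passage_vecs = vectorizer.transform(passages)

def retrieve(question: str) -> str:
    """Return the passage most similar to the question as the `context` string."""
    scores = linear_kernel(vectorizer.transform([question]), passage_vecs)[0]
    return passages[scores.argmax()]

print(retrieve("Who designs the iPhone?"))
```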
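The precise definition of `Average Expected Buzz` lives in the leaderboard's evaluation code; the toy sketch below only illustrates the calibration idea described in the diff (confident correct buzzes help, confident wrong ones hurt). The scoring rule and numbers are invented for illustration and are not the leaderboard's actual metric.

```
# Toy confidence-aware scoring; NOT the leaderboard's actual Average Expected Buzz.
predictions = [
    {"guess": "Apple", "confidence": 0.92, "answer": "Apple"},  # confident and correct
    {"guess": "Pear",  "confidence": 0.80, "answer": "Apple"},  # confident but wrong
    {"guess": "Apple", "confidence": 0.10, "answer": "Apple"},  # correct but unsure
]

def toy_expected_buzz(pred):
    """Reward buzzing (high confidence) on correct guesses, penalize it on wrong ones."""
    correct = pred["guess"] == pred["answer"]
    return pred["confidence"] * (1.0 if correct else -1.0)

avg = sum(toy_expected_buzz(p) for p in predictions) / len(predictions)
print(f"toy average expected buzz: {avg:.3f}")  # -> 0.073
```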