ιδΏζ°
committed on
Commit
03d0a26
1
Parent(s):
9f4f414
task
app.py
CHANGED
@@ -107,7 +107,9 @@ elif page == "Methodology":
 elif page == "Datasets":
     st.header("Answer Generation")
     st.markdown("""
-We randomly sampled **100 instances** from **each** dataset as the question set and selected **7 different LLMs** to generate answers, forming the answer set.
+We randomly sampled **100 instances** from **each** dataset as the question set and selected **7 different LLMs** to generate answers, forming the answer set.
+
+As a result, each dataset produced 700 instances, totaling **2,800 instances across the four datasets**.
     """)
     st.header("Human Annotation")
     st.markdown("""
@@ -212,7 +214,7 @@ elif page == "Data and File format":
 elif page == "Submit":
     st.header("Submit")
     st.markdown("""
-We will be following a similar format as the ones used by most **TREC submissions**:
+We will be following a similar format as the ones used by most **TREC submissions**: white space is used to separate columns. The width of the columns in the format is not important, but it is important to have exactly five columns per line with at least one space between the columns.
 
 **taskId questionId answerId score rank**
 
@@ -229,11 +231,10 @@ We will be following a similar format as the ones used by most **TREC submission
 An example of the submission file content is [here](https://huggingface.co/spaces/THUIR/AEOLLM/blob/main/baseline_example/output/baseline1_chatglm3_6B.txt).
     """)
 elif page == "LeaderBoard":
-    st.header("LeaderBoard")
     # # Description
     st.markdown("""
 <p class='main-text'>
-NTCIR-18 Automatic Evaluation Methods of LLMs (AEOLLM) Leaderboard.
+NTCIR-18 Automatic Evaluation Methods of LLMs (AEOLLM) task Leaderboard.
 </p>
     """, unsafe_allow_html=True)
     df = {
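Note on the `Submit` hunk: the new text requires exactly five whitespace-separated columns per line (`taskId questionId answerId score rank`), with column width being irrelevant. The sketch below shows one way a submission line could be checked against that rule. It is not part of this commit or of `app.py`. The names `SubmissionRow` and `parse_submission_line` are hypothetical, and treating `score` as a float and `rank` as an integer is an assumption, since the diff does not state the column types.

```python
from typing import NamedTuple


class SubmissionRow(NamedTuple):
    """One submission line (hypothetical container, not taken from app.py)."""
    task_id: str
    question_id: str
    answer_id: str
    score: float  # assumption: score is numeric
    rank: int     # assumption: rank is an integer


def parse_submission_line(line: str) -> SubmissionRow:
    """Split on runs of whitespace and require exactly five columns."""
    # str.split() with no argument collapses any mix of spaces and tabs,
    # so the width of each column does not matter.
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    task_id, question_id, answer_id, score, rank = fields
    return SubmissionRow(task_id, question_id, answer_id, float(score), int(rank))


# Made-up values; the uneven spacing is deliberate.
row = parse_submission_line("0   12  3\t4.5 1")
```

A line with fewer or more than five fields raises `ValueError`, which matches the "exactly five columns" wording; a real checker might also want to verify that ranks are consistent with scores, which this sketch does not attempt.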