陈俊杰 committed
Commit 221547a • 1 Parent(s): 88915d0
cjj-leader
app.py
CHANGED
@@ -14,8 +14,8 @@ with st.sidebar:
     page = option_menu(
         "Navigation",
         ["LeaderBoard", "Introduction", "Methodology", "Datasets", "Important Dates",
-        "Evaluation[…]
-        icons=['trophy', 'house', 'book', 'database', 'calendar', 'clipboard', '[…]
+        "Evaluation Metrics", "Submit", "Organisers", "References"],
+        icons=['trophy', 'house', 'book', 'database', 'calendar', 'clipboard', 'upload', 'people', 'book'],
         menu_icon="cast",
         default_index=0,
         styles={
@@ -143,8 +143,8 @@ elif page == "Important Dates":
     <br />
 Before the Formal run begins (before Jan 15, 2025), we will release the reserved set. Participants need to submit their results for the reserved set before the Formal run ends (before Feb 1, 2025).</p>
     """,unsafe_allow_html=True)
-elif page == "Evaluation[…]
-    st.header("Evaluation[…]
+elif page == "Evaluation Metrics":
+    st.header("Evaluation Metrics")
     st.markdown("""
 - **Acc (Accuracy):** The proportion of identical preference results between the model and human annotations. Specifically, we first convert individual scores (ranks) into pairwise preferences and then calculate consistency with human annotations.
 
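The Acc metric in the hunk above reduces graded scores to pairwise preferences before measuring agreement with human annotations. Below is a minimal sketch of that computation, assuming numeric scores keyed by answerId; the function names and the tie handling are illustrative assumptions, not the task's official scorer.

```python
# Minimal sketch (not the official scorer): compute Acc by converting
# per-answer scores into pairwise preferences, then measuring agreement
# with human annotations. All names here are illustrative assumptions.
from itertools import combinations

def pairwise_preferences(scores: dict[str, float]) -> dict[tuple[str, str], int]:
    """Turn {answerId: score} into {(a, b): +1/0/-1} pairwise preferences."""
    prefs = {}
    for a, b in combinations(sorted(scores), 2):
        diff = scores[a] - scores[b]
        prefs[(a, b)] = 0 if diff == 0 else (1 if diff > 0 else -1)
    return prefs

def acc(model_scores: dict[str, float], human_scores: dict[str, float]) -> float:
    """Proportion of answer pairs where model and human preferences agree."""
    model_prefs = pairwise_preferences(model_scores)
    human_prefs = pairwise_preferences(human_scores)
    pairs = model_prefs.keys() & human_prefs.keys()
    agree = sum(model_prefs[p] == human_prefs[p] for p in pairs)
    return agree / len(pairs) if pairs else 0.0

# Example: three answers to one question; the model breaks a tie the
# human annotator kept, so 2 of 3 pairs agree.
print(acc({"a1": 5, "a2": 3, "a3": 1}, {"a1": 4, "a2": 4, "a3": 2}))  # ~0.667
```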
@@ -183,22 +183,30 @@ elif page == "Data and File format":
 elif page == "Submit":
     st.header("Submit")
     st.markdown("""
-[…]
+We will be following a format similar to the one used by most **TREC submissions**, which is repeated below.
+
+White space is used to separate columns. The width of the columns in the format is not important, but it is important to have exactly five columns per line with at least one space between the columns.
+
+**taskId questionId answerId score rank**
+
+- the first column is taskId (indexes different tasks)
+- the second column is questionId (indexes different questions within the same task)
+- the third column is answerId (indexes the answers provided by different LLMs to the same question)
+- the fourth column is score (the score the participant assigns to the answer)
+- the fifth column is rank (the rank of the answer among all answers to the same question)
 
-[…]
+Please organize the answers in a **txt** file, name the file **teamId_methods.txt**, and submit it through the link below: [https://forms.gle/ctJD5tvZkYcCw7Kz9](https://forms.gle/ctJD5tvZkYcCw7Kz9)
 
 Each team can submit up to 5 times per day, and only the latest submission will be considered.
 
-[…]
-[…]
-A baseline example can be found in the [baseline_example](https://huggingface.co/spaces/THUIR/AEOLLM/tree/main/baseline_example) folder, where the output folder provides an [example](https://huggingface.co/spaces/THUIR/AEOLLM/blob/main/baseline_example/output/baseline1_chatglm3_6B.txt) of the submission file content.
+An example of the submission file content is [here](https://huggingface.co/spaces/THUIR/AEOLLM/blob/main/baseline_example/output/baseline1_chatglm3_6B.txt).
 """)
 elif page == "LeaderBoard":
     st.header("LeaderBoard")
     # # description
     st.markdown("""
     <p class='main-text'>
-NTCIR-18 Automatic Evaluation Methods of LLMs (AEOLLM) Leaderboard. […]
+NTCIR-18 Automatic Evaluation Methods of LLMs (AEOLLM) Leaderboard.
     </p>
     """, unsafe_allow_html=True)
     df = {
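The hunk above specifies the five-column, whitespace-separated submission line. A rough pre-submission sanity check is sketched below, assuming the format as described; the script, its numeric checks, and the blank-line handling are assumptions, not an official validator.

```python
# Rough sanity check (an assumption, not an official checker) for the
# five-column "taskId questionId answerId score rank" submission format.
import sys

def check_submission(path: str) -> list[str]:
    """Collect format problems found in a submission file."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            cols = line.split()  # any run of whitespace separates columns
            if not cols:
                continue  # ignore blank lines (an assumption)
            if len(cols) != 5:
                errors.append(f"line {lineno}: expected 5 columns, got {len(cols)}")
                continue
            task_id, question_id, answer_id, score, rank = cols
            if not score.replace(".", "", 1).lstrip("-").isdigit():
                errors.append(f"line {lineno}: score {score!r} is not numeric")
            if not rank.isdigit():
                errors.append(f"line {lineno}: rank {rank!r} is not a positive integer")
    return errors

if __name__ == "__main__":
    problems = check_submission(sys.argv[1])  # e.g. teamId_methods.txt
    print("\n".join(problems) or "Format looks OK.")
```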
@@ -309,9 +317,11 @@ NTCIR-18 Automatic Evaluation Methods of LLMs (AEOLLM) Leaderboard. <br/>To submit
     st.dataframe(df,use_container_width=True)
 
     st.markdown("""
-[…]
+To submit, refer to the "Submit" section in the left-hand navigation bar.🤗 A baseline example can be found in the [baseline_example](https://huggingface.co/spaces/THUIR/AEOLLM/tree/main/baseline_example) folder.
 
-[…]
+Refer to the other sections in the navigation bar for details on evaluation metrics, datasets, important dates and methodology.
+
+The Leaderboard will be updated daily around 24:00 Beijing Time.
 """)
     # get Beijing time
     time_placeholder = st.empty()
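The hunk ends with a translated comment ("get Beijing time") and an `st.empty()` placeholder, consistent with the note that the leaderboard updates around 24:00 Beijing Time. One plausible way such a placeholder gets filled is sketched below; this is a guess at the intent, assuming the app displays the current UTC+8 time, and is not necessarily the repo's actual code.

```python
# Sketch (assumed, not the repo's code): fill the st.empty() placeholder
# with the current Beijing time (UTC+8).
from datetime import datetime, timezone, timedelta
import streamlit as st

time_placeholder = st.empty()
beijing_now = datetime.now(timezone(timedelta(hours=8)))  # Beijing is UTC+8
time_placeholder.markdown(f"Current Beijing time: {beijing_now:%Y-%m-%d %H:%M:%S}")
```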