Spaces: THUIR/AEOLLM
陈俊杰 committed on Sep 2, 2024

Commit

932ed5c

1 Parent(s): 5447046

time

Files changed (1)

app.py CHANGED

@@ -268,7 +268,7 @@ This leaderboard is used to show the performance of the <strong>automatic evalua
         beijing_time = datetime.now(beijing_tz)
         # Dynamically display the current Beijing time on the page
-        time_placeholder.write("当前北京时间: " + beijing_time.strftime('%Y-%m-%d %H:%M:%S'))
+        time_placeholder.write("Current Beijing Time: " + beijing_time.strftime('%Y-%m-%d %H:%M:%S'))
         # Refresh once per second
         time.sleep(1)
@@ -286,5 +286,10 @@ Please feel free to contact us! 😉
 </p>""",unsafe_allow_html=True)
 elif page == "References":
     st.header("References")
-    st.markdown("""TAB""")
+    st.markdown("""<p>[1] Mao R, Chen G, Zhang X, et al. GPTEval: A survey on assessments of ChatGPT and GPT-4. <a href="https://arxiv.org/pdf/2308.12488">pdf</a><br />
+[2] Chang Y, Wang X, Wang J, et al. A survey on evaluation of large language models. <a href="https://dl.acm.org/doi/pdf/10.1145/3641289">pdf</a><br />
+[3] Chan C M, Chen W, Su Y, et al. Chateval: Towards better llm-based evaluators through multi-agent debate. <a href="https://arxiv.org/pdf/2308.07201">pdf</a><br />
+[4] Li R, Patel T, Du X. Prd: Peer rank and discussion improve large language model based evaluations. <a href="https://arxiv.org/pdf/2307.02762">pdf</a><br />
+[5] Chu Z, Ai Q, Tu Y, et al. Pre: A peer review based large language model evaluator. <a href="https://arxiv.org/pdf/2401.15641">pdf</a></p>
+""")
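
Review note: the label change in the first hunk can be checked without launching the Streamlit app. The following is a minimal sketch, assuming Beijing time can be modelled as a fixed UTC+8 offset. `BEIJING_TZ` and `format_beijing_time` are hypothetical names used only for illustration; they are not part of app.py, and the `beijing_tz` object in app.py may be constructed differently.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Beijing time is a fixed UTC+8 offset (no daylight saving time).
# This stands in for the `beijing_tz` object used in app.py.
BEIJING_TZ = timezone(timedelta(hours=8))


def format_beijing_time(now=None):
    """Build the clock label text from the diff for a given instant.

    `now` must be a timezone-aware datetime; if omitted, the current
    time is used.
    """
    if now is None:
        now = datetime.now(BEIJING_TZ)
    else:
        # Convert the aware datetime to Beijing time before formatting.
        now = now.astimezone(BEIJING_TZ)
    return "Current Beijing Time: " + now.strftime('%Y-%m-%d %H:%M:%S')


print(format_beijing_time())
```

Keeping the string formatting in a plain function like this would let the label be tested separately from the `time_placeholder.write(...)` call and the one-second `time.sleep(1)` loop.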