CoreyMorris committed on
Commit
e05c716
1 Parent(s): 36799a9

updated with new data

Files changed (2)
  1. app.py +1 -0
  2. processed_data_2024-04-16.csv +0 -0
app.py CHANGED
@@ -115,6 +115,7 @@ st.title('Interactive Portal for Analyzing Open Source Large Language Models')
  st.markdown("""***Last updated March 17th 2024***""")
  st.markdown("""**It has not been updated to correctly extract the parameter number from mixture of experts models.**""")
  st.markdown("""**As of 04-17-2024, this data was not generated using the chat templates. Smaller models are especially sensative to this and other aspects related to the format of the inputs.**""")
+ st.markdown("""For a good sense of general relative performance of models, I would highly reccomend this leaderboard https://chat.lmsys.org/""")
  st.markdown("""
  This page provides a way to explore the results for individual tasks and compare models across tasks. Data for the benchmarks hellaswag, arc_challenge, and truthfulQA have also been included for comparison.
  There are 57 tasks in the MMLU evaluation that cover a wide variety of subjects including Science, Math, Humanities, Social Science, Applied Science, Logic, and Security.
processed_data_2024-04-16.csv ADDED
The diff for this file is too large to render.