Corey Morris committed
Commit 3f507e0
1 Parent(s): 6ed8672

Added new hugging face results

Files changed (2)
  1. app.py +3 -3
  2. results +1 -1
app.py CHANGED
@@ -123,11 +123,11 @@ def find_top_differences_table(df, target_model, closest_models, num_differences
  data_provider = ResultDataProcessor()
 
  # st.title('Model Evaluation Results including MMLU by task')
- st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 900+ Open Source Models Across 57 Diverse Evaluation Tasks')
- st.markdown("""***Last updated August 22th***""")
+ st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 1000+ Open Source Models Across 57 Diverse Evaluation Tasks')
+ st.markdown("""***Last updated August 26th***""")
  st.markdown("""**Models that are suspected to have training data contaminated with evaluation data have been removed.**""")
  st.markdown("""
- Hugging Face has run evaluations on over 900 open source models and provides results on a
+ Hugging Face runs evaluations on open source models and provides results on a
  [publicly available leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and [dataset](https://huggingface.co/datasets/open-llm-leaderboard/results).
  The Hugging Face leaderboard currently displays the overall result for Measuring Massive Multitask Language Understanding (MMLU), but not the results for individual tasks.
  This app provides a way to explore the results for individual tasks and compare models across tasks.
results CHANGED
@@ -1 +1 @@
- Subproject commit bcf5e74b8117e8aa50260dbc089f6cc812d96c5f
+ Subproject commit 4f0a4395819faaf7fb9215d26ddee21f5dcf3c95