Corey Morris committed
Commit 18ec1ba
Parent: fb25b1e

Modified title and explanation to better reflect what the site is

Files changed (1)
  1. app.py +7 -5
app.py CHANGED
@@ -104,12 +104,14 @@ def find_top_differences_table(df, target_model, closest_models, num_differences
 data_provider = ResultDataProcessor()
 
 # st.title('Model Evaluation Results including MMLU by task')
-st.title('MMLU-by-Task Evaluation Results for 700+ Open Source Models')
-st.markdown("""***Last updated August 10th***""")
+st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 700+ Open Source Models Across 57 Diverse Evaluation Tasks')
+st.markdown("""***Last updated August 15th***""")
 st.markdown("""
-Hugging Face has run evaluations on over 500 open source models and provides results on a
+Hugging Face has run evaluations on over 700 open source models and provides results on a
 [publicly available leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and [dataset](https://huggingface.co/datasets/open-llm-leaderboard/results).
-The leaderboard currently displays the overall result for MMLU. This page shows individual accuracy scores for all 57 tasks of the MMLU evaluation.
+The Hugging Face leaderboard currently displays the overall result for Measuring Massive Multitask Language Understanding (MMLU), but not the results for individual tasks.
+This app provides a way to explore the results for individual tasks and compare models across tasks.
+There are 57 tasks in the MMLU evaluation that cover a wide variety of subjects including Science, Math, Humanities, Social Science, Applied Science, Logic, and Security.
 [Preliminary analysis of MMLU-by-Task data](https://coreymorrisdata.medium.com/preliminary-analysis-of-mmlu-evaluation-data-insights-from-500-open-source-models-e67885aa364b)
 """)
 
@@ -341,7 +343,7 @@ st.markdown("***Thank you to hugging face for running the evaluations and supply
 st.markdown("""
 # Citation
 
-1. Corey Morris (2023). *MMLU-by-Task Evaluation Results for 700+ Open Source Models*. [link](https://huggingface.co/spaces/CoreyMorris/MMLU-by-task-Leaderboard)
+1. Corey Morris (2023). *Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 700+ Open Source Models Across 57 Diverse Evaluation Tasks*. [link](https://huggingface.co/spaces/CoreyMorris/MMLU-by-task-Leaderboard)
 
 2. Edward Beeching, Clémentine Fourrier, Nathan Habib, Sheon Han, Nathan Lambert, Nazneen Rajani, Omar Sanseviero, Lewis Tunstall, Thomas Wolf. (2023). *Open LLM Leaderboard*. Hugging Face. [link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)