caesar-one committed
Commit 1f93c1a
1 Parent(s): dfb91d7

Small improvements.

Files changed (2):
  1. README.md +17 -17
  2. main.py +8 -6
README.md CHANGED
@@ -15,23 +15,23 @@ license: apache-2.0
  Italian leaderboard
 
  ## Leaderboard
- | Model Name | Year | Publisher | Num. Parameters | Open? | Model Type | Average | Average (Zero-shot) | Average (N-shot) | ARC Challenge (zero-shot) | ARC Challenge (25-shot) | HellaSwag (zero-shot) | HellaSwag (10-shot) | MMLU (zero-shot) | MMLU (5-shot) | TruthfulQA (zero-shot MC2) |
- |--------------------------------------------------------------------------------------------|------|-------------------------------------------|-----------------|-------|---------------|---------|---------------------|------------------|---------------------------|-------------------------|-----------------------|---------------------|------------------|---------------|----------------------------|
- | [DanteLLM](https://huggingface.co/rstless-research/DanteLLM-7B-Instruct-Italian-v0.1-GGUF) | 2023 | RSTLess (Sapienza University of Rome) | 7B | yes | Italian FT | 47.52 | 47.34 | 47.69 | 41.89 | 47.01 | 47.99 | 47.79 | 47.05 | 48.27 | 52.41 |
- | [OpenDanteLLM](https://huggingface.co/rstless-research/) | 2023 | RSTLess (Sapienza University of Rome) | 7B | yes | Italian FT | 45.97 | 45.13 | 46.80 | 41.72 | 46.76 | 46.49 | 46.75 | 44.25 | 46.89 | 48.06 |
- | [Mistral v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) | 2023 | Mistral AI | 7B | yes | English | 44.29 | 45.15 | 43.43 | 37.46 | 41.47 | 43.48 | 42.99 | 44.66 | 45.84 | 54.99 |
- | [LLaMAntino](https://huggingface.co/swap-uniba/LLaMAntino-2-7b-hf-ITA) | 2024 | Bari University | 7B | yes | Italian FT | 41.66 | 40.86 | 42.46 | 38.22 | 41.72 | 46.30 | 46.91 | 33.89 | 38.74 | 45.03 |
- | [Fauno2](https://huggingface.co/andreabac3/Fauno2-LLaMa2-7B) | 2023 | RSTLess (Sapienza University of Rome) | 7B | yes | Italian FT | 41.74 | 42.90 | 40.57 | 36.26 | 39.33 | 44.25 | 44.07 | 40.30 | 38.32 | 50.77 |
- | [Fauno1](https://huggingface.co/andreabac3/Fauno2-LLaMa2-7B) | 2023 | RSTLess (Sapienza University of Rome) | 7B | yes | Italian FT | 36.91 | 37.20 | 36.61 | 33.10 | 36.52 | 43.13 | 42.86 | 28.79 | 30.45 | 43.78 |
- | [Camoscio](https://huggingface.co/teelinsan/camoscio-7b-llama) | 2023 | Gladia (Sapienza University of Rome) | 7B | yes | Italian FT | 37.22 | 38.01 | 36.42 | 33.28 | 36.60 | 42.91 | 43.29 | 30.53 | 29.38 | 45.33 |
- | [LLaMA2](https://huggingface.co/meta-llama/Llama-2-7b) | 2022 | Meta | 7B | yes | English | 39.50 | 39.14 | 39.86 | 33.28 | 37.71 | 44.31 | 43.97 | 34.12 | 37.91 | 44.83 |
- | [BloomZ](https://huggingface.co/bigscience/bloomz-7b1) | 2022 | BigScience | 7B | yes | Multilingual | 33.97 | 36.01 | 31.93 | 27.30 | 28.24 | 34.83 | 35.88 | 36.40 | 31.67 | 45.52 |
- | [iT5](https://huggingface.co/gsarti/it5-large) | 2022 | Groningen University | 738M | yes | Italian | 29.27 | 32.42 | 26.11 | 27.39 | 27.99 | 28.11 | 26.04 | 23.69 | 24.31 | 50.49 |
- | [GePpeTto](https://huggingface.co/LorenzoDeMattei/GePpeTto) | 2020 | Pisa/Groningen University, FBK, Aptus.AI | 117M | yes | Italian | 27.86 | 30.89 | 24.82 | 24.15 | 25.08 | 26.34 | 24.99 | 22.87 | 24.39 | 50.20 |
- | [mT5](https://huggingface.co/google/mt5-large) | 2020 | Google | 3.7B | yes | Multilingual | 29.00 | 30.99 | 27.01 | 25.94 | 27.56 | 26.96 | 27.86 | 25.56 | 25.60 | 45.50 |
- | [Minerva 3B](https://huggingface.co/sapienzanlp/Minerva-3B-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 3B | yes | Multilingual | 33.94 | 34.37 | 33.52 | 30.29 | 30.89 | 42.38 | 43.16 | 24.62 | 26.50 | 40.18 |
- | [Minerva 1B](https://huggingface.co/sapienzanlp/Minerva-1B-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 1B | yes | Multilingual | 29.78 | 31.46 | 28.09 | 24.32 | 25.25 | 34.01 | 34.07 | 24.69 | 24.94 | 42.84 |
- | [Minerva 350M](https://huggingface.co/sapienzanlp/Minerva-350M-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 350M | yes | Multilingual | 28.35 | 30.72 | 26 | 23.21 | 24.32 | 29.33 | 29.37 | 23.10 | 24.29 | 47.23 |
+ | Model Name | Year | Publisher | Num. Params | Lang. | Avg. | Avg. (Zero-shot) | Avg. (N-shot) | MMLU (0-shot) | MMLU (5-shot) | ARC Challenge (0-shot) | ARC Challenge (25-shot) | HellaSwag (0-shot) | HellaSwag (10-shot) | TruthfulQA (0-shot) |
+ |--------------------------------------------------------------------------------------------|------|-------------------------------------------|-------------|--------------|-------|------------------|---------------|---------------|---------------|------------------------|-------------------------|--------------------|---------------------|-------------------------|
+ | [DanteLLM](https://huggingface.co/rstless-research/DanteLLM-7B-Instruct-Italian-v0.1-GGUF) | 2023 | RSTLess (Sapienza University of Rome) | 7B | Italian FT | 47.52 | 47.34 | 47.69 | 47.05 | 48.27 | 41.89 | 47.01 | 47.99 | 47.79 | 52.41 |
+ | [OpenDanteLLM](https://huggingface.co/rstless-research/) | 2023 | RSTLess (Sapienza University of Rome) | 7B | Italian FT | 45.97 | 45.13 | 46.80 | 44.25 | 46.89 | 41.72 | 46.76 | 46.49 | 46.75 | 48.06 |
+ | [Mistral v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) | 2023 | Mistral AI | 7B | English | 44.29 | 45.15 | 43.43 | 44.66 | 45.84 | 37.46 | 41.47 | 43.48 | 42.99 | 54.99 |
+ | [LLaMAntino](https://huggingface.co/swap-uniba/LLaMAntino-2-7b-hf-ITA) | 2024 | Bari University | 7B | Italian FT | 41.66 | 40.86 | 42.46 | 33.89 | 38.74 | 38.22 | 41.72 | 46.30 | 46.91 | 45.03 |
+ | [Fauno2](https://huggingface.co/andreabac3/Fauno2-LLaMa2-7B) | 2023 | RSTLess (Sapienza University of Rome) | 7B | Italian FT | 41.74 | 42.90 | 40.57 | 40.30 | 38.32 | 36.26 | 39.33 | 44.25 | 44.07 | 50.77 |
+ | [Fauno1](https://huggingface.co/andreabac3/Fauno2-LLaMa2-7B) | 2023 | RSTLess (Sapienza University of Rome) | 7B | Italian FT | 36.91 | 37.20 | 36.61 | 28.79 | 30.45 | 33.10 | 36.52 | 43.13 | 42.86 | 43.78 |
+ | [Camoscio](https://huggingface.co/teelinsan/camoscio-7b-llama) | 2023 | Gladia (Sapienza University of Rome) | 7B | Italian FT | 37.22 | 38.01 | 36.42 | 30.53 | 29.38 | 33.28 | 36.60 | 42.91 | 43.29 | 45.33 |
+ | [LLaMA2](https://huggingface.co/meta-llama/Llama-2-7b) | 2022 | Meta | 7B | English | 39.50 | 39.14 | 39.86 | 34.12 | 37.91 | 33.28 | 37.71 | 44.31 | 43.97 | 44.83 |
+ | [BloomZ](https://huggingface.co/bigscience/bloomz-7b1) | 2022 | BigScience | 7B | Multilingual | 33.97 | 36.01 | 31.93 | 36.40 | 31.67 | 27.30 | 28.24 | 34.83 | 35.88 | 45.52 |
+ | [iT5](https://huggingface.co/gsarti/it5-large) | 2022 | Groningen University | 738M | Italian | 29.27 | 32.42 | 26.11 | 23.69 | 24.31 | 27.39 | 27.99 | 28.11 | 26.04 | 50.49 |
+ | [GePpeTto](https://huggingface.co/LorenzoDeMattei/GePpeTto) | 2020 | Pisa/Groningen University, FBK, Aptus.AI | 117M | Italian | 27.86 | 30.89 | 24.82 | 22.87 | 24.39 | 24.15 | 25.08 | 26.34 | 24.99 | 50.20 |
+ | [mT5](https://huggingface.co/google/mt5-large) | 2020 | Google | 3.7B | Multilingual | 29.00 | 30.99 | 27.01 | 25.56 | 25.60 | 25.94 | 27.56 | 26.96 | 27.86 | 45.50 |
+ | [Minerva 3B](https://huggingface.co/sapienzanlp/Minerva-3B-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 3B | Multilingual | 33.94 | 34.37 | 33.52 | 24.62 | 26.50 | 30.29 | 30.89 | 42.38 | 43.16 | 40.18 |
+ | [Minerva 1B](https://huggingface.co/sapienzanlp/Minerva-1B-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 1B | Multilingual | 29.78 | 31.46 | 28.09 | 24.69 | 24.94 | 24.32 | 25.25 | 34.01 | 34.07 | 42.84 |
+ | [Minerva 350M](https://huggingface.co/sapienzanlp/Minerva-350M-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 350M | Multilingual | 28.35 | 30.72 | 26 | 23.10 | 24.29 | 23.21 | 24.32 | 29.33 | 29.37 | 47.23 |
 
  ## Benchmarks
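The README hunk above reworks the markdown table that main.py later parses. As a rough illustration (not the repo's actual `extract_table_and_format_from_markdown_text` implementation, which is assumed here), a pipe table like this one can be loaded into a pandas DataFrame as follows; the two sample rows are abbreviated from the leaderboard:

```python
import pandas as pd

# Hedged sketch: turn a markdown pipe table into a DataFrame, roughly
# what main.py's extract_table_and_format_from_markdown_text is assumed
# to do. The two-column sample below is illustrative only.
md_table = """\
| Model Name | Avg. |
|------------|------|
| DanteLLM   | 47.52 |
| mT5        | 29.00 |
"""

lines = [ln.strip().strip("|") for ln in md_table.strip().splitlines()]
rows = [[cell.strip() for cell in ln.split("|")] for ln in lines]
del rows[1]  # drop the |---|---| separator row
df = pd.DataFrame(rows[1:], columns=rows[0])
df["Avg."] = pd.to_numeric(df["Avg."])  # scores as floats, not strings

print(df.shape)  # (2, 2)
```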
main.py CHANGED
@@ -6,7 +6,7 @@ import streamlit as st
  from pandas.api.types import is_bool_dtype, is_datetime64_any_dtype, is_numeric_dtype
 
  GITHUB_URL = "https://github.com/RSTLess-research/"
- NON_BENCHMARK_COLS = ["Open?", "Publisher"]
+ NON_BENCHMARK_COLS = ["Publisher"]
 
 
  def extract_table_and_format_from_markdown_text(markdown_table: str) -> pd.DataFrame:
@@ -247,7 +247,6 @@ def setup_leaderboard(readme: str):
  leaderboard_table = extract_markdown_table_from_multiline(readme, table_headline="## Leaderboard")
  leaderboard_table = remove_markdown_links(leaderboard_table)
  df_leaderboard = extract_table_and_format_from_markdown_text(leaderboard_table)
- df_leaderboard["Open?"] = df_leaderboard["Open?"].map({"yes": 1, "no": 0}).astype(bool)
 
  st.markdown("## Leaderboard")
  modify = st.checkbox("Add filters")
@@ -257,11 +256,12 @@ def setup_leaderboard(readme: str):
  df_leaderboard = filter_dataframe_by_column_values(df_leaderboard)
  df_leaderboard = filter_dataframe_by_model_type(df_leaderboard)
 
- df_leaderboard = df_leaderboard.sort_values(by=['Average'], ascending=False)
- df_leaderboard["Rank"] = df_leaderboard["Average"].rank(ascending=False)
+ df_leaderboard = df_leaderboard.sort_values(by=['Avg.'], ascending=False)
+ df_leaderboard["Rank"] = df_leaderboard["Avg."].rank(ascending=False)
  # move rank at 0-th column
  # Ensure 'Rank' is the first column
  cols = ['Rank'] + [col for col in df_leaderboard.columns if col != 'Rank']
+
  df_leaderboard = df_leaderboard[cols]
 
  print(df_leaderboard.columns)
@@ -316,10 +316,12 @@ def setup_disclaimer():
  st.markdown("## Authors")
  st.markdown(
  """
- - [Andrea Bacciu](https://www.linkedin.com/in/andreabacciu/) (Work done prior joining Amazon)
- - [Cesare Campagnano](https://www.linkedin.com/in/caesar-one/)
+ - [Andrea Bacciu](https://www.linkedin.com/in/andreabacciu/)* (Work done prior joining Amazon)
+ - [Cesare Campagnano](https://www.linkedin.com/in/caesar-one/)*
  - [Giovanni Trappolini](https://www.linkedin.com/in/giovanni-trappolini/)
  - [Professor Fabrizio Silvestri](https://www.linkedin.com/in/fabrizio-silvestri-a6b0391/)
+
+ \*Equal contribution
  """
  )
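The sorting and ranking change in main.py can be sketched in isolation. This is a minimal standalone example of the same pandas pattern, with a toy three-row DataFrame standing in for the parsed leaderboard:

```python
import pandas as pd

# Minimal sketch of the updated ranking logic in setup_leaderboard:
# sort by the renamed "Avg." column, derive a 1-based "Rank", and move
# "Rank" to the first column. Toy data stands in for the real table.
df_leaderboard = pd.DataFrame(
    {"Model Name": ["Mistral v0.2", "DanteLLM", "Minerva 3B"],
     "Avg.": [44.29, 47.52, 33.94]}
)

df_leaderboard = df_leaderboard.sort_values(by=["Avg."], ascending=False)
df_leaderboard["Rank"] = df_leaderboard["Avg."].rank(ascending=False)

# Ensure 'Rank' is the first column
cols = ["Rank"] + [c for c in df_leaderboard.columns if c != "Rank"]
df_leaderboard = df_leaderboard[cols]

print(df_leaderboard["Model Name"].tolist())  # ['DanteLLM', 'Mistral v0.2', 'Minerva 3B']
```

Because the scores here are all distinct, `rank(ascending=False)` simply yields 1.0, 2.0, 3.0 in sorted order; ties would receive averaged ranks by default.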