caesar-one committed
Commit 1f93c1a
1 Parent(s): dfb91d7

Small improvements.

Files changed (2):
  1. README.md +17 -17
  2. main.py +8 -6
README.md CHANGED
@@ -15,23 +15,23 @@ license: apache-2.0
  Italian leaderboard
 
  ## Leaderboard
- | Model Name | Year | Publisher | Num. Parameters | Open? | Model Type | Average | Average (Zero-shot) | Average (N-shot) | ARC Challenge (zero-shot) | ARC Challenge (25-shot) | HellaSwag (zero-shot) | HellaSwag (10-shot) | MMLU (zero-shot) | MMLU (5-shot) | TruthfulQA (zero-shot MC2) |
- |--------------------------------------------------------------------------------------------|------|-------------------------------------------|-----------------|-------|---------------|---------|---------------------|------------------|---------------------------|-------------------------|-----------------------|---------------------|------------------|---------------|----------------------------|
- | [DanteLLM](https://huggingface.co/rstless-research/DanteLLM-7B-Instruct-Italian-v0.1-GGUF) | 2023 | RSTLess (Sapienza University of Rome) | 7B | yes | Italian FT | 47.52 | 47.34 | 47.69 | 41.89 | 47.01 | 47.99 | 47.79 | 47.05 | 48.27 | 52.41 |
- | [OpenDanteLLM](https://huggingface.co/rstless-research/) | 2023 | RSTLess (Sapienza University of Rome) | 7B | yes | Italian FT | 45.97 | 45.13 | 46.80 | 41.72 | 46.76 | 46.49 | 46.75 | 44.25 | 46.89 | 48.06 |
- | [Mistral v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) | 2023 | Mistral AI | 7B | yes | English | 44.29 | 45.15 | 43.43 | 37.46 | 41.47 | 43.48 | 42.99 | 44.66 | 45.84 | 54.99 |
- | [LLaMAntino](https://huggingface.co/swap-uniba/LLaMAntino-2-7b-hf-ITA) | 2024 | Bari University | 7B | yes | Italian FT | 41.66 | 40.86 | 42.46 | 38.22 | 41.72 | 46.30 | 46.91 | 33.89 | 38.74 | 45.03 |
- | [Fauno2](https://huggingface.co/andreabac3/Fauno2-LLaMa2-7B) | 2023 | RSTLess (Sapienza University of Rome) | 7B | yes | Italian FT | 41.74 | 42.90 | 40.57 | 36.26 | 39.33 | 44.25 | 44.07 | 40.30 | 38.32 | 50.77 |
- | [Fauno1](https://huggingface.co/andreabac3/Fauno2-LLaMa2-7B) | 2023 | RSTLess (Sapienza University of Rome) | 7B | yes | Italian FT | 36.91 | 37.20 | 36.61 | 33.10 | 36.52 | 43.13 | 42.86 | 28.79 | 30.45 | 43.78 |
- | [Camoscio](https://huggingface.co/teelinsan/camoscio-7b-llama) | 2023 | Gladia (Sapienza University of Rome) | 7B | yes | Italian FT | 37.22 | 38.01 | 36.42 | 33.28 | 36.60 | 42.91 | 43.29 | 30.53 | 29.38 | 45.33 |
- | [LLaMA2](https://huggingface.co/meta-llama/Llama-2-7b) | 2022 | Meta | 7B | yes | English | 39.50 | 39.14 | 39.86 | 33.28 | 37.71 | 44.31 | 43.97 | 34.12 | 37.91 | 44.83 |
- | [BloomZ](https://huggingface.co/bigscience/bloomz-7b1) | 2022 | BigScience | 7B | yes | Multilingual | 33.97 | 36.01 | 31.93 | 27.30 | 28.24 | 34.83 | 35.88 | 36.40 | 31.67 | 45.52 |
- | [iT5](https://huggingface.co/gsarti/it5-large) | 2022 | Groningen University | 738M | yes | Italian | 29.27 | 32.42 | 26.11 | 27.39 | 27.99 | 28.11 | 26.04 | 23.69 | 24.31 | 50.49 |
- | [GePpeTto](https://huggingface.co/LorenzoDeMattei/GePpeTto) | 2020 | Pisa/Groningen University, FBK, Aptus.AI | 117M | yes | Italian | 27.86 | 30.89 | 24.82 | 24.15 | 25.08 | 26.34 | 24.99 | 22.87 | 24.39 | 50.20 |
- | [mT5](https://huggingface.co/google/mt5-large) | 2020 | Google | 3.7B | yes | Multilingual | 29.00 | 30.99 | 27.01 | 25.94 | 27.56 | 26.96 | 27.86 | 25.56 | 25.60 | 45.50 |
- | [Minerva 3B](https://huggingface.co/sapienzanlp/Minerva-3B-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 3B | yes | Multilingual | 33.94 | 34.37 | 33.52 | 30.29 | 30.89 | 42.38 | 43.16 | 24.62 | 26.50 | 40.18 |
- | [Minerva 1B](https://huggingface.co/sapienzanlp/Minerva-1B-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 1B | yes | Multilingual | 29.78 | 31.46 | 28.09 | 24.32 | 25.25 | 34.01 | 34.07 | 24.69 | 24.94 | 42.84 |
- | [Minerva 350M](https://huggingface.co/sapienzanlp/Minerva-350M-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 350M | yes | Multilingual | 28.35 | 30.72 | 26 | 23.21 | 24.32 | 29.33 | 29.37 | 23.10 | 24.29 | 47.23 |
+ | Model Name | Year | Publisher | Num. Params | Lang. | Avg. | Avg. (Zero-shot) | Avg. (N-shot) | MMLU (0-shot) | MMLU (5-shot) | ARC Challenge (0-shot) | ARC Challenge (25-shot) | HellaSwag (0-shot) | HellaSwag (10-shot) | TruthfulQA (0-shot) |
+ |--------------------------------------------------------------------------------------------|------|-------------------------------------------|-------------|--------------|-------|------------------|---------------|---------------|---------------|------------------------|-------------------------|--------------------|---------------------|-------------------------|
+ | [DanteLLM](https://huggingface.co/rstless-research/DanteLLM-7B-Instruct-Italian-v0.1-GGUF) | 2023 | RSTLess (Sapienza University of Rome) | 7B | Italian FT | 47.52 | 47.34 | 47.69 | 47.05 | 48.27 | 41.89 | 47.01 | 47.99 | 47.79 | 52.41 |
+ | [OpenDanteLLM](https://huggingface.co/rstless-research/) | 2023 | RSTLess (Sapienza University of Rome) | 7B | Italian FT | 45.97 | 45.13 | 46.80 | 44.25 | 46.89 | 41.72 | 46.76 | 46.49 | 46.75 | 48.06 |
+ | [Mistral v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) | 2023 | Mistral AI | 7B | English | 44.29 | 45.15 | 43.43 | 44.66 | 45.84 | 37.46 | 41.47 | 43.48 | 42.99 | 54.99 |
+ | [LLaMAntino](https://huggingface.co/swap-uniba/LLaMAntino-2-7b-hf-ITA) | 2024 | Bari University | 7B | Italian FT | 41.66 | 40.86 | 42.46 | 33.89 | 38.74 | 38.22 | 41.72 | 46.30 | 46.91 | 45.03 |
+ | [Fauno2](https://huggingface.co/andreabac3/Fauno2-LLaMa2-7B) | 2023 | RSTLess (Sapienza University of Rome) | 7B | Italian FT | 41.74 | 42.90 | 40.57 | 40.30 | 38.32 | 36.26 | 39.33 | 44.25 | 44.07 | 50.77 |
+ | [Fauno1](https://huggingface.co/andreabac3/Fauno2-LLaMa2-7B) | 2023 | RSTLess (Sapienza University of Rome) | 7B | Italian FT | 36.91 | 37.20 | 36.61 | 28.79 | 30.45 | 33.10 | 36.52 | 43.13 | 42.86 | 43.78 |
+ | [Camoscio](https://huggingface.co/teelinsan/camoscio-7b-llama) | 2023 | Gladia (Sapienza University of Rome) | 7B | Italian FT | 37.22 | 38.01 | 36.42 | 30.53 | 29.38 | 33.28 | 36.60 | 42.91 | 43.29 | 45.33 |
+ | [LLaMA2](https://huggingface.co/meta-llama/Llama-2-7b) | 2022 | Meta | 7B | English | 39.50 | 39.14 | 39.86 | 34.12 | 37.91 | 33.28 | 37.71 | 44.31 | 43.97 | 44.83 |
+ | [BloomZ](https://huggingface.co/bigscience/bloomz-7b1) | 2022 | BigScience | 7B | Multilingual | 33.97 | 36.01 | 31.93 | 36.40 | 31.67 | 27.30 | 28.24 | 34.83 | 35.88 | 45.52 |
+ | [iT5](https://huggingface.co/gsarti/it5-large) | 2022 | Groningen University | 738M | Italian | 29.27 | 32.42 | 26.11 | 23.69 | 24.31 | 27.39 | 27.99 | 28.11 | 26.04 | 50.49 |
+ | [GePpeTto](https://huggingface.co/LorenzoDeMattei/GePpeTto) | 2020 | Pisa/Groningen University, FBK, Aptus.AI | 117M | Italian | 27.86 | 30.89 | 24.82 | 22.87 | 24.39 | 24.15 | 25.08 | 26.34 | 24.99 | 50.20 |
+ | [mT5](https://huggingface.co/google/mt5-large) | 2020 | Google | 3.7B | Multilingual | 29.00 | 30.99 | 27.01 | 25.56 | 25.60 | 25.94 | 27.56 | 26.96 | 27.86 | 45.50 |
+ | [Minerva 3B](https://huggingface.co/sapienzanlp/Minerva-3B-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 3B | Multilingual | 33.94 | 34.37 | 33.52 | 24.62 | 26.50 | 30.29 | 30.89 | 42.38 | 43.16 | 40.18 |
+ | [Minerva 1B](https://huggingface.co/sapienzanlp/Minerva-1B-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 1B | Multilingual | 29.78 | 31.46 | 28.09 | 24.69 | 24.94 | 24.32 | 25.25 | 34.01 | 34.07 | 42.84 |
+ | [Minerva 350M](https://huggingface.co/sapienzanlp/Minerva-350M-base-v1.0) | 2024 | SapienzaNLP (Sapienza University of Rome) | 350M | Multilingual | 28.35 | 30.72 | 26 | 23.10 | 24.29 | 23.21 | 24.32 | 29.33 | 29.37 | 47.23 |
 
  ## Benchmarks
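The README hunk above reworks the markdown table that main.py later parses. As a rough illustration (not the repo's actual `extract_table_and_format_from_markdown_text` implementation, which is assumed here), a pipe table like this one can be loaded into a pandas DataFrame as follows; the two sample rows are abbreviated from the leaderboard:

```python
import pandas as pd

# Hedged sketch: turn a markdown pipe table into a DataFrame, roughly
# what main.py's extract_table_and_format_from_markdown_text is assumed
# to do. The two-column sample below is illustrative only.
md_table = """\
| Model Name | Avg. |
|------------|------|
| DanteLLM   | 47.52 |
| mT5        | 29.00 |
"""

lines = [ln.strip().strip("|") for ln in md_table.strip().splitlines()]
rows = [[cell.strip() for cell in ln.split("|")] for ln in lines]
del rows[1]  # drop the |---|---| separator row
df = pd.DataFrame(rows[1:], columns=rows[0])
df["Avg."] = pd.to_numeric(df["Avg."])  # scores as floats, not strings

print(df.shape)  # (2, 2)
```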
main.py CHANGED
@@ -6,7 +6,7 @@ import streamlit as st
  from pandas.api.types import is_bool_dtype, is_datetime64_any_dtype, is_numeric_dtype
 
  GITHUB_URL = "https://github.com/RSTLess-research/"
- NON_BENCHMARK_COLS = ["Open?", "Publisher"]
+ NON_BENCHMARK_COLS = ["Publisher"]
 
 
  def extract_table_and_format_from_markdown_text(markdown_table: str) -> pd.DataFrame:
@@ -247,7 +247,6 @@ def setup_leaderboard(readme: str):
  leaderboard_table = extract_markdown_table_from_multiline(readme, table_headline="## Leaderboard")
  leaderboard_table = remove_markdown_links(leaderboard_table)
  df_leaderboard = extract_table_and_format_from_markdown_text(leaderboard_table)
- df_leaderboard["Open?"] = df_leaderboard["Open?"].map({"yes": 1, "no": 0}).astype(bool)
 
  st.markdown("## Leaderboard")
  modify = st.checkbox("Add filters")
@@ -257,11 +256,12 @@ def setup_leaderboard(readme: str):
  df_leaderboard = filter_dataframe_by_column_values(df_leaderboard)
  df_leaderboard = filter_dataframe_by_model_type(df_leaderboard)
 
- df_leaderboard = df_leaderboard.sort_values(by=['Average'], ascending=False)
- df_leaderboard["Rank"] = df_leaderboard["Average"].rank(ascending=False)
+ df_leaderboard = df_leaderboard.sort_values(by=['Avg.'], ascending=False)
+ df_leaderboard["Rank"] = df_leaderboard["Avg."].rank(ascending=False)
  # move rank at 0-th column
  # Ensure 'Rank' is the first column
  cols = ['Rank'] + [col for col in df_leaderboard.columns if col != 'Rank']
+
  df_leaderboard = df_leaderboard[cols]
 
  print(df_leaderboard.columns)
@@ -316,10 +316,12 @@ def setup_disclaimer():
  st.markdown("## Authors")
  st.markdown(
  """
- - [Andrea Bacciu](https://www.linkedin.com/in/andreabacciu/) (Work done prior joining Amazon)
- - [Cesare Campagnano](https://www.linkedin.com/in/caesar-one/)
+ - [Andrea Bacciu](https://www.linkedin.com/in/andreabacciu/)* (Work done prior joining Amazon)
+ - [Cesare Campagnano](https://www.linkedin.com/in/caesar-one/)*
  - [Giovanni Trappolini](https://www.linkedin.com/in/giovanni-trappolini/)
  - [Professor Fabrizio Silvestri](https://www.linkedin.com/in/fabrizio-silvestri-a6b0391/)
+
+ \*Equal contribution
  """
  )
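The sorting and ranking change in main.py can be sketched in isolation. This is a minimal standalone example of the same pandas pattern, with a toy three-row DataFrame standing in for the parsed leaderboard:

```python
import pandas as pd

# Minimal sketch of the updated ranking logic in setup_leaderboard:
# sort by the renamed "Avg." column, derive a 1-based "Rank", and move
# "Rank" to the first column. Toy data stands in for the real table.
df_leaderboard = pd.DataFrame(
    {"Model Name": ["Mistral v0.2", "DanteLLM", "Minerva 3B"],
     "Avg.": [44.29, 47.52, 33.94]}
)

df_leaderboard = df_leaderboard.sort_values(by=["Avg."], ascending=False)
df_leaderboard["Rank"] = df_leaderboard["Avg."].rank(ascending=False)

# Ensure 'Rank' is the first column
cols = ["Rank"] + [c for c in df_leaderboard.columns if c != "Rank"]
df_leaderboard = df_leaderboard[cols]

print(df_leaderboard["Model Name"].tolist())  # ['DanteLLM', 'Mistral v0.2', 'Minerva 3B']
```

Because the scores here are all distinct, `rank(ascending=False)` simply yields 1.0, 2.0, 3.0 in sorted order; ties would receive averaged ranks by default.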