Spaces: bigcode/bigcodebench-leaderboard
Terry Zhuo committed on Aug 6, 2024

Commit 6e2a72a (1 parent: 024b141)

update

Files changed (2)

src/display/about.py CHANGED

@@ -20,7 +20,7 @@ BigCodeBench is the first benchmark that meets all three expectations. It is an
 ### Benchamrks & Prompts
 The dataset has 2 variants:
-1. `BigCodeBench-Complete`: _Code Completion based on the structured docstrings_.
 1. `BigCodeBench-Instruct`: _Code Generation based on the NL-oriented instructions_.
 Figure below shows the example of `Complete` vs `Instruct` prompt. For `Instruct`, we only focus on instruction-tuned LLMs.

 ### Benchamrks & Prompts
 The dataset has 2 variants:
+1. `BigCodeBench-Complete`: _Code Completion based on the structured long-context docstrings_.
 1. `BigCodeBench-Instruct`: _Code Generation based on the NL-oriented instructions_.
 Figure below shows the example of `Complete` vs `Instruct` prompt. For `Instruct`, we only focus on instruction-tuned LLMs.

src/populate.py CHANGED

@@ -45,6 +45,6 @@ def get_leaderboard_df(leaderboard_dataset: Dataset, cols: list):
     df[AutoEvalColumn.average.name] = df.apply(lambda x: round((x[AutoEvalColumn.complete.name] + x[AutoEvalColumn.instruct.name]) / 2, 1) if not pd.isna(x[AutoEvalColumn.complete.name]) and not pd.isna(x[AutoEvalColumn.instruct.name]) else None, axis=1)
     df[AutoEvalColumn.size_range.name] = df[AutoEvalColumn.size.name].apply(lambda x: next((k for k, v in NUMERIC_INTERVALS.items() if x in v), "?"))
     df = make_clickable_model(df, AutoEvalColumn.model.name, AutoEvalColumn.link.name)
-    df = df.sort_values(by=[AutoEvalColumn.complete.name], ascending=False)
     df = df[cols].round(decimals=2)
     return df

     df[AutoEvalColumn.average.name] = df.apply(lambda x: round((x[AutoEvalColumn.complete.name] + x[AutoEvalColumn.instruct.name]) / 2, 1) if not pd.isna(x[AutoEvalColumn.complete.name]) and not pd.isna(x[AutoEvalColumn.instruct.name]) else None, axis=1)
     df[AutoEvalColumn.size_range.name] = df[AutoEvalColumn.size.name].apply(lambda x: next((k for k, v in NUMERIC_INTERVALS.items() if x in v), "?"))
     df = make_clickable_model(df, AutoEvalColumn.model.name, AutoEvalColumn.link.name)
+    df = df.sort_values(by=[AutoEvalColumn.average.name], ascending=False)
     df = df[cols].round(decimals=2)
     return df
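
The effect of the `src/populate.py` change can be sketched on a toy frame. This is a minimal illustration, not the leaderboard's actual code: the column names `Complete`, `Instruct`, and `Average` and the model names are hypothetical stand-ins for the `AutoEvalColumn.*.name` values, and the helper `add_average_and_sort` is invented for this example. It mirrors the averaging rule from the hunk (average only when both scores are present) and then sorts by that average, as the new line does.

```python
import pandas as pd

# Hypothetical column names standing in for AutoEvalColumn.*.name.
COMPLETE, INSTRUCT, AVERAGE = "Complete", "Instruct", "Average"


def add_average_and_sort(df: pd.DataFrame) -> pd.DataFrame:
    """Add the average column and rank rows by it, highest first."""
    df = df.copy()
    # Same rule as in the diff: average only if both scores exist.
    df[AVERAGE] = df.apply(
        lambda row: round((row[COMPLETE] + row[INSTRUCT]) / 2, 1)
        if not pd.isna(row[COMPLETE]) and not pd.isna(row[INSTRUCT])
        else None,
        axis=1,
    )
    # New behavior: sort by the average instead of the Complete score.
    return df.sort_values(by=[AVERAGE], ascending=False)


# Toy data: model-b has the best Complete score but no Instruct score.
toy = pd.DataFrame(
    {
        "Model": ["model-a", "model-b", "model-c"],
        COMPLETE: [50.0, 60.0, 40.0],
        INSTRUCT: [45.0, None, 57.0],
    }
)

ranked = add_average_and_sort(toy)
old_order = toy.sort_values(by=[COMPLETE], ascending=False)

print(list(old_order["Model"]))  # ranking under the previous sort key
print(list(ranked["Model"]))     # ranking under the new sort key
```

In this sketch, a model that is missing one of the two scores has no average, and `sort_values` places missing values last by default, so such a model drops to the bottom of the table even if its `Complete` score is the highest.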