Spaces: ought/raft-leaderboard

Runtime error

test #13

by Tae - opened Feb 5, 2023


Files changed (2)

app.py CHANGED

@@ -21,7 +21,7 @@ FORMATTED_TASK_NAMES = sorted([" ".join(t.capitalize() for t in task.split("_"))
 def download_submissions():
     filt = DatasetFilter(benchmark="raft")
-    all_submissions = list_datasets(filter=filt, full=True, use_auth_token=auth_token)
+    all_submissions = list_datasets(filter=filt, cardData=True, use_auth_token=auth_token)
     submissions = []
     for dataset in all_submissions:
@@ -83,8 +83,6 @@ st.set_page_config(layout="wide")
 st.title("RAFT: Real-world Annotated Few-shot Tasks")
 st.markdown(
     """
-⚠️ **The RAFT benchmark is currently undergoing maintenance and is not accepting submissions at the moment. We apologise for the inconvenience.**
 Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants?
 [RAFT](https://raft.elicit.org) is a few-shot classification benchmark that tests language models:
@@ -99,9 +97,7 @@ To submit to RAFT, follow the instruction posted on [this page](https://huggingf
 submissions = download_submissions()
 print(f"INFO - downloaded {len(submissions)} submissions")
 df = format_submissions(submissions)
-styler = pd.io.formats.style.Styler(df, precision=3).set_properties(
-    **{"white-space": "pre-wrap", "text-align": "center"}
-)
+styler = df.style.set_precision(3).set_properties(**{"white-space": "pre-wrap", "text-align": "center"})
 # hack to remove index column: https://discuss.streamlit.io/t/questions-on-st-table/6878/3
 st.markdown(
     """

requirements.txt CHANGED

@@ -1,6 +1,5 @@
-pandas==2.0.3
+pandas<=1.4
 python-dotenv
 protobuf~=3.19.0
-huggingface-hub==0.18.0
-datasets==2.8.0
-altair<5
+huggingface-hub==0.9.1
+datasets==2.8.0