arshy committed on
Commit 33a5c4f
1 Parent(s): 85e2c3f

initial commit

Files changed (3)
  1. app.py +57 -0
  2. formatted_data.csv +8 -0
  3. requirements.txt +2 -0
app.py ADDED
@@ -0,0 +1,57 @@
+ import streamlit as st
+ import pandas as pd
+
+ # Path to the CSV file
+ csv_file_path = "formatted_data.csv"
+
+ # Reading the CSV file
+ df = pd.read_csv(csv_file_path)
+
+ # Displaying the DataFrame in the Streamlit app with enhanced interactivity
+ st.title('Olas Predict Benchmark')
+ st.markdown('## Leaderboard showing the performance of Olas Predict tools on the Autocast dataset.')
+ st.markdown("<style>.big-font {font-size:20px !important;}</style>", unsafe_allow_html=True)
+ st.markdown('Use the table below to interact with the data and explore the performance of different tools.', unsafe_allow_html=True)
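+ st.dataframe(df.style.format({"Accuracy": "{:.2f}", "Mean Tokens Used": "{:.2f}", "Mean Cost ($)": "{:.4f}"}))  # 4 decimals so sub-cent costs do not round to 0.00
+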
+ st.markdown("""
+ ## Benchmark Overview
+ - The benchmark evaluates the performance of Olas Predict tools on the Autocast dataset.
+ - The dataset has been refined to enhance the evaluation of the tools.
+ - The leaderboard shows the performance of the tools based on the refined dataset.
+ - The script to run the benchmark is available in the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).
+
+ ## How to run your tools on the benchmark
+ - Fork the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).
+ - Initialize the git submodules, then update them to get the latest dataset and `mech` tool.
+     - `git submodule init`
+     - `git submodule update --remote --recursive`
+ - Include your tool in the `mech/packages` directory accordingly.
+     - Guidelines on how to include your tool can be found [here](xxx).
+ - Run the benchmark script.
+
+ ## Dataset Overview
+ This project leverages the Autocast dataset from the research paper titled ["Forecasting Future World Events with Neural Networks"](https://arxiv.org/abs/2206.15474).
+ The dataset has undergone further refinement to enhance the performance evaluation of Olas mech prediction tools.
+ Both the original and refined datasets are hosted on HuggingFace.
+
+ ### Refined Dataset Files
+ - You can find the refined dataset on HuggingFace [here](https://huggingface.co/datasets/valory/autocast).
+ - `autocast_questions_filtered.json`: A JSON subset of the initial Autocast dataset.
+ - `autocast_questions_filtered.pkl`: A pickle file mapping URLs to their respective scraped documents within the filtered dataset.
+ - `retrieved_docs.pkl`: Contains all the scraped texts.
+
+ ### Filtering Criteria
+ To refine the dataset, we applied the following criteria to ensure the reliability of the URLs:
+ - URLs that do not return an HTTP 200 status code are excluded.
+ - Difficult-to-scrape sites, such as Twitter and Bloomberg, are omitted.
+ - Links with fewer than 1000 words are removed.
+ - Only samples with a minimum of 5 and a maximum of 20 working URLs are retained.
+
+ ### Scraping Approach
+ The content of the filtered URLs has been scraped using various libraries, depending on the source:
+ - `pypdf2` for PDF URLs.
+ - `wikipediaapi` for Wikipedia pages.
+ - `requests`, `readability-lxml`, and `html2text` for most other sources.
+ - `requests`, `beautifulsoup`, and `html2text` for BBC links.
+ """, unsafe_allow_html=True)
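Note on the "Filtering Criteria" text in `app.py`: the four rules can be read as a per-URL check followed by a per-sample check. The sketch below is one possible reading, not the code used to build the refined dataset. The names `BLOCKED_DOMAINS`, `is_usable`, and `keep_sample` are hypothetical, the blocked-domain set contains only the two sites named in the text, and it assumes that a sample with more than 20 working URLs is dropped rather than truncated.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Assumption: only the two sites named in the text; the real list may be longer.
BLOCKED_DOMAINS = {"twitter.com", "bloomberg.com"}
MIN_WORDS = 1000
MIN_URLS, MAX_URLS = 5, 20


def is_usable(url: str, status_code: int, text: str) -> bool:
    """Apply the three per-URL rules: HTTP status, blocked domain, word count."""
    host = (urlparse(url).hostname or "").lower()
    blocked = any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)
    return status_code == 200 and not blocked and len(text.split()) >= MIN_WORDS


def keep_sample(fetched: dict[str, tuple[int, str]]) -> list[str] | None:
    """Return the working URLs of one sample, or None if the sample is dropped.

    `fetched` maps each source URL to its (status_code, extracted_text).
    Assumption: samples outside the 5..20 range are dropped, not truncated.
    """
    working = [u for u, (code, text) in fetched.items() if is_usable(u, code, text)]
    return working if MIN_URLS <= len(working) <= MAX_URLS else None
```

Keeping the checks free of network calls means the rules can be exercised on already-fetched pages; the fetching and text extraction would sit in front of `keep_sample`.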
formatted_data.csv ADDED
@@ -0,0 +1,8 @@
+ Tool,Accuracy,Correct,Total,Mean Tokens Used,Mean Cost ($)
+ claude-prediction-offline,0.7201834862385321,157,218,779.4770642201835,0.006891669724770637
+ claude-prediction-online,0.6600660066006601,200,303,1505.3135313531352,0.013348171617161701
+ prediction-online,0.676737160120846,224,331,1219.6918429003022,0.001332990936555879
+ prediction-offline,0.6599326599326599,196,297,579.6565656565657,0.000621023569023569
+ prediction-online-summarized-info,0.6209150326797386,190,306,1008.4542483660131,0.0011213790849673195
+ prediction-offline-sme,0.599406528189911,202,337,1190.2017804154302,0.0013518635014836643
+ prediction-online-sme,0.5905044510385756,199,337,1834.919881305638,0.0020690207715133428
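Note on `formatted_data.csv`: the `Accuracy` column appears to be `Correct / Total` for each row. A small consistency check along these lines could guard against hand-edited rows. This is a sketch using only the standard library; the helper name `check_accuracy` and the tolerance are assumptions, and the two data rows are copied from the file above.

```python
import csv
import io

# Header and two rows copied from formatted_data.csv, enough to show the relationship.
CSV_TEXT = """Tool,Accuracy,Correct,Total,Mean Tokens Used,Mean Cost ($)
claude-prediction-offline,0.7201834862385321,157,218,779.4770642201835,0.006891669724770637
prediction-online,0.676737160120846,224,331,1219.6918429003022,0.001332990936555879
"""


def check_accuracy(csv_text: str, tol: float = 1e-12) -> list:
    """Return the names of tools whose Accuracy differs from Correct / Total."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = int(row["Correct"]) / int(row["Total"])
        if abs(float(row["Accuracy"]) - expected) > tol:
            mismatches.append(row["Tool"])
    return mismatches
```

An empty list from `check_accuracy(CSV_TEXT)` means every row is consistent. Note that `Total` varies by tool in this file (218 to 337), so the accuracies are not computed over an identical number of questions.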
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ streamlit
+ pandas