Spaces: valory/olas-prediction-leaderboard

arshy committed on Mar 26, 2024

Commit 83d3ac3

1 Parent(s): a8cdf7e

initial commit


Files changed (4)

.gitignore +1 -0
app.py +60 -0
formatted_data.csv +8 -0
requirements.txt +1 -0

.gitignore ADDED

	@@ -0,0 +1 @@

+__pycache__

app.py ADDED

	@@ -0,0 +1,60 @@

+import gradio as gr
+import pandas as pd
+csv_file_path = "formatted_data.csv"
+# Reading the CSV file
+df = pd.read_csv(csv_file_path)
+# Markdown text rendered below the leaderboard table
+markdown_text = """
+## Benchmark Overview
+- The benchmark evaluates the performance of Olas Predict tools on the Autocast dataset.
+- The dataset has been refined to enhance the evaluation of the tools.
+- The leaderboard shows the performance of the tools based on the refined dataset.
+- The script to run the benchmark is available in the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).
+## How to run your tools on the benchmark
+- Fork the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).
+- Initialize the submodules and update them to get the latest dataset and `mech` tool.
+    - `git submodule init`
+    - `git submodule update --remote --recursive`
+- Include your tool in the `mech/packages` directory accordingly.
+    - Guidelines on how to include your tool can be found [here](xxx).
+- Run the benchmark script.
+## Dataset Overview
+This project leverages the Autocast dataset from the research paper titled ["Forecasting Future World Events with Neural Networks"](https://arxiv.org/abs/2206.15474).
+The dataset has undergone further refinement to enhance the performance evaluation of Olas mech prediction tools.
+Both the original and refined datasets are hosted on HuggingFace.
+### Refined Dataset Files
+- You can find the refined dataset on HuggingFace [here](https://huggingface.co/datasets/valory/autocast).
+- `autocast_questions_filtered.json`: A JSON subset of the original Autocast dataset.
+- `autocast_questions_filtered.pkl`: A pickle file mapping URLs to their respective scraped documents within the filtered dataset.
+- `retrieved_docs.pkl`: Contains all the scraped texts.
+### Filtering Criteria
+To refine the dataset, we applied the following criteria to ensure the reliability of the URLs:
+- URLs not returning HTTP 200 status codes are excluded.
+- Difficult-to-scrape sites, such as Twitter and Bloomberg, are omitted.
+- Links whose pages contain fewer than 1000 words are removed.
+- Only samples with a minimum of 5 and a maximum of 20 working URLs are retained.
+### Scraping Approach
+The content of the filtered URLs has been scraped using various libraries, depending on the source:
+- `pypdf2` for PDF URLs.
+- `wikipediaapi` for Wikipedia pages.
+- `requests`, `readability-lxml`, and `html2text` for most other sources.
+- `requests`, `beautifulsoup`, and `html2text` for BBC links.
+"""
+with gr.Blocks() as demo:
+    gr.Markdown("# Olas Predict Benchmark")
+    gr.Markdown("Leaderboard showing the performance of Olas Predict tools on the Autocast dataset, followed by an overview of the project.")
+    gr.DataFrame(df)
+    gr.Markdown(markdown_text)
+demo.launch()
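
The filtering criteria listed in `markdown_text` can be sketched roughly as follows. This is a minimal illustration, not the code used to build the refined dataset. The shape of `docs` (URL mapped to an HTTP status and scraped text), the helper names `url_passes` and `keep_sample`, the blocked-domain strings, and the choice to drop (rather than truncate) samples with more than 20 working URLs are all assumptions.

```python
from typing import Dict
from urllib.parse import urlparse

MIN_WORDS = 1000             # pages with fewer words are removed
MIN_URLS, MAX_URLS = 5, 20   # allowed number of working URLs per sample
# Assumed domain strings for the "difficult-to-scrape" sites named in the text.
BLOCKED_DOMAINS = ("twitter.com", "bloomberg.com")


def url_passes(url: str, doc: dict) -> bool:
    """Apply the per-URL criteria to one scraped document.

    `doc` is a hypothetical record: {"status": int, "text": str}.
    """
    if doc["status"] != 200:
        return False
    host = urlparse(url).netloc.lower()
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return False
    # Whitespace-separated tokens are used as a simple word count.
    return len(doc["text"].split()) >= MIN_WORDS


def keep_sample(docs: Dict[str, dict]) -> bool:
    """Keep a sample only if it has between 5 and 20 working URLs (assumed reading)."""
    working = [url for url, doc in docs.items() if url_passes(url, doc)]
    return MIN_URLS <= len(working) <= MAX_URLS
```

Under these assumptions, a question with five long, reachable source pages is kept, and losing any one of them to a non-200 status drops the whole sample.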

formatted_data.csv ADDED

	@@ -0,0 +1,8 @@

+Tool,Accuracy,Correct,Total,Mean Tokens Used,Mean Cost ($)
+claude-prediction-offline,0.7201834862385321,157,218,779.4770642201835,0.006891669724770637
+claude-prediction-online,0.6600660066006601,200,303,1505.3135313531352,0.013348171617161701
+prediction-online,0.676737160120846,224,331,1219.6918429003022,0.001332990936555879
+prediction-offline,0.6599326599326599,196,297,579.6565656565657,0.000621023569023569
+prediction-online-summarized-info,0.6209150326797386,190,306,1008.4542483660131,0.0011213790849673195
+prediction-offline-sme,0.599406528189911,202,337,1190.2017804154302,0.0013518635014836643
+prediction-online-sme,0.5905044510385756,199,337,1834.919881305638,0.0020690207715133428
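
The `Accuracy` column appears to be `Correct / Total`, and the rows are stored in an order that is not sorted by accuracy, so `gr.DataFrame(df)` shows them in file order. The sketch below, which uses only the standard library and copies four of the columns from the CSV above, checks that relationship and prints a ranking. It is an illustration and not part of the commit. Note that `Total` differs between tools, so the accuracies are computed over different numbers of questions.

```python
import csv
import io
import math

# Tool, Accuracy, Correct, and Total columns copied from formatted_data.csv.
CSV_TEXT = """\
Tool,Accuracy,Correct,Total
claude-prediction-offline,0.7201834862385321,157,218
claude-prediction-online,0.6600660066006601,200,303
prediction-online,0.676737160120846,224,331
prediction-offline,0.6599326599326599,196,297
prediction-online-summarized-info,0.6209150326797386,190,306
prediction-offline-sme,0.599406528189911,202,337
prediction-online-sme,0.5905044510385756,199,337
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Check that each reported accuracy matches Correct / Total.
for row in rows:
    recomputed = int(row["Correct"]) / int(row["Total"])
    assert math.isclose(recomputed, float(row["Accuracy"]), rel_tol=1e-9), row["Tool"]

# Rank tools from highest to lowest accuracy.
ranked = sorted(rows, key=lambda r: float(r["Accuracy"]), reverse=True)
for rank, row in enumerate(ranked, start=1):
    print(f"{rank}. {row['Tool']}: {float(row['Accuracy']):.4f} "
          f"({row['Correct']}/{row['Total']})")
```

In the app itself, the same ordering could be obtained with `df.sort_values("Accuracy", ascending=False)` before passing the frame to `gr.DataFrame`.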

requirements.txt ADDED

	@@ -0,0 +1 @@

+gradio