about_olas_predict_benchmark = """\
How good are LLMs at making predictions about events in the future? This is a topic that hasn't been well explored to date.
[Olas Predict](https://olas.network/services/prediction-agents) aims to rectify this by incentivizing the creation of agents that make predictions about future events (through prediction markets).
These agents are tested in the wild on real-time prediction market data, which you can see [here](https://huggingface.co/datasets/valory/prediction_market_data) on HuggingFace (updated weekly).
However, if you want to create an agent with new tools, waiting for real-time results to arrive is slow. This is where the Olas Predict Benchmark comes in. It allows devs to backtest new approaches on a historical event forecasting dataset (refined from [Autocast](https://arxiv.org/abs/2206.15474)) with high iteration speed.
🗓 🧐 The resolved questions in the Autocast dataset come from a timeline ending in 2022, so the models may have been trained on some of this data. The accuracy currently reported may therefore be an in-sample measure rather than a true forecasting one.
However, we can still learn about the relative strengths of the different approaches (e.g., models and logic) before testing the most promising ones on real-time, unseen data.
This HF Space showcases the performance of the various models and workflows (called tools in the Olas ecosystem) for making predictions, in terms of accuracy and cost.
🤗 Pick a tool and run it on the benchmark using the "🔥 Run the Benchmark" page! (This feature is temporarily disabled due to an error in HF Spaces)
"""
about_the_tools = """\
- [Prediction Offline](https://github.com/valory-xyz/mech/blob/main/packages/valory/customs/prediction_request/prediction_request.py) - Uses prompt engineering, but no web crawling, to make predictions.
- [Prediction Online](https://github.com/valory-xyz/mech/blob/main/packages/valory/customs/prediction_request/prediction_request.py) - Uses prompt engineering, as well as web crawling, to make predictions.
- [Prediction with RAG](https://github.com/valory-xyz/mech/blob/main/packages/napthaai/customs/prediction_request_rag/prediction_request_rag.py) - Uses retrieval-augmented generation (RAG) over extracted search results to make predictions.
- [Prediction with Reasoning](https://github.com/valory-xyz/mech/blob/main/packages/napthaai/customs/prediction_request_reasoning/prediction_request_reasoning.py) - Incorporates an additional call to the LLM to do reasoning over retrieved data.
"""
about_the_dataset = """\
## Dataset Overview
This project leverages the Autocast dataset from the research paper titled ["Forecasting Future World Events with Neural Networks"](https://arxiv.org/abs/2206.15474).
The dataset has undergone further refinement to enhance the performance evaluation of Olas mech prediction tools.
Both the original and refined datasets are hosted on HuggingFace.
### Refined Dataset Files
- You can find the refined dataset on HuggingFace [here](https://huggingface.co/datasets/valory/autocast).
- `autocast_questions_filtered.json`: A JSON subset of the initial autocast dataset.
- `autocast_questions_filtered.pkl`: A pickle file mapping URLs to their respective scraped documents within the filtered dataset.
- `retrieved_docs.pkl`: Contains all the scraped texts.
### Filtering Criteria
To refine the dataset, we applied the following criteria to ensure the reliability of the URLs:
- URLs not returning HTTP 200 status codes are excluded.
- Difficult-to-scrape sites, such as Twitter and Bloomberg, are omitted.
- Links whose content has fewer than 1,000 words are removed.
- Only samples with a minimum of 5 and a maximum of 20 working URLs are retained.
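
The criteria above can be sketched roughly as follows. This is an illustration only, not the code used to build the dataset: the page record layout (`url`, `status`, `text`), the helper names, and the exact domain list are assumptions, and the 5 to 20 rule is read here as a bound on the number of usable URLs per sample.

```python
from urllib.parse import urlparse

# Hypothetical settings mirroring the criteria listed above.
BLOCKED_DOMAINS = ("twitter.com", "bloomberg.com")
MIN_WORDS = 1000
MIN_URLS, MAX_URLS = 5, 20


def is_blocked(host):
    # True for a blocked domain or any of its subdomains.
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def is_usable(page):
    # page is an assumed dict: {"url": str, "status": int, "text": str}
    host = urlparse(page["url"]).netloc.lower()
    if page["status"] != 200:
        return False
    if is_blocked(host):
        return False
    return len(page["text"].split()) >= MIN_WORDS


def filter_sample(pages):
    # Return the usable pages, or None if the sample should be dropped.
    usable = [p for p in pages if is_usable(p)]
    if MIN_URLS <= len(usable) <= MAX_URLS:
        return usable
    return None
```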
### Scraping Approach
The content of the filtered URLs has been scraped using various libraries, depending on the source:
- `pypdf2` for PDF URLs.
- `wikipediaapi` for Wikipedia pages.
- `requests`, `readability-lxml`, and `html2text` for most other sources.
- `requests`, `beautifulsoup`, and `html2text` for BBC links.
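
As a rough sketch of how a URL might be routed to one of the library sets above: the `pick_scraper` function and its URL-matching rules (file extension, host suffix) are assumptions made for illustration, and the actual scraping calls are left out.

```python
from urllib.parse import urlparse


def host_matches(host, domain):
    # True for the domain itself or any of its subdomains.
    return host == domain or host.endswith("." + domain)


def pick_scraper(url):
    # Return the names of the libraries listed above for this kind of source.
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if parsed.path.lower().endswith(".pdf"):
        return ("pypdf2",)
    if host_matches(host, "wikipedia.org"):
        return ("wikipediaapi",)
    if host_matches(host, "bbc.com") or host_matches(host, "bbc.co.uk"):
        return ("requests", "beautifulsoup", "html2text")
    # Default path for most other sources.
    return ("requests", "readability-lxml", "html2text")
```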
"""
about_olas_predict = """\
Olas is a network of autonomous services that can run complex logic in a decentralized manner, interacting with on- and off-chain data autonomously and continuously. For other use cases check out [olas.network](https://olas.network/).
Since 'Olas' means 'waves' in Spanish, it is sometimes referred to as the 'ocean of services' 🌊.
The project is co-created by [Valory](https://www.valory.xyz/). Valory aspires to enable communities, organizations and countries to co-own AI systems, beginning with decentralized autonomous agents.
"""