To submit **a new benchmark** to the library:

1. Implement the benchmark using a standard format (such as the [METR Task Standard](https://github.com/METR/task-standard)). This includes specifying the exact instructions for each task as well as the task environment that is provided inside the container the agent runs in; see the sketch after this list.
2. We encourage developers to support running their tasks on separate VMs and to specify the exact hardware requirements for each task in the task environment.
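For reference, a minimal sketch of what step 1 can look like under the METR Task Standard is shown below. The `TaskFamily` class and its method names follow the interface documented in the task-standard repo, but the `standard_version` value, the task data, and the scoring logic here are illustrative and should be checked against the current version of the standard.

```python
# my_benchmark/my_benchmark.py — a minimal sketch of a METR Task Standard
# task family. Names follow the TaskFamily interface documented in the
# task-standard repo; the concrete tasks and scoring are illustrative only.


class TaskFamily:
    # Version of the Task Standard this family targets (illustrative value;
    # check the repo for the current version).
    standard_version = "0.3.0"

    @staticmethod
    def get_tasks() -> dict[str, dict]:
        # One entry per task variant; the inner dict holds free-form task data.
        return {
            "easy": {"n": 10},
            "hard": {"n": 1000},
        }

    @staticmethod
    def get_instructions(t: dict) -> str:
        # The exact instructions shown to the agent inside the container.
        return f"Return the sum of the integers from 1 to {t['n']} as your submission."

    @staticmethod
    def start(t: dict) -> None:
        # Set up the task environment inside the container (files, services,
        # data). Nothing is needed for this toy example.
        pass

    @staticmethod
    def score(t: dict, submission: str) -> float | None:
        # Return a score in [0, 1], or None if scoring is done manually.
        expected = t["n"] * (t["n"] + 1) // 2
        try:
            return 1.0 if int(submission.strip()) == expected else 0.0
        except ValueError:
            return 0.0
```

The per-task data returned by `get_tasks()` is a natural place to record the hardware expectations mentioned in step 2, alongside whatever per-VM configuration your tasks need.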