agents.mlevolve
Runs MLEvolve on a GraphTestbed task. MLEvolve is an MCGS auto-ML harness wired for OpenAI-compatible APIs.
Default model: gpt-5.3-codex-spark (a pipe-through alias you define in
your CLIProxyAPI oauth-model-alias.codex block).
Install
bash agents/mlevolve/install.sh
# heavy: clones the repo + pip-installs torch and ML deps (~5-10 GB).
Lands at agents/mlevolve/_vendor/MLEvolve/. Set MLEVOLVE_DIR if you
already have a clone elsewhere.
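The `MLEVOLVE_DIR` override can be sketched as a small resolution helper. This is a sketch of the documented behavior only; the function name and the assumption that `MLEVOLVE_DIR` holds a plain path are illustrative, not the runner's actual code:

```python
import os
from pathlib import Path

# Default location that agents/mlevolve/install.sh vendors the clone into.
VENDORED = Path("agents/mlevolve/_vendor/MLEvolve")

def resolve_mlevolve_dir() -> Path:
    """Prefer an existing clone named by MLEVOLVE_DIR, else the vendored copy.

    Treating MLEVOLVE_DIR as a plain filesystem path (no validation beyond
    user expansion) is an assumption; the real runner may check further.
    """
    override = os.environ.get("MLEVOLVE_DIR")
    if override:
        return Path(override).expanduser()
    return VENDORED
```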
Run
gtb fetch figraph
python -m agents.mlevolve.runner --task figraph
Output:
runs/mlevolve/figraph/<timestamp>/
├── mlebench-tree/figraph/
│   ├── prepared/public/{train.csv,test.csv,description.md,sample_submission.csv}
│   ├── prepared/private/test.csv    # val labels; the local grader uses this
│   └── REAL_TEST_FEATURES.csv       # the actual test split, for re-execute
├── agent.log
└── val_submission.csv               # MLEvolve's best on the val "test" split
⚠ v1 limitation: val-as-test
GraphTestbed's actual test labels live on the scoring server, not on disk.
For the local mle-bench grader to function, the adapter exposes
val_features.csv (with labels) as the "test" set MLEvolve searches against.
The CSV the runner harvests is therefore predictions on val, not test. To submit a real test-set score:
- Open agents/mlevolve/_vendor/MLEvolve/runs/<latest-ts>/ and find the best runfile.py (search order: best score in the run's tree summary).
- Re-execute it against the real test split:

cd <some scratch dir>
cp <ws>/mlebench-tree/figraph/REAL_TEST_FEATURES.csv ./test.csv
cp <ws>/mlebench-tree/figraph/prepared/public/train.csv ./train.csv
python <runfile>   # produces submission.csv

- Submit:
gtb submit figraph --file ./submission.csv --agent mlevolve-codex-spark
This step is manual in v1 because the structure of MLEvolve's runfile.py
varies per task and we don't want to silently mis-execute. Automating it
is on the roadmap.
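The manual harvest above can be sketched in Python. Everything here is an assumption about MLEvolve's on-disk layout (timestamped run directories that sort lexicographically, candidate scripts named runfile.py) rather than a documented interface, and it takes the first runfile it finds instead of parsing the run's tree summary for the best score:

```python
import shutil
import subprocess
import sys
from pathlib import Path

def find_latest_runfile(vendor_runs: Path) -> Path:
    """Pick a runfile.py from the newest run directory.

    Assumes timestamped run dirs sort lexicographically and candidates
    are named runfile.py; takes the first match rather than the
    best-scoring node.
    """
    latest = sorted(p for p in vendor_runs.iterdir() if p.is_dir())[-1]
    candidates = sorted(latest.rglob("runfile.py"))
    if not candidates:
        raise FileNotFoundError(f"no runfile.py under {latest}")
    return candidates[0]

def reexecute(runfile: Path, task_tree: Path, scratch: Path) -> Path:
    """Stage train.csv plus the real test features into scratch, then run."""
    scratch.mkdir(parents=True, exist_ok=True)
    shutil.copy(task_tree / "REAL_TEST_FEATURES.csv", scratch / "test.csv")
    shutil.copy(task_tree / "prepared" / "public" / "train.csv",
                scratch / "train.csv")
    # sys.executable stands in for the doc's bare `python`.
    subprocess.run([sys.executable, str(runfile)], cwd=scratch, check=True)
    return scratch / "submission.csv"  # hand this to `gtb submit`
```

This only helps when the runfile follows the common train.csv/test.csv-in-cwd convention; a runfile with a different layout still needs the manual path.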
Knobs
| flag | default | meaning |
|---|---|---|
| --model | gpt-5.3-codex-spark | sent to proxy via OPENAI_BASE_URL/v1 |
| --steps | 100 | MCGS exploration count (upstream default: 500) |
| --time-limit-min | 120 | per-task wall-clock cap (upstream default: 720) |
| --gpus | 0 | passed to search.num_gpus |
The --model string must exist in your CLIProxyAPI
oauth-model-alias.codex (or be a real model your Codex account exposes).
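One way to sanity-check the alias before committing to a long run is to list what the proxy serves. The /v1/models endpoint returning a `data` list of `{"id": ...}` entries is the OpenAI API convention; whether CLIProxyAPI includes pipe-through aliases in that listing is an assumption:

```python
import json
import urllib.request

def served_model_ids(payload: dict) -> set:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return {entry.get("id") for entry in payload.get("data", [])}

def alias_is_served(base_url: str, model: str, api_key: str = "") -> bool:
    """GET <base_url>/v1/models and look for `model` among the ids.

    Assumes the proxy implements the OpenAI-compatible model listing;
    CLIProxyAPI may or may not surface oauth-model-alias entries there.
    """
    req = urllib.request.Request(f"{base_url.rstrip('/')}/v1/models")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return model in served_model_ids(json.load(resp))
```

If the alias is missing from the listing, fix the oauth-model-alias.codex block before blaming the runner.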