Recent Activity

TGalanos updated a Space 1 day ago: aec-bench/README
TGalanos published a Space 1 day ago: aec-bench/README
TGalanos published a dataset 1 day ago: aec-bench/release-model-rollouts

Organization Card

AEC-Bench

AEC-Bench is an open benchmark and Python toolkit for evaluating agentic AI systems on realistic Architecture, Engineering, and Construction (AEC) tasks.

The project combines generated engineering tasks, executable verifiers, model rollout ledgers, and trace artifacts, so results can be examined beyond a single leaderboard score and broken down by task family, difficulty, information visibility, tool use, cost, and failure mode.
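To illustrate the kind of breakdown described above, here is a minimal Python sketch that slices a rollout ledger along one dimension at a time. The `Rollout` record, its field names, the `summarize` and `failure_modes` helpers, and the task-family and failure-mode labels are all hypothetical assumptions for illustration; they are not the actual AEC-Bench schema or API, and the ledger values are made up.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


# Hypothetical ledger record: field names are illustrative assumptions,
# not the actual AEC-Bench schema.
@dataclass(frozen=True)
class Rollout:
    model: str
    task_family: str
    difficulty: str
    passed: bool                        # outcome reported by the task's executable verifier
    cost_usd: float                     # cost of this rollout
    failure_mode: Optional[str] = None  # label set only for failed rollouts


def summarize(rollouts, key):
    """Group rollouts by one attribute (e.g. "difficulty") and report
    count, pass rate, and total cost per group."""
    groups = defaultdict(list)
    for r in rollouts:
        groups[getattr(r, key)].append(r)
    return {
        value: {
            "n": len(rs),
            "pass_rate": sum(r.passed for r in rs) / len(rs),
            "cost_usd": round(sum(r.cost_usd for r in rs), 4),
        }
        for value, rs in groups.items()
    }


def failure_modes(rollouts):
    """Count failure-mode labels across rollouts the verifier rejected."""
    return Counter(r.failure_mode for r in rollouts if not r.passed)


# Toy ledger with made-up values, for illustration only.
ledger = [
    Rollout("model-a", "family-1", "easy", True, 0.12),
    Rollout("model-a", "family-1", "hard", False, 0.40, "missed-constraint"),
    Rollout("model-a", "family-2", "hard", False, 0.35, "tool-error"),
    Rollout("model-a", "family-2", "easy", True, 0.10),
]

print(summarize(ledger, "difficulty"))
print(failure_modes(ledger))
```

Passing another attribute name, such as `"task_family"`, gives the per-family view; in this sketch, information visibility and tool use would simply be further fields on the same record.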

Models: 0 (none public yet)