AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmark: https://huggingface.co/blog/PeterKruger/autobench
Escape the Benchmark Trap: AutoBench - the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)