PeterKruger (Peter Kruger)

commented on Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) about 18 hours ago

Nice and fully accurate. Excellent job. Thanks!

New activity in AutoBench/AutoBench_1.0 2 days ago

Comparing with mt-bench

#3 opened 2 days ago by PeterKruger

posted an update 2 days ago

Post

397

AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmark
https://huggingface.co/blog/PeterKruger/autobench

New activity in AutoBench/AutoBench_1.0 2 days ago

Pool LLM bias

#2 opened 2 days ago by PeterKruger

Prompt analysis should be better discussed

#1 opened 2 days ago by PeterKruger

upvoted an article 2 days ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

2 days ago • 5

liked a Space 2 days ago

1

AutoBench 1.0 Demo

🐠

Collective-Model-As-Judge LLM Benchmark

liked a model 2 days ago

AutoBench/AutoBench_1.0

Updated 2 days ago • 2

updated a Space 2 days ago

1

AutoBench 1.0 Demo

🐠

Collective-Model-As-Judge LLM Benchmark

published an article 2 days ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

2 days ago • 5

updated a model 2 days ago

AutoBench/AutoBench_1.0

Updated 2 days ago • 2

updated a dataset 2 days ago

AutoBench/AutoBench_Results_20_LLMs

Preview • Updated 2 days ago • 4

published a dataset 2 days ago

AutoBench/AutoBench_Results_20_LLMs

Preview • Updated 2 days ago • 4

updated a Space 2 days ago

README

😻

published 2 Spaces 2 days ago

README

😻

1

AutoBench 1.0 Demo

🐠

Collective-Model-As-Judge LLM Benchmark

published a model 2 days ago

AutoBench/AutoBench_1.0

Updated 2 days ago • 2

updated a Space 3 days ago

1

AutoBench 1.0 Demo

🐠

Collective-Model-As-Judge LLM Benchmark

updated 2 models 3 days ago

AutoBench/AutoBench_1.0

Updated 2 days ago • 2

