Update README.md
README.md
CHANGED
@@ -7,14 +7,10 @@ sdk: static
 pinned: false
 ---
 
-<p align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/6462ac71514ee1645bd1f7f7/6MkoY412i9IqvISWSS4qs.png">
-</p>
-
 The rapid advancement of Large Language Models (LLMs) necessitates robust
 and challenging benchmarks.
 
-To address the challenge of ranking LLMs on
+To address the challenge of ranking LLMs on highly subjective tasks such as emotional intelligence, creative writing, or persuasiveness,
 the **Language Model Council (LMC)** operates through a democratic process to: 1) formulate a test set through
 equal participation, 2) administer the test among council members, and 3) evaluate
 responses as a collective jury.
@@ -24,5 +20,6 @@ and less biased than those from any individual LLM judge, and is more consistent
 
 Roadmap:
 
+- Use the Council to benchmark evaluative characteristics of LLM-as-a-Judge/Jury like bias, affinity, and agreement.
 - Expand to more domains, use cases, and sophisticated agentic interactions.
 - Produce a generalized user interface for Council-as-a-Service.
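The README describes a three-step council process. The Python toy below sketches it under stated assumptions and is not the LMC implementation. Each member is modeled as three hypothetical callables: `propose`, `respond` and `judge`. Each member contributes exactly one prompt. The jury verdict is a plain mean over every juror's scores, and each member also scores its own answers. This excerpt does not say how LMC actually aggregates jury votes, so the mean is only a placeholder.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable

# Hypothetical member interface (not from the LMC codebase):
#   propose()           -> one test prompt contributed by this member
#   respond(prompt)     -> this member's answer to a prompt
#   judge(prompt, resp) -> this member's numeric score for an answer
Propose = Callable[[], str]
Respond = Callable[[str], str]
Judge = Callable[[str, str], float]


def run_council(members: dict[str, tuple[Propose, Respond, Judge]]) -> dict[str, float]:
    """Toy council: equal participation, shared test, collective jury."""
    # 1) Formulate the test set: every member contributes exactly one prompt.
    test_set = [propose() for propose, _, _ in members.values()]

    # 2) Administer the test: every member answers every prompt.
    answers = {
        name: [respond(prompt) for prompt in test_set]
        for name, (_, respond, _) in members.items()
    }

    # 3) Evaluate as a jury: every member (itself included, by assumption)
    #    scores every answer; a member's result is the mean of all its votes.
    scores: dict[str, float] = {}
    for name, replies in answers.items():
        votes = [
            judge(prompt, reply)
            for _, _, judge in members.values()
            for prompt, reply in zip(test_set, replies)
        ]
        scores[name] = mean(votes)
    return scores


# Toy members with canned behaviour so the example is deterministic.
members = {
    "a": (lambda: "Write a haiku.", lambda p: "short", lambda p, r: len(r)),
    "b": (lambda: "Console a friend.", lambda p: "longer reply", lambda p, r: len(r) / 2),
}
scores = run_council(members)
print(scores)  # {'a': 3.75, 'b': 9.0}
```

A real council would replace the canned lambdas with LLM calls. It would likely also use a more careful aggregation than a raw mean, for example by excluding or reweighting self-judgments. That choice is exactly the kind of bias and affinity question the new roadmap item proposes to study.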