Refusals Leaderboard
Refusals by GPT-4o, o1-mini, o1-preview, Claude 3.5 Sonnet
Evals
Mandoline helps developers evaluate and improve LLM applications in ways that matter to users.
Create custom metrics that align with your specific use case, evaluate LLM performance in real situations, and track improvements over time.