SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models Paper • 2311.08370 • Published Nov 14, 2023
FinanceBench: A New Benchmark for Financial Question Answering Paper • 2311.11944 • Published Nov 20, 2023
GLIDER: Grading LLM Interactions and Decisions using Explainable Ranking Paper • 2412.14140 • Published Dec 18, 2024 • 1
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning Paper • 2503.19193 • Published 24 days ago • 1
PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1 Text Generation • Updated Jul 31, 2024 • 1.77k • 10
Llama 3.1 Collection • This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3, and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 661
Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases Article • Published Jan 31, 2024 • 3