Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries Paper • 2409.12640 • Published Sep 19, 2024 • 2
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 42