WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published 12 days ago • 17
Hidden in the Noise: Two-Stage Robust Watermarking for Images Paper • 2412.04653 • Published Dec 5, 2024 • 28
Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity Paper • 2406.17720 • Published Jun 25, 2024 • 8
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification Paper • 2410.05057 • Published Oct 7, 2024 • 7
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking Paper • 2409.15268 • Published Sep 23, 2024 • 13