The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks Paper • 2406.12925 • Published Jun 14, 2024
Beyond Document Page Classification: Design, Datasets, and Challenges Paper • 2308.12896 • Published Aug 24, 2023