Article Docmatix - a huge dataset for Document Visual Question Answering 4 days ago • 43
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform Paper • 2310.00036 • Published Sep 29, 2023 • 2
CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms Paper • 2111.08819 • Published Nov 16, 2021 • 2
Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published Jun 5 • 17
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B Paper • 2406.07394 • Published Jun 11 • 17
SimPO Collection This collection contains a list of SimPO and baseline models. • 49 items • Updated 2 days ago • 9
Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 15
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 87
Preference Datasets for DPO Collection This collection contains a list of curated preference datasets for DPO fine-tuning for intent alignment of LLMs • 7 items • Updated Apr 4 • 21
Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Apr 22 • 76
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Paper • 2402.09844 • Published Feb 15 • 20
Article Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs Apr 16 • 11