OpenMath Collection A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated Jan 17 • 42
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 91
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 54
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 80
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning Paper • 2407.04078 • Published Jul 4, 2024 • 18
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024 • 77
Reward models on the hub Collection UNMAINTAINED: See RewardBench... A place to collect reward models, an artifact of RLHF that is often not released. • 18 items • Updated Apr 13, 2024 • 25
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 64
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023 • 33
LiPO: Listwise Preference Optimization through Learning-to-Rank Paper • 2402.01878 • Published Feb 2, 2024 • 20
AgentTuning: Enabling Generalized Agent Abilities for LLMs Paper • 2310.12823 • Published Oct 19, 2023 • 35
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 68
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 244
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 35