Human Alignment of Large Language Models through Online Preference Optimisation Paper • 2403.08635 • Published Mar 13, 2024
Unlocking the Power of Representations in Long-term Novelty-based Exploration Paper • 2305.01521 • Published May 2, 2023
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving Paper • 2405.12205 • Published May 20, 2024
Adapting to game trees in zero-sum imperfect information games Paper • 2212.12567 • Published Dec 23, 2022
Understanding Self-Predictive Learning for Reinforcement Learning Paper • 2212.03319 • Published Dec 6, 2022
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice Paper • 2305.13185 • Published May 22, 2023
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023
Bootstrap your own latent: A new approach to self-supervised Learning Paper • 2006.07733 • Published Jun 13, 2020