Sainbayar Sukhbaatar

sainbar

https://tesatory.github.io/

sainbar's activity

authored a paper 20 days ago

Multi-Token Attention

Paper • 2504.00927 • Published 21 days ago • 45

authored a paper about 1 month ago

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks

Paper • 2503.15478 • Published Mar 19 • 10

authored a paper 3 months ago

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Paper • 2501.10799 • Published Jan 18 • 15

authored a paper 4 months ago

Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 83

authored a paper 5 months ago

Adaptive Decoding via Latent Preference Optimization

Paper • 2411.09661 • Published Nov 14, 2024 • 10

authored a paper 6 months ago

Thinking LLMs: General Instruction Following with Thought Generation

Paper • 2410.10630 • Published Oct 14, 2024 • 19

upvoted a paper 9 months ago

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Paper • 2407.19594 • Published Jul 28, 2024 • 21

commented on a paper 9 months ago

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Paper • 2407.19594 • Published Jul 28, 2024 • 21

authored 12 papers 9 months ago

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

Paper • 2304.11063 • Published Apr 18, 2023

System 2 Attention (is something you might need too)

Paper • 2311.11829 • Published Nov 20, 2023 • 43

Hash Layers For Large Sparse Models

Paper • 2106.04426 • Published Jun 8, 2021 • 2

Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss

Paper • 2312.16682 • Published Dec 27, 2023 • 5

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 148

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Paper • 2402.14083 • Published Feb 21, 2024 • 49

Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7, 2024 • 51

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Paper • 2403.07816 • Published Mar 12, 2024 • 42