Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 9 days ago • 54
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11 • 31
Llama 2 Family Collection This collection hosts the Transformers-format and original repos of the Llama 2 and Llama Guard releases • 13 items • Updated 12 days ago • 76
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding Paper • 2404.11912 • Published Apr 18 • 16
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 183 • 15
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22 • 126
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping Paper • 2402.14083 • Published Feb 21 • 47
TravelPlanner: A Benchmark for Real-World Planning with Language Agents Paper • 2402.01622 • Published Feb 2 • 33
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time Paper • 2310.17157 • Published Oct 26, 2023 • 12
RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment Paper • 2307.12950 • Published Jul 24, 2023 • 9
Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning Paper • 2302.01578 • Published Feb 3, 2023
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Paper • 2306.14048 • Published Jun 24, 2023 • 12
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer Paper • 2305.16380 • Published May 25, 2023 • 4