Tianyu Pang

P2333

https://p2333.github.io/

AI & ML interests

Machine Learning

Recent Activity

authored a paper about 5 hours ago

FlowReasoner: Reinforcing Query-Level Meta-Agents

authored a paper 2 days ago

A Recipe for Watermarking Diffusion Models

authored a paper 2 days ago

Adversarial Attacks and Defences Competition

Organizations

None yet

P2333's activity

upvoted a collection 4 days ago

🚀 Active PRM

Efficient Process Reward Model Training via Active Learning. • 4 items • Updated 7 days ago • 3

upvoted 2 papers 4 days ago

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Paper • 2412.18605 • Published Dec 24, 2024 • 21

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Paper • 2504.13055 • Published 5 days ago • 18

upvoted a collection 4 days ago

NoisyRollout

6 items • Updated 2 days ago • 5

upvoted a paper 6 days ago

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published 8 days ago • 13

upvoted a paper 19 days ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published 27 days ago • 44

upvoted a collection 26 days ago

🌾Oat-Zero: Understanding R1-Zero-Like Training

5 items • Updated 12 days ago • 7

upvoted 4 collections 5 months ago

⚓️ Sailor Language Models

Sailor: Open Language Models tailored for South-East Asia (SEA) released by Sea AI Lab. • 17 items • Updated Dec 3, 2024 • 17

📈 Scaling Laws with Vocabulary

Increase your vocabulary size when you scale up your language model • 5 items • Updated Aug 11, 2024 • 6

🧬 RegMix: Data Mixture as Regression

Automatic data mixture method for large language model pre-training • 10 items • Updated Jul 26, 2024 • 8

🔱 Sailor2 Language Models

Sailing in South-East Asia with Inclusive Multilingual LLMs • 34 items • Updated Feb 24 • 27

upvoted a paper 6 months ago

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Paper • 2410.07137 • Published Oct 9, 2024 • 7

upvoted a collection 10 months ago

💡 DICE

Self-alignment with DPO Implicit Rewards • 5 items • Updated Jul 28, 2024 • 9

upvoted a paper 10 months ago

Bootstrapping Language Models with DPO Implicit Rewards

Paper • 2406.09760 • Published Jun 14, 2024 • 41

upvoted a paper about 1 year ago

Weak-to-Strong Jailbreaking on Large Language Models

Paper • 2401.17256 • Published Jan 30, 2024 • 16

upvoted 5 papers over 1 year ago

Zero Bubble Pipeline Parallelism

Paper • 2401.10241 • Published Nov 30, 2023 • 25

Better Diffusion Models Further Improve Adversarial Training

Paper • 2302.04638 • Published Feb 9, 2023 • 1

LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

Paper • 2307.13269 • Published Jul 25, 2023 • 32

Efficient Diffusion Policies for Offline Reinforcement Learning

Paper • 2305.20081 • Published May 31, 2023 • 2

Bag of Tricks for Training Data Extraction from Language Models

Paper • 2302.04460 • Published Feb 9, 2023 • 2