Motoki Wu

tokestermw

https://motoki.co

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago

mistralai/Mistral-Small-3.1-24B-Base-2503

tokestermw's activity

upvoted a paper about 4 hours ago

Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Paper • 2503.16219 • Published 3 days ago • 27

upvoted a paper 1 day ago

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Paper • 2503.16419 • Published 3 days ago • 52

upvoted a paper 4 days ago

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published 5 days ago • 91

upvoted a collection 10 days ago

Gemma 3 Release

9 items • Updated 10 days ago • 286

upvoted an article 10 days ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Article • 12 days ago • 338

upvoted a collection 17 days ago

Light-R1

Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • 7 items • Updated 10 days ago • 11

upvoted a collection 19 days ago

Hallucination detection

ModernBERT models (base and large) trained to detect hallucinations in LLM responses. The models are trained as token classifiers. • 4 items • Updated 18 days ago • 15

upvoted a paper 23 days ago

Rank1: Test-Time Compute for Reranking in Information Retrieval

Paper • 2502.18418 • Published 26 days ago • 26

upvoted 2 papers 25 days ago

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Paper • 2502.16894 • Published 27 days ago • 28

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published 26 days ago • 70

upvoted 3 papers 27 days ago

Expect the Unexpected: FailSafe Long Context QA for Finance

Paper • 2502.06329 • Published Feb 10 • 126

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback

Paper • 2502.15027 • Published about 1 month ago • 7

SIFT: Grounding LLM Reasoning in Contexts via Stickers

Paper • 2502.14922 • Published Feb 19 • 30

upvoted a collection 27 days ago

Sky-T1-7B

A series of 7B models trained with different recipes and the corresponding training data. • 8 items • Updated Feb 14 • 6

upvoted a collection about 1 month ago

Process Reward Models

Model and Datasets for Qwen 2.5 Math PRM 7B • 6 items • Updated Feb 18 • 2

upvoted 4 papers about 1 month ago

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

Paper • 2502.10391 • Published Feb 14 • 32

Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 46

Agency Is Frame-Dependent

Paper • 2502.04403 • Published Feb 6 • 22

ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

Paper • 2502.04689 • Published Feb 7 • 7