Yilun PRO

yilunzhao

AI & ML interests

None yet

Recent Activity

liked a model 3 minutes ago

efficientscaling/Z1-7B

authored a paper about 15 hours ago

Z1: Efficient Test-time Scaling with Code

yilunzhao's activity

upvoted a paper about 17 hours ago

Z1: Efficient Test-time Scaling with Code

Paper • 2504.00810 • Published 1 day ago • 17

upvoted a paper 3 days ago

PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving

Paper • 2503.21821 • Published 8 days ago • 16

upvoted a paper 7 days ago

MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search

Paper • 2503.20757 • Published 7 days ago • 9

upvoted a paper 12 days ago

Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published 13 days ago • 80

upvoted a paper 22 days ago

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

Paper • 2503.07459 • Published 23 days ago • 15

upvoted a paper 26 days ago

IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval

Paper • 2503.04644 • Published 27 days ago • 20

upvoted a paper about 1 month ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 177

upvoted a paper about 2 months ago

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Paper • 2502.08946 • Published Feb 13 • 192

upvoted 5 papers 2 months ago

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21 • 85

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 112

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 368

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22 • 90

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Paper • 2412.21199 • Published Dec 30, 2024 • 14

upvoted a paper 5 months ago

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Paper • 2410.23266 • Published Oct 30, 2024 • 20