Kai Zuberbühler

kaizuberbuehler

k-zubi

AI & ML interests

language models, agents, image generation, music generation

Recent Activity

updated a collection 5 days ago

Leaderboards

upvoted an article 5 days ago

Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios

updated a collection 7 days ago

Benchmarks


Organizations

None yet

kaizuberbuehler's activity

upvoted an article 5 days ago

Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios

Article • 17 days ago • 15

upvoted 9 papers 7 days ago

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published 9 days ago • 17

ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models

Paper • 2502.09696 • Published 16 days ago • 38

The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

Paper • 2502.08235 • Published 17 days ago • 54

Large Language Diffusion Models

Paper • 2502.09992 • Published 15 days ago • 95

ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Paper • 2502.09411 • Published 16 days ago • 17

upvoted 10 papers 8 days ago

Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges

Paper • 2502.08680 • Published 17 days ago • 11

CoT-Valve: Length-Compressible Chain-of-Thought Tuning

Paper • 2502.09601 • Published 16 days ago • 14

mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data

Paper • 2502.08468 • Published 17 days ago • 13

Typhoon T1: An Open Thai Reasoning Model

Paper • 2502.09042 • Published 16 days ago • 16

SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models

Paper • 2502.09390 • Published 16 days ago • 16

Logical Reasoning in Large Language Models: A Survey

Paper • 2502.09100 • Published 16 days ago • 22

Exploring the Potential of Encoder-free Architectures in 3D LMMs

Paper • 2502.09620 • Published 16 days ago • 25

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Paper • 2502.09621 • Published 16 days ago • 27

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

Paper • 2502.09082 • Published 16 days ago • 27

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Paper • 2502.09604 • Published 16 days ago • 32