Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 2025 • 210
MinorBench: A hand-built benchmark for content-based risks for children Paper • 2503.10242 • Published Mar 2025 • 4
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published Mar 2025 • 9
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 2025 • 56
SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation Paper • 2502.08168 • Published Feb 12, 2025 • 12
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10, 2025 • 60
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published Dec 17, 2024 • 41
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 138
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29, 2024 • 42
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published Oct 21, 2024 • 60
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 54
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 42