1 21 10

Xinyu Fang

nebulae09

FangXinyu-0913

AI & ML interests

None yet

Recent Activity

updated a dataset about 8 hours ago

nebulae09/Creation-MMBench

published a dataset about 12 hours ago

nebulae09/Creation-MMBench

upvoted a paper 6 days ago

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

View all activity

Organizations

nebulae09's activity

upvoted 2 papers 6 days ago

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Paper • 2503.07365 • Published 7 days ago • 53

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 138

upvoted a paper 20 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published 20 days ago • 69

upvoted 2 papers about 2 months ago

Redundancy Principles for MLLMs Benchmarks

Paper • 2501.13953 • Published Jan 20 • 28

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement

Paper • 2501.12273 • Published Jan 21 • 14

upvoted a paper 2 months ago

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 100

upvoted a paper 3 months ago

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 92

upvoted a paper 4 months ago

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Paper • 2411.15296 • Published Nov 22, 2024 • 20

upvoted 2 papers 5 months ago

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published Oct 21, 2024 • 60

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Paper • 2410.12405 • Published Oct 16, 2024 • 13

upvoted a collection 5 months ago

LLaVA-Video

Collection

Models focus on video understanding (previously known as LLaVA-NeXT-Video). • 8 items • Updated 24 days ago • 60

upvoted a paper 5 months ago

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Paper • 2410.05363 • Published Oct 7, 2024 • 45

upvoted a paper 6 months ago

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24, 2024 • 42

upvoted a paper 8 months ago

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16, 2024 • 44

upvoted 4 papers 9 months ago

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

Paper • 2406.17770 • Published Jun 25, 2024 • 19

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 74

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published Jun 20, 2024 • 35

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20, 2024 • 33

upvoted an article 10 months ago

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14, 2024

• 243

upvoted an article 11 months ago

Article

Vision Language Models Explained

Apr 11, 2024

• 288