Conghui He

conghui

AI & ML interests

None yet

Organizations

None yet

conghui's activity

authored 2 papers 7 days ago

FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Paper • 2504.09925 • Published 8 days ago • 38

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 8 days ago • 237

authored a paper 18 days ago

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

Paper • 2504.02782 • Published 19 days ago • 55

authored a paper 25 days ago

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Paper • 2503.21758 • Published 26 days ago • 20

authored a paper 28 days ago

LEMMA: Learning from Errors for MatheMatical Advancement in LLMs

Paper • 2503.17439 • Published Mar 21 • 15

authored 3 papers about 1 month ago

MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion

Paper • 2503.16212 • Published Mar 20 • 23

MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer

Paper • 2503.14891 • Published Mar 19 • 21

LEGION: Learning to Ground and Explain for Synthetic Image Detection

Paper • 2503.15264 • Published Mar 19 • 21

authored a paper 3 months ago

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Paper • 2501.05510 • Published Jan 9 • 44

authored 4 papers 4 months ago

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Paper • 2412.11863 • Published Dec 16, 2024 • 4

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 99

Chimera: Improving Generalist Model with Domain-Specific Experts

Paper • 2412.05983 • Published Dec 8, 2024 • 9

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 22

upvoted a paper 4 months ago

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 22

authored a paper 4 months ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 155

authored a paper 5 months ago

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Paper • 2412.02592 • Published Dec 3, 2024 • 22

liked a dataset 5 months ago

opendatalab/OmniDocBench

Viewer • Updated Feb 11 • 984 • 1.78k • 22

liked a Space 5 months ago

MinerU 📚

Convert PDFs/images to Markdown and zip files • 308

authored 2 papers 6 months ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 37

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 48