Haoning Wu, Teo

teowu

https://teowu.github.io

AI & ML interests

Lead of Q-Future: https://github.com/Q-Future. I love MLLMs/LMMs/LVLMs (whatever you call them). Core builder of Aria (best open-source video LMM).

Recent Activity

liked a model 4 days ago

Aria-UI/Aria-UI-base

liked a dataset 4 days ago

LongVideos/LongVideoDB-373K-Videos

updated a dataset 4 days ago

LongVideos/LongVideoDB-373K-Videos

teowu's activity

upvoted a paper 24 days ago

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Paper • 2412.05237 • Published 28 days ago • 46

upvoted 3 papers about 1 month ago

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Paper • 2412.00927 • Published Dec 1, 2024 • 26

Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15, 2024 • 23

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Paper • 2410.03051 • Published Oct 4, 2024 • 4

upvoted a paper 3 months ago

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8, 2024 • 107

upvoted a paper 5 months ago

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

Paper • 2407.15754 • Published Jul 22, 2024 • 20

upvoted a paper 7 months ago

CMC-Bench: Towards a New Paradigm of Visual Signal Compression

Paper • 2406.09356 • Published Jun 13, 2024 • 4

upvoted a collection 7 months ago

Visual Evaluation Benchmarks!

Collection

Q-Bench (ICLR'24 Spotlight), Q-Bench-Pair (TPAMI), and A-Bench in Hugging Face format. Supports auto-loading via `dataset = load_dataset("q-future/**-HF")` • 3 items • Updated Aug 27, 2024 • 1
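
The auto-load pattern in the collection description can be sketched as follows. This is a minimal sketch, assuming the `**` in `q-future/**-HF` stands for a benchmark name exactly as listed on the collection page; the helper names `q_future_repo_id` and `load_q_future` are hypothetical and not part of any library. Only `load_dataset` comes from the Hugging Face `datasets` package.

```python
def q_future_repo_id(benchmark: str) -> str:
    """Fill the `**` wildcard in "q-future/**-HF" with a benchmark name.

    Assumption: the repository id is the benchmark name wrapped by the
    "q-future/" prefix and the "-HF" suffix, as the description suggests.
    """
    name = benchmark.strip()
    if not name or "/" in name:
        raise ValueError("benchmark must be a bare, non-empty name")
    return f"q-future/{name}-HF"


def load_q_future(benchmark: str, **kwargs):
    """Load one benchmark from the Hub (hypothetical convenience wrapper).

    Extra keyword arguments (for example `split=...`) are passed through
    to `datasets.load_dataset` unchanged.
    """
    # Imported lazily so the id helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(q_future_repo_id(benchmark), **kwargs)
```

Calling `load_q_future(name)` with a name copied from the collection page downloads that dataset; the available splits and columns are not stated here, so inspect the returned object before relying on a particular schema.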

upvoted 3 papers 7 months ago

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

Paper • 2406.03070 • Published Jun 5, 2024 • 2

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Paper • 2405.19327 • Published May 29, 2024 • 46

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 14

upvoted an article 8 months ago

Let's talk about LLM evaluation

Article • May 23, 2024 • 143

upvoted a paper 8 months ago

MANTIS: Interleaved Multi-Image Instruction Tuning

Paper • 2405.01483 • Published May 2, 2024 • 6

upvoted a collection 9 months ago

Idefics2 🐶

Collection

Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6, 2024 • 91

upvoted 3 papers 10 months ago

HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 25

Towards Open-ended Visual Quality Comparison

Paper • 2402.16641 • Published Feb 26, 2024 • 16

A Benchmark for Multi-modal Foundation Models on Low-level Vision: from Single Images to Pairs

Paper • 2402.07116 • Published Feb 11, 2024 • 2

upvoted 3 collections 11 months ago