Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published 18 days ago • 71
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Paper • 2504.10449 • Published Apr 14 • 12
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search Paper • 2504.08066 • Published Apr 10 • 11
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search Paper • 2504.09130 • Published Apr 12 • 12
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning Paper • 2504.09641 • Published Apr 13 • 16
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization Paper • 2504.10127 • Published Apr 14 • 17
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model Paper • 2504.10068 • Published Apr 14 • 30
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability Paper • 2504.08003 • Published Apr 9 • 49
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published Apr 10 • 42
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 259
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published Apr 8 • 81
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published Apr 10 • 18
Towards Visual Text Grounding of Multimodal Large Language Model Paper • 2504.04974 • Published Apr 7 • 16
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 27
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 84