TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models Paper • 2410.23266 • Published Oct 30, 2024 • 19
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 88
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective Paper • 2410.12490 • Published Oct 16, 2024 • 8
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16, 2024 • 30
A Controlled Study on Long Context Extension and Generalization in LLMs Paper • 2409.12181 • Published Sep 18, 2024 • 43
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29, 2024 • 55
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination Paper • 2406.05132 • Published Jun 7, 2024 • 27
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 39
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11, 2024 • 32
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization Paper • 2402.03161 • Published Feb 5, 2024 • 14
VideoPoet: A Large Language Model for Zero-Shot Video Generation Paper • 2312.14125 • Published Dec 21, 2023 • 44
Reasons to Reject? Aligning Language Models with Judgments Paper • 2312.14591 • Published Dec 22, 2023 • 17