19

Yang

XaiverYang

RFKxavieryang

XaiverYang's activity

upvoted 6 papers 5 months ago

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Paper • 2411.14522 • Published Nov 21, 2024 • 39

SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation

Paper • 2411.14525 • Published Nov 21, 2024 • 22

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Paper • 2411.12814 • Published Nov 19, 2024 • 26

Material Anything: Generating Materials for Any 3D Object via Diffusion

Paper • 2411.15138 • Published Nov 22, 2024 • 51

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Paper • 2411.13503 • Published Nov 20, 2024 • 35

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Paper • 2410.23287 • Published Oct 30, 2024 • 19

upvoted 6 papers 6 months ago

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Paper • 2410.17856 • Published Oct 23, 2024 • 52

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 69

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 97

HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks

Paper • 2410.12381 • Published Oct 16, 2024 • 45

Think While You Generate: Discrete Diffusion with Planned Denoising

Paper • 2410.06264 • Published Oct 8, 2024 • 11

ControlAR: Controllable Image Generation with Autoregressive Models

Paper • 2410.02705 • Published Oct 3, 2024 • 11

upvoted 3 papers 7 months ago

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

Paper • 2409.13592 • Published Sep 20, 2024 • 52

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 75

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Paper • 2409.07146 • Published Sep 11, 2024 • 21

upvoted 2 papers 8 months ago

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12, 2024 • 124

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29, 2024 • 37

upvoted 2 papers 9 months ago

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10, 2024 • 33

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Paper • 2407.01392 • Published Jul 1, 2024 • 46