Jianwei Yang's picture

9 5 14

Jianwei Yang

jw2yang

·

https://jwyang.github.io/

AI & ML interests

Computer Vision, Vision and Language, Machine Learning

Recent Activity

authored a paper 13 days ago

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

authored a paper 15 days ago

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

authored a paper 25 days ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

View all activity

Organizations

jw2yang's activity

upvoted a paper 3 months ago

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Paper • 2410.10818 • Published Oct 14 • 15

upvoted 2 papers 8 months ago

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25 • 55

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25 • 16

upvoted a paper 10 months ago

Pix2Gif: Motion-Guided Diffusion for GIF Generation

Paper • 2403.04634 • Published Mar 7 • 14

upvoted a paper about 1 year ago

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

Paper • 2310.11441 • Published Oct 17, 2023 • 26