Baifeng Shi

bfshi

https://bfshi.github.io

AI & ML interests

computer vision

bfshi's activity

published a dataset 24 days ago

bfshi/vstar_bench_lmms_eval

Viewer • Updated Nov 1, 2024 • 191 • 97

New activity in Efficient-Large-Model/NVILA-8B-Video about 1 month ago

What is the difference between the nvila 8b base model and video model?

#1 opened about 1 month ago by YoungjaeDev

upvoted a paper about 1 month ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 108

upvoted a paper 2 months ago

An Empirical Study of Autoregressive Pre-training from Videos

Paper • 2501.05453 • Published Jan 9 • 37

New activity in Efficient-Large-Model/NVILA-15B 2 months ago

Ask about demo

#1 opened 2 months ago by Lanbai44

liked 2 models 3 months ago

Efficient-Large-Model/NVILA-15B

Text Generation • Updated Jan 6 • 48.6k • 12

Efficient-Large-Model/NVILA-8B

Text Generation • Updated Jan 6 • 39.5k • 4

upvoted a collection 3 months ago

NVILA

Collection

9 items • Updated 17 days ago • 9

liked a Space 3 months ago

VILA

🏆

VILA Playground.

authored a paper 3 months ago

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 58

upvoted a paper 3 months ago

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 58

updated a dataset 4 months ago

bfshi/vstar_bench_lmms_eval

Viewer • Updated Nov 1, 2024 • 191 • 97

upvoted a paper 4 months ago

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

Paper • 2410.22325 • Published Oct 29, 2024 • 10

upvoted 2 papers 5 months ago

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 67

PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

Paper • 2410.01680 • Published Oct 2, 2024 • 34

upvoted 3 papers 7 months ago

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Paper • 2408.13257 • Published Aug 23, 2024 • 26

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 126

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 52

upvoted 2 papers 8 months ago

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 40

VideoGameBunny: Towards vision assistants for video games

Paper • 2407.15295 • Published Jul 21, 2024 • 22