Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published 7 days ago • 25
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark Paper • 2412.07825 • Published 16 days ago • 12
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated 20 days ago • 636
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 56
COCONut Dataset Collection This is a collection of COCONut datasets accepted at CVPR2024 • 3 items • Updated Apr 29 • 4
ViTamin: Designing Scalable Vision Models in the Vision-Language Era Paper • 2404.02132 • Published Apr 2 • 2
ViTamin Family Collection Designing Scalable Vision Models in the Vision-language Era. The best performing model is 'jienengchen/ViTamin-XL-384px'. • 16 items • Updated Apr 11 • 8
Foundation AI Papers Collection Curated List of Must-Reads on LLM reasoning at Temus AI team • 135 items • Updated Jun 15 • 27