Submitted by akhaliq 68 Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks · 9 authors 6
Submitted by akhaliq 34 JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models · 12 authors 1
Submitted by akhaliq 28 Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model · 10 authors 3
Submitted by akhaliq 26 Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs · 7 authors 2
Submitted by akhaliq 17 Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization · 14 authors 1
Submitted by akhaliq 11 FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores · 4 authors 1
Submitted by akhaliq 8 ADaPT: As-Needed Decomposition and Planning with Language Models · 7 authors 1
Submitted by akhaliq 6 Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities · 6 authors 1
Submitted by akhaliq 5 Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems · 8 authors 1