Salesforce/xgen-mm-vid-phi3-mini-r-v1.5-128tokens-8frames • Image-Text-to-Text • Updated 11 days ago • 704 • 10
Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5 • Image-Text-to-Text • Updated 11 days ago • 7.84k • 48
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations • Paper • 2408.12590 • Published Aug 22, 2024 • 36
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding • Paper • 2212.05171 • Published Dec 10, 2022
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets • Paper • 2406.18518 • Published Jun 26, 2024 • 24
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models • Paper • 2408.08872 • Published Aug 16, 2024 • 98
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems • Paper • 2407.01370 • Published Jul 1, 2024 • 86
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning • Paper • 2402.15506 • Published Feb 23, 2024 • 14
ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding • Paper • 2305.08275 • Published May 14, 2023 • 2
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild • Paper • 2305.11147 • Published May 18, 2023 • 3
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization • Paper • 2308.02151 • Published Aug 4, 2023 • 19
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents • Paper • 2308.05960 • Published Aug 11, 2023 • 19
Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation • Paper • 2303.04991 • Published Mar 9, 2023
Align and Prompt: Video-and-Language Pre-training with Entity Prompts • Paper • 2112.09583 • Published Dec 17, 2021