RLVR Collection Model and data for 'Expanding RL with Verifiable Rewards Across Diverse Domains' • 3 items • Updated 18 days ago • 11
Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published 15 days ago • 52
Video-Guided Foley Sound Generation with Multimodal Controls Paper • 2411.17698 • Published Nov 26, 2024 • 10
DITTO: Diffusion Inference-Time T-Optimization for Music Generation Paper • 2401.12179 • Published Jan 22, 2024 • 22
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 98
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 88
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25, 2024 • 60
GPT-4 generated datasets Collection Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16, 2024 • 10
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13, 2024 • 37
Awesome SDXL LoRAs Collection A curated set of amazing Stable Diffusion XL LoRAs (they power the LoRA the Explorer Space) • 36 items • Updated Jun 24, 2024 • 20
LLaMA Beyond English: An Empirical Study on Language Capability Transfer Paper • 2401.01055 • Published Jan 2, 2024 • 56
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance Paper • 2312.11396 • Published Dec 18, 2023 • 11
FreeInit: Bridging Initialization Gap in Video Diffusion Models Paper • 2312.07537 • Published Dec 12, 2023 • 27
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Paper • 2312.06109 • Published Dec 11, 2023 • 21
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models Paper • 2312.02969 • Published Dec 5, 2023 • 15