How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published 24 days ago • 49
Text-to-Image Base Models Collection All open-source text-to-image base models, with their respective licenses • 28 items • Updated 9 days ago • 17
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models Paper • 2401.05252 • Published Jan 10 • 43
PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns Paper • 2312.04534 • Published Dec 7, 2023 • 6
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis Paper • 2311.08667 • Published Nov 15, 2023 • 18
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning Paper • 2311.07574 • Published Nov 13, 2023 • 13
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V Paper • 2310.11441 • Published Oct 17, 2023 • 24
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 39
CLAP: Contrastive Language-Audio Pretraining Collection CLAP is to audio what CLIP is to image. • 5 items • Updated Oct 31, 2023 • 7
ProPainter: Improving Propagation and Transformer for Video Inpainting Paper • 2309.03897 • Published Sep 7, 2023 • 24
LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with the best evaluations on the LLM leaderboard. • 70 items • Updated 3 days ago • 307
PaLI-3 Vision Language Models: Smaller, Faster, Stronger Paper • 2310.09199 • Published Oct 13, 2023 • 21
RealFill: Reference-Driven Generation for Authentic Image Completion Paper • 2309.16668 • Published Sep 28, 2023 • 11
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 446
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning Paper • 2309.07915 • Published Sep 14, 2023 • 4
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models Paper • 2308.04729 • Published Aug 9, 2023 • 29
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates Paper • 2307.05695 • Published Jul 11, 2023 • 21
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language Paper • 2306.16410 • Published Jun 28, 2023 • 26
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation Paper • 2306.07954 • Published Jun 13, 2023 • 111