Vision Language Models Papers 🖼️💬📝 Collection Papers about vision-language models, most important ones are on top of the list. • 27 items • Updated 19 days ago • 26
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models Paper • 2404.04478 • Published Apr 6 • 11
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4 • 28
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Paper • 2403.14520 • Published Mar 21 • 31
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 92
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis Paper • 2401.17093 • Published Jan 30 • 18
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model Paper • 2401.02330 • Published Jan 4 • 11
Papers about model merging Collection referenced in the mergekit repo: https://github.com/cg123/mergekit • 4 items • Updated Feb 13 • 13
Photorealistic Video Generation with Diffusion Models Paper • 2312.06662 • Published Dec 11, 2023 • 22
Adaptive Shells for Efficient Neural Radiance Field Rendering Paper • 2311.10091 • Published Nov 16, 2023 • 16
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 73
Jointly Training Large Autoregressive Multimodal Models Paper • 2309.15564 • Published Sep 27, 2023 • 8