Scaling Diffusion Transformers to 16 Billion Parameters Paper • 2407.11633 • Published 11 days ago • 22
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions Paper • 2407.06723 • Published 18 days ago • 9
Inference Performance Optimization for Large Language Models on CPUs Paper • 2407.07304 • Published 17 days ago • 48
CommonCatalog Collection Common Catalog, a dataset with Creative Commons licensed images and machine-generated caption pairs • 8 items • Updated May 16 • 13
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 149
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12 • 24
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 62
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3 • 42
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Paper • 2405.21048 • Published May 31 • 11
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Paper • 2310.16825 • Published Oct 25, 2023 • 30
Trained on AWS Trainium Collection Collection of models on Hugging Face that have been trained on AWS Trainium. Learn more here: https://huggingface.co/docs/optimum-neuron/index • 7 items • Updated May 7 • 6
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings Paper • 2404.16820 • Published Apr 25 • 15
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 243
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Paper • 2404.13026 • Published Apr 19 • 21
On the Scalability of Diffusion-based Text-to-Image Generation Paper • 2404.02883 • Published Apr 3 • 17
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models Paper • 2404.01367 • Published Apr 1 • 19
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 40
Are We on the Right Way for Evaluating Large Vision-Language Models? Paper • 2403.20330 • Published Mar 29 • 6
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings Paper • 2403.07750 • Published Mar 12 • 19
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 180
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20 • 20
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Paper • 2311.16567 • Published Nov 28, 2023 • 22
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29 • 48
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 85
VideoPoet: A Large Language Model for Zero-Shot Video Generation Paper • 2312.14125 • Published Dec 21, 2023 • 43
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis Paper • 2312.13834 • Published Dec 20, 2023 • 26
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 67
StarVector: Generating Scalable Vector Graphics Code from Images Paper • 2312.11556 • Published Dec 17, 2023 • 26
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing Paper • 2312.07409 • Published Dec 12, 2023 • 22
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations Paper • 2312.04655 • Published Dec 7, 2023 • 19
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior Paper • 2312.06655 • Published Dec 11, 2023 • 21
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 27
Evaluation of Large Language Models for Decision Making in Autonomous Driving Paper • 2312.06351 • Published Dec 11, 2023 • 5
Photorealistic Video Generation with Diffusion Models Paper • 2312.06662 • Published Dec 11, 2023 • 23
Efficient Quantization Strategies for Latent Diffusion Models Paper • 2312.05431 • Published Dec 9, 2023 • 11
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models Paper • 2312.05107 • Published Dec 8, 2023 • 36
Analyzing and Improving the Training Dynamics of Diffusion Models Paper • 2312.02696 • Published Dec 5, 2023 • 31
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Paper • 2311.13231 • Published Nov 22, 2023 • 25
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs Paper • 2311.09257 • Published Nov 14, 2023 • 43
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 26
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving Paper • 2311.05332 • Published Nov 9, 2023 • 7
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 78
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 31
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 40