HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published Jun 27 • 59 • 9
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22 • 45 • 8
Evaluating D-MERIT of Partial-annotation on Information Retrieval Paper • 2406.16048 • Published Jun 23 • 34 • 2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24 • 54 • 3
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published Jun 6 • 36 • 3
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published Jun 6 • 26 • 2
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 71 • 4
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Paper • 2406.00392 • Published Jun 1 • 12 • 1
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation Paper • 2406.00908 • Published Jun 3 • 11 • 1
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback Paper • 2406.00888 • Published Jun 2 • 30 • 1
Learning Temporally Consistent Video Depth from Video Diffusion Priors Paper • 2406.01493 • Published Jun 3 • 17 • 2
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3 • 42 • 3
Yuan 2.0-M32: Mixture of Experts with Attention Router Paper • 2405.17976 • Published May 28 • 18 • 2
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28 • 20 • 3
Part123: Part-aware 3D Reconstruction from a Single-view Image Paper • 2405.16888 • Published May 27 • 10 • 1
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published May 27 • 51 • 2
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43 • 7
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23 • 32 • 6
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published May 23 • 16 • 11
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published May 22 • 22 • 5
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53 • 8
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 45 • 10
Chameleon: Mixed-Modal Early-Fusion Foundation Models Paper • 2405.09818 • Published May 16 • 125 • 12
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published May 15 • 18 • 1
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published May 15 • 24 • 1
Compositional Text-to-Image Generation with Dense Blob Representations Paper • 2405.08246 • Published May 14 • 12 • 1
SUTRA: Scalable Multilingual Language Model Architecture Paper • 2405.06694 • Published May 7 • 37 • 2
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 118 • 9
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 114 • 11
STT: Stateful Tracking with Transformers for Autonomous Driving Paper • 2405.00236 • Published Apr 30 • 7 • 2
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published Apr 28 • 27 • 6
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings Paper • 2404.16820 • Published Apr 25 • 15 • 2
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 57 • 9
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 53 • 5
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 124 • 14
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38 • 9
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 251 • 41
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting Paper • 2404.06903 • Published Apr 10 • 17 • 3
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124 • 12
StableDrag: Stable Dragging for Point-based Image Editing Paper • 2403.04437 • Published Mar 7 • 25 • 4
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 32 • 3
CodePlan: Repository-level Coding using LLMs and Planning Paper • 2309.12499 • Published Sep 21, 2023 • 73 • 14
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering Paper • 2311.12775 • Published Nov 21, 2023 • 28 • 3
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation Paper • 2311.07562 • Published Nov 13, 2023 • 12 • 1
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation Paper • 2311.01455 • Published Nov 2, 2023 • 28 • 2
HyperFields: Towards Zero-Shot Generation of NeRFs from Text Paper • 2310.17075 • Published Oct 26, 2023 • 14 • 2
3D-GPT: Procedural 3D Modeling with Large Language Models Paper • 2310.12945 • Published Oct 19, 2023 • 57 • 2