3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting Paper • 2405.18424 • Published 4 days ago • 6
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published 8 days ago • 41
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability Paper • 2405.14129 • Published 10 days ago • 9
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance Paper • 2405.14677 • Published 9 days ago • 8
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis Paper • 2405.14224 • Published 10 days ago • 8
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published 10 days ago • 14
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published 10 days ago • 10
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published 18 days ago • 22
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published 18 days ago • 14
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published 19 days ago • 15
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts Paper • 2405.07518 • Published 20 days ago • 21
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 57
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published May 1 • 18
Customizing Text-to-Image Models with a Single Image Pair Paper • 2405.01536 • Published about 1 month ago • 17
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge Paper • 2405.00263 • Published May 1 • 13
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published Apr 30 • 12
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting Paper • 2404.19758 • Published Apr 30 • 9
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30 • 19
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30 • 65
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance Paper • 2401.16465 • Published Jan 29 • 9
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published Apr 29 • 26
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models Paper • 2404.17672 • Published Apr 26 • 17
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 63
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes Paper • 2404.17569 • Published Apr 26 • 10
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25 • 31
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 26
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 238
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 37
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting Paper • 2404.09458 • Published Apr 15 • 6
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 62
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15 • 27
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents Paper • 2404.05902 • Published Apr 8 • 20
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples Paper • 2404.07544 • Published Apr 11 • 15
Transferable and Principled Efficiency for Open-Vocabulary Segmentation Paper • 2404.07448 • Published Apr 11 • 10
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published Apr 11 • 14
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9 • 32
Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior Paper • 2404.06780 • Published Apr 10 • 9
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 93
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10 • 22
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published Apr 8 • 23
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations Paper • 2404.04421 • Published Apr 5 • 14