Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published 1 day ago • 11
Customizing Text-to-Image Models with a Single Image Pair Paper • 2405.01536 • Published about 13 hours ago • 4
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge Paper • 2405.00263 • Published 2 days ago • 8
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published 2 days ago • 9
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting Paper • 2404.19758 • Published 3 days ago • 8
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published 3 days ago • 14
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published 3 days ago • 51
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published 4 days ago • 12
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance Paper • 2401.16465 • Published Jan 29 • 5
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published 4 days ago • 21
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models Paper • 2404.17672 • Published 6 days ago • 15
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published 4 days ago • 50
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes Paper • 2404.17569 • Published 7 days ago • 10
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published 7 days ago • 29
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published 12 days ago • 25
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 11 days ago • 226
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published 13 days ago • 36
MeshLRM: Large Reconstruction Model for High-Quality Mesh Paper • 2404.12385 • Published 15 days ago • 23
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting Paper • 2404.09458 • Published 18 days ago • 6
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published 20 days ago • 58
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published 18 days ago • 27
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents Paper • 2404.05902 • Published 24 days ago • 20
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples Paper • 2404.07544 • Published 22 days ago • 15
JetMoE: Reaching Llama2 Performance with 0.1M Dollars Paper • 2404.07413 • Published 22 days ago • 32
Transferable and Principled Efficiency for Open-Vocabulary Segmentation Paper • 2404.07448 • Published 22 days ago • 8
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published 22 days ago • 14
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published 23 days ago • 30
Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior Paper • 2404.06780 • Published 23 days ago • 9
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published 23 days ago • 90
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published 23 days ago • 21
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published 25 days ago • 23
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations Paper • 2404.04421 • Published 27 days ago • 14
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published 25 days ago • 55
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition Paper • 2404.02514 • Published 30 days ago • 9
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation Paper • 2404.02733 • Published 30 days ago • 19
Octopus v2: On-device language model for super agent Paper • 2404.01744 • Published about 1 month ago • 52
Condition-Aware Neural Network for Controlled Image Generation Paper • 2404.01143 • Published Apr 1 • 11
MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text Paper • 2404.00345 • Published Mar 30 • 15
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion Paper • 2403.17422 • Published Mar 26 • 1
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models Paper • 2403.17005 • Published Mar 25 • 13
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19 • 13
Transformers compatible Mamba Collection • This release includes the `mamba` repositories compatible with the `transformers` library • 5 items • Updated Mar 6 • 25
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 171
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 92
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches Paper • 2403.02709 • Published Mar 5 • 6
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets Paper • 2403.03194 • Published Mar 5 • 11