3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting Paper • 2405.18424 • Published 1 day ago • 4
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published 1 day ago • 8
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Paper • 2405.17991 • Published 1 day ago • 7
LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models Paper • 2405.18377 • Published 1 day ago • 7
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters Paper • 2405.16287 • Published 4 days ago • 8
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models Paper • 2405.16537 • Published 4 days ago • 13
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published 2 days ago • 6
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer Paper • 2405.17405 • Published 2 days ago • 10
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models Paper • 2405.16759 • Published 3 days ago • 6
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models Paper • 2405.17428 • Published 2 days ago • 11
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published 2 days ago • 35
Part123: Part-aware 3D Reconstruction from a Single-view Image Paper • 2405.16888 • Published 3 days ago • 8
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published 5 days ago • 11
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels Paper • 2405.16822 • Published 3 days ago • 9
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning Paper • 2405.17258 • Published 2 days ago • 11
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published 5 days ago • 41
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published 6 days ago • 17
HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting Paper • 2405.15125 • Published 6 days ago • 4
iVideoGPT: Interactive VideoGPTs are Scalable World Models Paper • 2405.15223 • Published 6 days ago • 11
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published 5 days ago • 10
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published 6 days ago • 43
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner Paper • 2405.14979 • Published 6 days ago • 12
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct Paper • 2405.14906 • Published 7 days ago • 18
Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining Paper • 2405.14908 • Published 7 days ago • 10
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published 6 days ago • 20
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition Paper • 2405.15216 • Published 6 days ago • 10
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published 6 days ago • 29
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance Paper • 2405.14677 • Published 6 days ago • 8
Improved Distribution Matching Distillation for Fast Image Synthesis Paper • 2405.14867 • Published 6 days ago • 10
CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers Paper • 2405.13195 • Published 8 days ago • 6
Semantica: An Adaptable Image-Conditioned Diffusion Model Paper • 2405.14857 • Published 6 days ago • 6
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published 7 days ago • 27
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published 7 days ago • 14
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability Paper • 2405.14129 • Published 7 days ago • 9
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis Paper • 2405.14224 • Published 7 days ago • 8
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published 7 days ago • 14
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Paper • 2405.12970 • Published 8 days ago • 20
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance Paper • 2405.12979 • Published 8 days ago • 7
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published 8 days ago • 23
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published 9 days ago • 25
Personalized Residuals for Concept-Driven Text-to-Image Generation Paper • 2405.12978 • Published 8 days ago • 8
Observational Scaling Laws and the Predictability of Language Model Performance Paper • 2405.10938 • Published 12 days ago • 10
Layer-Condensed KV Cache for Efficient Inference of Large Language Models Paper • 2405.10637 • Published 13 days ago • 16
INDUS: Effective and Efficient Language Models for Scientific Applications Paper • 2405.10725 • Published 13 days ago • 20
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published 11 days ago • 49
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching Paper • 2405.11252 • Published 12 days ago • 11
Towards Modular LLMs by Building and Reusing a Library of LoRAs Paper • 2405.11157 • Published 12 days ago • 23
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization Paper • 2405.11582 • Published 10 days ago • 10