HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting Paper • 2405.15125 • Published 4 days ago • 3
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition Paper • 2405.15216 • Published 3 days ago • 4
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published 3 days ago • 5
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct Paper • 2405.14906 • Published 5 days ago • 5
Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining Paper • 2405.14908 • Published 4 days ago • 6
iVideoGPT: Interactive VideoGPTs are Scalable World Models Paper • 2405.15223 • Published 3 days ago • 7
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published 3 days ago • 8
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner Paper • 2405.14979 • Published 4 days ago • 8
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published 4 days ago • 11
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published 4 days ago • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published 3 days ago • 26
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published 3 days ago • 28
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models Paper • 2312.02969 • Published Dec 5, 2023 • 12
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published 19 days ago • 7
NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections Paper • 2405.14871 • Published 4 days ago • 5
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras Paper • 2405.14866 • Published 4 days ago • 5
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling Paper • 2405.14847 • Published 4 days ago • 6
CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers Paper • 2405.13195 • Published 6 days ago • 6
Semantica: An Adaptable Image-Conditioned Diffusion Model Paper • 2405.14857 • Published 4 days ago • 5
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published 4 days ago • 22
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published 5 days ago • 14
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability Paper • 2405.14129 • Published 5 days ago • 8
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis Paper • 2405.14224 • Published 4 days ago • 6
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published 4 days ago • 13
Improved Distribution Matching Distillation for Fast Image Synthesis Paper • 2405.14867 • Published 4 days ago • 9
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published 5 days ago • 19
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published 4 days ago • 9
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance Paper • 2405.14677 • Published 4 days ago • 8
Images that Sound: Composing Images and Sounds on a Single Canvas Paper • 2405.12221 • Published 7 days ago • 1
Personalized Residuals for Concept-Driven Text-to-Image Generation Paper • 2405.12978 • Published 6 days ago • 8
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance Paper • 2405.12979 • Published 6 days ago • 7
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Paper • 2405.12970 • Published 6 days ago • 20
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published 7 days ago • 25
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published 6 days ago • 22
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization Paper • 2405.11582 • Published 8 days ago • 10
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching Paper • 2405.11252 • Published 9 days ago • 11
Towards Modular LLMs by Building and Reusing a Library of LoRAs Paper • 2405.11157 • Published 10 days ago • 22
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published 8 days ago • 48
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published 8 days ago • 31
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published 7 days ago • 21
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published 7 days ago • 37
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion Paper • 2405.09874 • Published 11 days ago • 14
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction Paper • 2405.10315 • Published 11 days ago • 9
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published 11 days ago • 37
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published 11 days ago • 24
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published 11 days ago • 22
Chameleon: Mixed-Modal Early-Fusion Foundation Models Paper • 2405.09818 • Published 11 days ago • 91
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation Paper • 2405.09546 • Published 12 days ago • 9
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published 12 days ago • 14
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published 12 days ago • 22
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding Paper • 2405.08344 • Published 13 days ago • 10