Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models Paper • 2405.20541 • Published 4 days ago • 4
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published 3 days ago • 16
4Diffusion: Multi-view Video Diffusion Model for 4D Generation Paper • 2405.20674 • Published 3 days ago • 6
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Paper • 2405.21048 • Published 3 days ago • 7
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published 3 days ago • 9
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models Paper • 2405.09062 • Published 19 days ago • 7
Dynamic data sampler for cross-language transfer learning in large language models Paper • 2405.10626 • Published 17 days ago • 4
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published 11 days ago • 11
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published 12 days ago • 21
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark Paper • 2405.19707 • Published 4 days ago • 1
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories Paper • 2405.19856 • Published 4 days ago • 4
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable Paper • 2405.19888 • Published 4 days ago • 2
MotionLLM: Understanding Human Behaviors from Human Motions and Videos Paper • 2405.20340 • Published 4 days ago • 14
PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting Paper • 2405.19957 • Published 4 days ago • 4
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published 4 days ago • 13
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published 4 days ago • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation Paper • 2405.20289 • Published 4 days ago • 6
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published 4 days ago • 17
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published 4 days ago • 21
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published 5 days ago • 11
Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication Paper • 2405.18515 • Published 6 days ago • 3
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation Paper • 2405.18503 • Published 6 days ago • 5
Offline Regularised Reinforcement Learning for Large Language Models Alignment Paper • 2405.19107 • Published 5 days ago • 8
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF Paper • 2405.19320 • Published 5 days ago • 6
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities Paper • 2405.18669 • Published 6 days ago • 9
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Paper • 2405.19332 • Published 5 days ago • 10
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution Paper • 2405.19325 • Published 5 days ago • 10
LLMs achieve adult human performance on higher-order theory of mind tasks Paper • 2405.18870 • Published 5 days ago • 13
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published 5 days ago • 16
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published 5 days ago • 40
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting Paper • 2405.18424 • Published 6 days ago • 7
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published 6 days ago • 13
Yuan 2.0-M32: Mixture of Experts with Attention Router Paper • 2405.17976 • Published 6 days ago • 16
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Paper • 2405.17991 • Published 6 days ago • 9
LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models Paper • 2405.18377 • Published 6 days ago • 12
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters Paper • 2405.16287 • Published 9 days ago • 9
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models Paper • 2405.16537 • Published 8 days ago • 15
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published 7 days ago • 7
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer Paper • 2405.17405 • Published 7 days ago • 13
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models Paper • 2405.16759 • Published 7 days ago • 7
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models Paper • 2405.17428 • Published 7 days ago • 13
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published 7 days ago • 47
Part123: Part-aware 3D Reconstruction from a Single-view Image Paper • 2405.16888 • Published 7 days ago • 10
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published 10 days ago • 12
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels Paper • 2405.16822 • Published 7 days ago • 11
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning Paper • 2405.17258 • Published 7 days ago • 11
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published 10 days ago • 41
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published 10 days ago • 20