DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories Paper • 2405.19856 • Published 1 day ago • 1
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published 1 day ago • 5
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable Paper • 2405.19888 • Published 1 day ago • 1
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published 1 day ago • 8
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation Paper • 2405.20289 • Published 1 day ago • 4
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation Paper • 2405.18503 • Published 3 days ago • 4
Offline Regularised Reinforcement Learning for Large Language Models Alignment Paper • 2405.19107 • Published 2 days ago • 7
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution Paper • 2405.19325 • Published 2 days ago • 9
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF Paper • 2405.19320 • Published 2 days ago • 6
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment Paper • 2405.19332 • Published 2 days ago • 8
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities Paper • 2405.18669 • Published 3 days ago • 9
LLMs achieve adult human performance on higher-order theory of mind tasks Paper • 2405.18870 • Published 3 days ago • 12
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published 3 days ago • 13
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published 2 days ago • 34
Real-World Image Variation by Aligning Diffusion Inversion Chain Paper • 2305.18729 • Published May 30, 2023 • 4
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter Paper • 2312.00330 • Published Dec 1, 2023 • 10
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance Paper • 2306.00943 • Published Jun 1, 2023 • 5
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors Paper • 2310.12190 • Published Oct 18, 2023 • 9
Lost in the Middle: How Language Models Use Long Contexts Paper • 2307.03172 • Published Jul 6, 2023 • 32
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V Paper • 2310.11441 • Published Oct 17, 2023 • 25
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22 • 17
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published 3 days ago • 12
LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models Paper • 2405.18377 • Published 3 days ago • 11
Yuan 2.0-M32: Mixture of Experts with Attention Router Paper • 2405.17976 • Published 4 days ago • 15
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Paper • 2405.17991 • Published 4 days ago • 8
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published 4 days ago • 44
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models Paper • 2405.16537 • Published 6 days ago • 14
Part123: Part-aware 3D Reconstruction from a Single-view Image Paper • 2405.16888 • Published 5 days ago • 10
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models Paper • 2405.16759 • Published 5 days ago • 7
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning Paper • 2405.17258 • Published 4 days ago • 11
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published 4 days ago • 7
Fast Transformer Decoding: One Write-Head is All You Need Paper • 1911.02150 • Published Nov 6, 2019 • 6
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published 8 days ago • 30
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published 7 days ago • 41
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published 8 days ago • 19
iVideoGPT: Interactive VideoGPTs are Scalable World Models Paper • 2405.15223 • Published 8 days ago • 11
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition Paper • 2405.15216 • Published 8 days ago • 11
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published 8 days ago • 21
Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining Paper • 2405.14908 • Published 9 days ago • 10
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published 7 days ago • 11
HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting Paper • 2405.15125 • Published 8 days ago • 4
SimPO: Simple Preference Optimization with a Reference-Free Reward Paper • 2405.14734 • Published 8 days ago • 7
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published 9 days ago • 19
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 40
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published 9 days ago • 14
Improved Distribution Matching Distillation for Fast Image Synthesis Paper • 2405.14867 • Published 8 days ago • 10