What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study Paper • 2410.00545 • Published 10 days ago • 5
DressRecon: Freeform 4D Human Reconstruction from Monocular Video Paper • 2409.20563 • Published 11 days ago • 7
Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models Paper • 2410.00231 • Published 11 days ago • 6
Visual Context Window Extension: A New Perspective for Long Video Understanding Paper • 2409.20018 • Published 11 days ago • 7
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer Paper • 2410.00086 • Published 11 days ago • 10
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration Paper • 2410.00418 • Published 11 days ago • 9
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs Paper • 2410.00337 • Published 11 days ago • 10
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation Paper • 2410.00890 • Published 10 days ago • 16
Illustrious: an Open Advanced Illustration Model Paper • 2409.19946 • Published 12 days ago • 10
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos Paper • 2409.19603 • Published 12 days ago • 17
Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect Paper • 2409.17912 • Published 15 days ago • 20
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published 10 days ago • 28
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published 12 days ago • 51
A failed experiment: Infini-Attention, and why we should keep trying? Article • Aug 14 • 46
LML: Language Model Learning a Dataset for Data-Augmented Prediction Paper • 2409.18957 • Published 14 days ago • 9
A Survey on the Honesty of Large Language Models Paper • 2409.18786 • Published 14 days ago • 28
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows Paper • 2409.17433 • Published 16 days ago • 8
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making Paper • 2409.16686 • Published 16 days ago • 7
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published 14 days ago • 24
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult Paper • 2409.17545 • Published 16 days ago • 16
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published 14 days ago • 23
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published 16 days ago • 25
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image Paper • 2409.17280 • Published 16 days ago • 8
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction Paper • 2409.18121 • Published 15 days ago • 7
Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study Paper • 2409.17580 • Published 16 days ago • 6
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling Paper • 2409.14683 • Published 19 days ago • 8
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends Paper • 2409.14195 • Published 20 days ago • 11
Instruction Following without Instruction Tuning Paper • 2409.14254 • Published 20 days ago • 26
Pixel-Space Post-Training of Latent Diffusion Models Paper • 2409.17565 • Published 16 days ago • 19
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published 16 days ago • 23
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published 15 days ago • 29
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 15 days ago • 35
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 15 days ago • 33
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published 16 days ago • 46
Platypus: A Generalized Specialist Model for Reading Text in Various Forms Paper • 2408.14805 • Published Aug 27 • 12
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation Paper • 2408.14819 • Published Aug 27 • 19
Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published Aug 27 • 23
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27 • 36
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published Aug 27 • 27
GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars Paper • 2408.13674 • Published Aug 24 • 17
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data Paper • 2408.10119 • Published Aug 19 • 15
NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices Paper • 2408.10161 • Published Aug 19 • 11
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19 • 32
Better Alignment with Instruction Back-and-Forth Translation Paper • 2408.04614 • Published Aug 8 • 14
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics Paper • 2408.04631 • Published Aug 8 • 8
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection Paper • 2408.04284 • Published Aug 8 • 21
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 154
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches Paper • 2408.04567 • Published Aug 8 • 23
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models Paper • 2408.04594 • Published Aug 8 • 14
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6 • 85
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation Paper • 2408.01708 • Published Aug 3 • 3
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts Paper • 2408.03209 • Published Aug 6 • 21