ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning Paper • 2406.19741
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284
Direct Preference Knowledge Distillation for Large Language Models Paper • 2406.19774
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality Paper • 2406.18462
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation Paper • 2406.19215
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding Paper • 2406.19263
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs Paper • 2406.18629
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770
LongIns: A Challenging Long-context Instruction-based Exam for LLMs Paper • 2406.17588
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Paper • 2406.18495
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models Paper • 2406.17294
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records Paper • 2406.16341
A Closer Look into Mixture-of-Experts in Large Language Models Paper • 2406.18219
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Paper • 2406.18521
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA Paper • 2406.17419
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents Paper • 2406.13144
Unlocking Continual Learning Abilities in Language Models Paper • 2406.17245
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities Paper • 2406.14562
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models Paper • 2406.16714
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs Paper • 2406.15927
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877
Evaluating D-MERIT of Partial-annotation on Information Retrieval Paper • 2406.16048
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report Paper • 2406.11403
RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation Paper • 2406.14764
Towards Retrieval Augmented Generation over Large Video Libraries Paper • 2406.14938
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion Paper • 2406.04338
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks Paper • 2406.12925
Improving Visual Commonsense in Language Models via Multiple Image Generation Paper • 2406.13621
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Paper • 2406.11896
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Paper • 2406.09416
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522
Designing a Dashboard for Transparency and Control of Conversational AI Paper • 2406.07882
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack Paper • 2406.10149
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Paper • 2406.09961
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices Paper • 2406.08451
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation Paper • 2406.10996
mDPO: Conditional Preference Optimization for Multimodal Large Language Models Paper • 2406.11839