UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Paper • 2407.05282 • Published 3 days ago • 3 • 1
Compositional Video Generation as Flow Equalization Paper • 2407.06182 • Published 29 days ago • 2 • 1
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams Paper • 2406.08085 • Published 28 days ago • 7 • 1
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild Paper • 2407.04172 • Published 5 days ago • 14 • 5
CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images Paper • 2407.03923 • Published 5 days ago • 4 • 1
HEMM: Holistic Evaluation of Multimodal Foundation Models Paper • 2407.03418 • Published 6 days ago • 6 • 1
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published 5 days ago • 29 • 1
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs Paper • 2407.03963 • Published 5 days ago • 6 • 1
On scalable oversight with weak LLMs judging strong LLMs Paper • 2407.04622 • Published 4 days ago • 9 • 1
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published 4 days ago • 13 • 2
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published 7 days ago • 13 • 3
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix Paper • 2407.00367 • Published 11 days ago • 9 • 1
AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models Paper • 2406.10900 • Published 24 days ago • 11 • 4
MotionBooth: Motion-Aware Customized Text-to-Video Generation Paper • 2406.17758 • Published 14 days ago • 17 • 1
Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models Paper • 2406.14599 • Published 19 days ago • 16 • 2
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning Paper • 2406.14130 • Published 20 days ago • 10 • 2
JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning Paper • 2406.12292 • Published 22 days ago • 4 • 2
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published 22 days ago • 28 • 10
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published 21 days ago • 27 • 2
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published 21 days ago • 14 • 2
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published 22 days ago • 54 • 3
Training-free Camera Control for Video Generation Paper • 2406.10126 • Published 25 days ago • 12 • 2
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published 27 days ago • 15 • 4
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170 • Published 26 days ago • 23 • 1
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published 26 days ago • 18 • 1
OpenVLA: An Open-Source Vision-Language-Action Model Paper • 2406.09246 • Published 26 days ago • 29 • 1
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published 26 days ago • 47 • 2
HelpSteer2: Open-source dataset for training top-performing reward models Paper • 2406.08673 • Published 27 days ago • 14 • 3
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs Paper • 2406.08657 • Published 27 days ago • 9 • 2
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published 27 days ago • 20 • 1