Submitted by akhaliq · 44 · Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text · 8 authors · 3
Submitted by akhaliq · 36 · Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment · 4 authors · 1
Submitted by akhaliq · 30 · Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding · 2 authors · 5
Submitted by akhaliq · 30 · Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs · 6 authors · 2
Submitted by akhaliq · 27 · CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark · 23 authors · 2
Submitted by akhaliq · 26 · SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities · 9 authors · 2
Submitted by akhaliq · 22 · Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers · 6 authors · 2
Submitted by akhaliq · 22 · CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation · 17 authors · 2
Submitted by akhaliq · 21 · DITTO: Diffusion Inference-Time T-Optimization for Music Generation · 4 authors · 2
Submitted by akhaliq · 17 · EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models · 4 authors · 2
Submitted by akhaliq · 10 · StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion · 7 authors · 1
Submitted by akhaliq · 10 · OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics · 5 authors · 2
Submitted by akhaliq · 7 · UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures · 4 authors · 2
Submitted by akhaliq · 6 · Single-View 3D Human Digitalization with Large Reconstruction Models · 7 authors · 1
Submitted by akhaliq · 2 · Fast Registration of Photorealistic Avatars for VR Facial Animation · 5 authors · 1