Submitted by akhaliq 15 A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions · 6 authors 1
Submitted by akhaliq 12 Holodeck: Language Guided Generation of 3D Embodied AI Environments · 14 authors 2
Submitted by akhaliq 11 SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance · 3 authors 1
Submitted by akhaliq 11 Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention · 5 authors 1
Submitted by akhaliq 9 FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection · 6 authors 2
Submitted by akhaliq 9 ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks · 11 authors 2
Submitted by akhaliq 8 LIME: Localized Image Editing via Attention Regularization in Diffusion Models · 5 authors 1
Submitted by akhaliq 7 Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent · 7 authors 2
Submitted by akhaliq 6 UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation · 10 authors 1
Submitted by akhaliq 6 VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation · 8 authors 1
Submitted by akhaliq 4 Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking · 12 authors 1