15 A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions · 6 authors 1
10 SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance · 3 authors 1
9 FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection · 6 authors 2
9 ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks · 11 authors 2
6 VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation · 8 authors 1
3 Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking · 12 authors 1