CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching · 8 authors · submitted by akhaliq
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens · 7 authors · submitted by akhaliq
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent · 11 authors · submitted by akhaliq
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models · 10 authors · submitted by akhaliq
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models · 16 authors · submitted by akhaliq
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? · 8 authors · submitted by akhaliq
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis · 11 authors · submitted by akhaliq