Introducing Insight-V! An early attempt at o1-like multimodal reasoning.
We offer a structured long-chain visual reasoning data generation pipeline and a multi-agent system to unleash the reasoning potential of MLLMs.
Paper: https://arxiv.org/abs/2411.14432
GitHub: https://github.com/dongyh20/Insight-V
Model Weights: THUdyh/insight-v-673f5e1dd8ab5f2d8d332035