nosuyo (Zhijie Yang)

liked 2 models 17 days ago

Dream-org/Dream-v0-Base-7B

Feature Extraction • Updated 8 days ago • 4.12k • 24

Dream-org/Dream-v0-Instruct-7B

Feature Extraction • Updated 8 days ago • 18.6k • 76

liked 2 models 19 days ago

ltg/deberta-xxlarge-fixed

Text Generation • Updated Mar 14 • 247 • 13

Rostlab/prot_bert

Fill-Mask • Updated Nov 16, 2023 • 275k • 111

upvoted a collection 23 days ago

Deepseek Papers

Collection

Deepseek papers collection • 19 items • Updated 22 days ago • 191

upvoted a collection 25 days ago

Qwen2.5-VL

Collection

Vision-language model series based on Qwen2.5 • 11 items • Updated 26 days ago • 448

upvoted a paper 27 days ago

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16 • 34

reacted to Kseniase's post with 👀 27 days ago

Post

1979

9 Multimodal Chain-of-Thought methods

How Chain-of-Thought (CoT) prompting can unlock models' full potential across images, video, audio and more? Finding special multimodal CoT techniques is the answer.

Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source:

1. KAM-CoT -> KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (2401.12863)
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy

2. Multimodal Visualization-of-Thought (MVoT) -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality

3. Compositional CoT (CCoT) -> Compositional Chain-of-Thought Prompting for Large Multimodal Models (2311.17076)
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks

4. URSA -> URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification

5. MM-Verify -> MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification (2502.13383)
Introduces a verification mechanism with MM-Verifier and MM-Reasoner that implements synthesized high-quality CoT data for multimodal reasoning

6. Duty-Distinct CoT (DDCoT) -> DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (2310.16436)
Divides the reasoning responsibilities between LMs and visual models, integrating the visual recognition capabilities into the joint reasoning process

7. Multimodal-CoT from Amazon Web Services -> Multimodal Chain-of-Thought Reasoning in Language Models (2302.00923)
A two-stage framework separates rationale generation from answer prediction, allowing the model to reason more effectively using multimodal inputs

8. Graph-of-Thought (GoT) -> Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models (2305.16582)
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks

More in the comments👇