Update README.md
README.md
@@ -3,12 +3,13 @@ license: apache-2.0
 datasets:
 - mikewang/PVD-160K
 ---
+
 <h1 align="center"> Text-Based Reasoning About Vector Graphics </h1>
 
 <p align="center">
-<a href="https://mikewangwzhl.github.io/VDLM
+<a href="https://mikewangwzhl.github.io/VDLM">🌐 Homepage</a>
 •
-<a href="">📃 Paper</a>
+<a href="">📃 Paper (Coming Soon)</a>
 •
 <a href="https://huggingface.co/datasets/mikewang/PVD-160K" >🤗 Data (PVD-160k)</a>
 •
@@ -18,6 +19,11 @@ datasets:
 
 </p>
 
-
+
+We observe that current *large multimodal models (LMMs)* still struggle with seemingly straightforward reasoning tasks that require precise perception of low-level visual details, such as identifying spatial relations or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics—images composed purely of 2D objects and shapes.
+
+![Teaser](https://github.com/MikeWangWZHL/VDLM/blob/main/figures/teaser.png?raw=true)
+
+To solve this challenge, we propose **Visually Descriptive Language Model (VDLM)**, a text-based visual reasoning framework for vector graphics. VDLM operates on text-based visual descriptions—specifically, SVG representations and learned Primal Visual Descriptions (PVD), enabling zero-shot reasoning with an off-the-shelf LLM. We demonstrate that VDLM outperforms state-of-the-art large multimodal models, such as GPT-4V, across various multimodal reasoning tasks involving vector graphics. See our [paper (coming soon)]() for more details.
 
 ![Overview](https://github.com/MikeWangWZHL/VDLM/blob/main/figures/overview.png?raw=true)