---
license: mit
---

# [Unifying Vision, Text, and Layout for Universal Document Processing (CVPR 2023 Highlight)](https://arxiv.org/pdf/2212.02623)

[Zineng Tang](https://zinengtang.github.io/),
[Ziyi Yang](https://ziyi-yang.github.io/),
[Guoxin Wang](https://www.guoxwang.com/),
[Yuwei Fang](https://www.microsoft.com/en-us/research/people/yuwfan/),
[Yang Liu](https://nlp-yang.github.io/),
[Chenguang Zhu](https://cs.stanford.edu/people/cgzhu/),
[Michael Zeng](https://www.microsoft.com/en-us/research/people/nzeng/),
[Cha Zhang](https://www.microsoft.com/en-us/research/people/chazhang/),
[Mohit Bansal](https://www.cs.unc.edu/~mbansal/)

Open Source Checklist:

- [x] Release Model (Encoder + Text decoder)
- [x] Release Most Scripts
- [ ] Vision Decoder / Weights (due to ethical considerations around fake document generation, we plan to release this functionality as an Azure API)
- [x] Demo

## Introduction

UDOP unifies vision, text, and layout through a Vision-Text-Layout Transformer and unified generative pretraining tasks spanning vision, text, layout, and mixed modalities. The figure below shows the task prompts (left) and task targets (right) for all self-supervised objectives (joint text-layout reconstruction, visual text recognition, layout modeling, and masked autoencoding) and two example supervised objectives (question answering and layout analysis).

<p align="center">
  <img align="middle" width="800" src="assets/udop.png"/>
</p>

## Install

### Set up the `python` environment

```
conda create -n UDOP python=3.8  # other Python versions should also work
conda activate UDOP
```

### Install other dependencies

```
pip install -r requirements.txt
```

## Run Scripts

Switch the model type by passing either `--model_type "UdopDual"` or `--model_type "UdopUnimodel"` to the training scripts.
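
For illustration, a hypothetical direct invocation is sketched below; the entry-point name and the data flag are assumptions for this sketch, not files or options taken from this repo (in practice the flag is set inside the provided `scripts/*.sh` files):

```
# Hypothetical sketch only: "main.py" and "--data_dir" are assumed names,
# not from this repo; edit the --model_type line in scripts/*.sh instead.
python main.py \
    --model_type "UdopUnimodel" \
    --data_dir /path/to/dataset
```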

### Finetuning on RVLCDIP

Download RVLCDIP first and update the dataset path in the script.
For OCR, you may need to customize the data-reading code for your own setup.

```
bash scripts/finetune_rvlcdip.sh  # finetuning on RVLCDIP
```
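
If your copy of RVLCDIP lacks OCR annotations, one possible starting point (assumed external tooling, not part of this repo) is Tesseract's word-level TSV output, which you can then convert into whatever format your customized loader expects:

```
# Assumed external tool (not part of this repo): Tesseract's `tsv` config
# emits one row per recognized word with left/top/width/height pixel boxes.
tesseract page.png page_ocr tsv
# page_ocr.tsv can now be converted into the loader's expected word/box format.
```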

### Finetuning on DUE Benchmark

Download the [DUE Benchmark baselines](https://github.com/due-benchmark/baselines) and follow their procedure to preprocess the data.

The training code, adapted from the benchmark's `benchmarker` to our framework, can be run with:

```
bash scripts/finetune_duebenchmark.sh  # finetuning on the DUE Benchmark; switch tasks by changing the dataset path
```

Generated outputs can be scored with the DUE Benchmark's [due_evaluator](https://github.com/due-benchmark/evaluator).

### Model Checkpoints

The model checkpoints are hosted on the [Hugging Face Hub](https://huggingface.co/ZinengTang/Udop).
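
One standard way to fetch them locally is a Git LFS clone of the checkpoint repository:

```
# Requires git-lfs so the large weight files are downloaded, not just pointers.
git lfs install
git clone https://huggingface.co/ZinengTang/Udop
```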

## Citation

```
@article{tang2022unifying,
  title={Unifying Vision, Text, and Layout for Universal Document Processing},
  author={Tang, Zineng and Yang, Ziyi and Wang, Guoxin and Fang, Yuwei and Liu, Yang and Zhu, Chenguang and Zeng, Michael and Zhang, Cha and Bansal, Mohit},
  journal={arXiv preprint arXiv:2212.02623},
  year={2022}
}
```

## Contact

Zineng Tang (zn.tang.terran@gmail.com)