metadata

license: mit

LayoutReader

TODO:

Helper

python tools.py cache-dataset-spans --help

bash train.sh

python eval.py --help

Each bbox contains multiple tokens. The bboxes are usually obtained by parsing a PDF file. The training data is generated by tools.py.

Only the first part of the test file is used.

Method	Shuffle	BLEU Idx	BLEU Token
Heuristic Method	no	44.4	70.7
LayoutReader (layout only)	no	95.3	97.8
LayoutReader (layout only)	yes	95.0	97.6
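
The "Heuristic Method" row is not defined in this document. A minimal sketch of one plausible reading, assuming it sorts bboxes top-to-bottom and then left-to-right, is shown here. The `BBox` layout `(left, top, right, bottom)` and the name `heuristic_order` are hypothetical and are not taken from tools.py or eval.py.

```python
from typing import List, Tuple

# Assumed bbox layout: (left, top, right, bottom). This is an illustration only.
BBox = Tuple[int, int, int, int]


def heuristic_order(bboxes: List[BBox]) -> List[int]:
    """Return bbox indices sorted top-to-bottom, then left-to-right.

    Hypothetical helper: the real heuristic baseline may group boxes into
    lines or columns before sorting.
    """
    return sorted(range(len(bboxes)), key=lambda i: (bboxes[i][1], bboxes[i][0]))


boxes = [(300, 10, 400, 30), (10, 10, 100, 30), (10, 50, 100, 70)]
order = heuristic_order(boxes)  # indices of boxes in the guessed reading order
```

A plain sort like this ignores multi-column layouts, which is one likely reason a learned model scores higher than the heuristic in the table above.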

Each bbox contains only one token.

Only the first part of the test file is used.

Method	Shuffle	BLEU Idx	BLEU Token
Heuristic Method	no	78.3	79.4
LayoutReader (layout only)	no	98.0	98.2
LayoutReader (layout only)	yes	97.8	98.0
LayoutReader (public model)	no	98.0	98.3

Evaluation results of LayoutReader on the reading order detection task, where the source side of the training/testing data is in left-to-right and top-to-bottom order.

Method	Encoder	BLEU	ARD
Heuristic Method	-	0.6972	8.46
LayoutReader (layout only)	LayoutLM (layout only)	0.9732	2.31
LayoutReader	LayoutLM	0.9819	1.75
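
The ARD column is not defined in this document. The sketch below assumes ARD stands for Average Relative Distance, computed as the mean absolute difference between each element's position in the predicted order and its position in the reference order, with a penalty equal to the reference length for elements missing from the reference. The function name `ard` and the assumption that elements are unique are illustrative only; eval.py may compute it differently.

```python
from typing import Hashable, Sequence


def ard(pred: Sequence[Hashable], ref: Sequence[Hashable]) -> float:
    """Average Relative Distance between a predicted and a reference order.

    Assumed definition (not confirmed by this document): for each element at
    position k in `pred`, add |k - position in ref|, or len(ref) if the
    element does not occur in `ref`; then divide by len(pred).
    Elements are assumed to be unique.
    """
    if not pred:
        raise ValueError("pred must not be empty")
    ref_pos = {e: i for i, e in enumerate(ref)}
    total = 0
    for k, e in enumerate(pred):
        total += abs(k - ref_pos[e]) if e in ref_pos else len(ref)
    return total / len(pred)


perfect = ard([0, 1, 2], [0, 1, 2])   # identical orders
swapped = ard([1, 0, 2], [0, 1, 2])   # first two elements swapped
```

Under this definition lower is better, which matches the table: the rows with higher BLEU have lower ARD.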

Input order study with left-to-right and top-to-bottom inputs in evaluation, where r is the proportion of shuffled samples in training.

Method	BLEU (r=100%)	BLEU (r=50%)	BLEU (r=0%)	ARD (r=100%)	ARD (r=50%)	ARD (r=0%)
LayoutReader (layout only)	0.9701	0.9729	0.9732	2.85	2.61	2.31
LayoutReader	0.9765	0.9788	0.9819	2.50	2.24	1.75
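
To make the role of r concrete, here is a minimal sketch of shuffling the token order in a proportion r of the training samples. The function `shuffle_fraction`, its `seed` parameter, and the representation of a sample as a list of tokens are assumptions made for illustration; the actual data preparation in tools.py may differ.

```python
import random
from typing import List


def shuffle_fraction(samples: List[List[str]], r: float, seed: int = 0) -> List[List[str]]:
    """Return a copy of `samples` where a proportion `r` have shuffled token order.

    Hypothetical helper: r=0.0 leaves every sample in its original order,
    r=1.0 shuffles every sample.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be in [0, 1]")
    rng = random.Random(seed)
    n_shuffled = round(r * len(samples))
    # Pick which samples get shuffled, without replacement.
    chosen = set(rng.sample(range(len(samples)), n_shuffled))
    out = []
    for i, tokens in enumerate(samples):
        tokens = list(tokens)  # copy so the input is not modified
        if i in chosen:
            rng.shuffle(tokens)
        out.append(tokens)
    return out


data = [["a", "b", "c", "d"], ["e", "f", "g"], ["h", "i"], ["j", "k", "l"]]
half = shuffle_fraction(data, r=0.5)  # two of the four samples are shuffled
```

A shuffled sample can occasionally come out in its original order by chance, so the effective proportion may be slightly below r for short samples.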

Input order study with token-shuffled inputs in evaluation, where r is the proportion of shuffled samples in training.

Method	BLEU (r=100%)	BLEU (r=50%)	BLEU (r=0%)	ARD (r=100%)	ARD (r=50%)	ARD (r=0%)
LayoutReader (layout only)	0.9718	0.9714	0.1331	2.72	2.82	105.40
LayoutReader	0.9772	0.9770	0.1783	2.48	2.46	72.94