license: mit

LayoutReader

TODO:

  1. Upload the models to Hugging Face
  2. Explain the motivation for this repo
  3. Describe the new dataset
  4. Build a Docker image

Helper

Build Dataset

python tools.py cache-dataset-spans --help

Train

bash train.sh

Eval

python eval.py --help

Spans-Level Results

Each bbox contains multiple tokens. Such bboxes are usually obtained by parsing a PDF file. The training data is generated by tools.py.

Only the first part of the test file is used.

| Method | shuf | BLEU Idx | BLEU Token |
|---|---|---|---|
| Heuristic Method | no | 44.4 | 70.7 |
| LayoutReader (layout only) | no | 95.3 | 97.8 |
| LayoutReader (layout only) | yes | 95.0 | 97.6 |

Tokens-Level Results

Each bbox contains exactly one token.

New eval script

Only the first part of the test file is used.

| Method | shuf | BLEU Idx | BLEU Token |
|---|---|---|---|
| Heuristic Method | no | 78.3 | 79.4 |
| LayoutReader (layout only) | no | 98.0 | 98.2 |
| LayoutReader (layout only) | yes | 97.8 | 98.0 |
| LayoutReader (public model) | no | 98.0 | 98.3 |

Old eval script (from original paper)

  • Evaluation results of the LayoutReader on the reading order detection task, where the source-side of training/testing data is in the left-to-right and top-to-bottom order

| Method | Encoder | BLEU | ARD |
|---|---|---|---|
| Heuristic Method | - | 0.6972 | 8.46 |
| LayoutReader (layout only) | LayoutLM (layout only) | 0.9732 | 2.31 |
| LayoutReader | LayoutLM | 0.9819 | 1.75 |

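The left-to-right, top-to-bottom source order mentioned above can be illustrated with a minimal sketch. It assumes each bbox is a `(left, top, right, bottom)` tuple and sorts by top edge first, then by left edge. The function name `heuristic_order` is hypothetical and is not taken from tools.py or eval.py, and the repo's own heuristic baseline may differ (for example, by grouping boxes into lines before sorting).

```python
from typing import List, Tuple

# Assumed bbox layout for this sketch: (left, top, right, bottom).
Box = Tuple[int, int, int, int]


def heuristic_order(boxes: List[Box]) -> List[int]:
    """Return box indices sorted top-to-bottom, then left-to-right.

    Hypothetical helper for illustration only; it compares the raw
    top and left coordinates and does no line grouping.
    """
    return sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))


# Two boxes on the same row (top=50) and one box on a lower row (top=100).
boxes = [(300, 50, 400, 70), (50, 50, 150, 70), (50, 100, 150, 120)]
print(heuristic_order(boxes))  # [1, 0, 2]
```

Sorting on raw coordinates is brittle when boxes on the same visual line have slightly different top values, which is one reason a learned model can outperform this kind of baseline.
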
  • Input order study with left-to-right and top-to-bottom inputs in evaluation, where r is the proportion of shuffled samples in training.

| Method | BLEU (r=100%) | BLEU (r=50%) | BLEU (r=0%) | ARD (r=100%) | ARD (r=50%) | ARD (r=0%) |
|---|---|---|---|---|---|---|
| LayoutReader (layout only) | 0.9701 | 0.9729 | 0.9732 | 2.85 | 2.61 | 2.31 |
| LayoutReader | 0.9765 | 0.9788 | 0.9819 | 2.50 | 2.24 | 1.75 |

  • Input order study with token-shuffled inputs in evaluation, where r is the proportion of shuffled samples in training.

| Method | BLEU (r=100%) | BLEU (r=50%) | BLEU (r=0%) | ARD (r=100%) | ARD (r=50%) | ARD (r=0%) |
|---|---|---|---|---|---|---|
| LayoutReader (layout only) | 0.9718 | 0.9714 | 0.1331 | 2.72 | 2.82 | 105.40 |
| LayoutReader | 0.9772 | 0.9770 | 0.1783 | 2.48 | 2.46 | 72.94 |
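
To make the role of r concrete, here is a minimal sketch of shuffling the token order in a proportion r of the samples. It assumes each sample is a plain list of token indices. The name `shuffle_proportion` and its `seed` parameter are hypothetical and are not part of this repo, and the actual training pipeline may choose samples differently.

```python
import random
from typing import List, Sequence


def shuffle_proportion(
    samples: Sequence[Sequence[int]], r: float, seed: int = 0
) -> List[List[int]]:
    """Shuffle the token order in roughly a proportion r of the samples.

    Hypothetical helper for illustration; r=0.0 leaves every sample
    untouched and r=1.0 shuffles every sample.
    """
    rng = random.Random(seed)
    n_shuffled = round(r * len(samples))
    # Pick which samples get their input order shuffled.
    chosen = set(rng.sample(range(len(samples)), n_shuffled))
    result = []
    for i, tokens in enumerate(samples):
        tokens = list(tokens)  # copy so the caller's data is not mutated
        if i in chosen:
            rng.shuffle(tokens)
        result.append(tokens)
    return result


samples = [list(range(8)) for _ in range(4)]
half_shuffled = shuffle_proportion(samples, r=0.5)
```

With r=0.5 and four samples, two samples are selected for shuffling and the other two keep their original order.
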