# LayoutReader

LayoutReader captures the text and layout information for reading order prediction using a seq2seq model. In our experiments, it significantly improves both open-source and commercial OCR engines at ordering the text lines in their results.

Our paper "[LayoutReader: Pre-training of Text and Layout for Reading Order Detection](https://arxiv.org/pdf/2108.11591.pdf)" has been accepted by EMNLP 2021.

**ReadingBank** is a benchmark dataset for reading order detection built with weak supervision from Word documents. It contains 500K document images covering a wide range of document types, together with the corresponding reading order information. For more details, please refer to [ReadingBank](https://aka.ms/readingbank).
## Installation
~~~
conda create -n LayoutReader python=3.7
conda activate LayoutReader
conda install pytorch==1.7.1 -c pytorch
pip install nltk
python -c "import nltk; nltk.download('punkt')"
git clone https://github.com/NVIDIA/apex.git && cd apex && python setup.py install --cuda_ext --cpp_ext
pip install transformers==2.10.0
git clone https://github.com/microsoft/unilm.git
cd unilm/layoutreader
pip install -e .
~~~
## Run

1. Download the [pre-processed data](https://layoutlm.blob.core.windows.net/readingbank/dataset/ReadingBank.zip?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D). For more details of the dataset, please refer to [ReadingBank](https://aka.ms/readingbank).
2. (Optional) Download our [pre-trained model](https://layoutlm.blob.core.windows.net/readingbank/model/layoutreader-base-readingbank.zip?sv=2022-11-02&ss=b&srt=o&sp=r&se=2033-06-08T16:48:15Z&st=2023-06-08T08:48:15Z&spr=https&sig=a9VXrihTzbWyVfaIDlIT1Z0FoR1073VB0RLQUMuudD4%3D) and evaluate it following step 4.
3. Training
~~~
export CUDA_VISIBLE_DEVICES=0,1,2,3
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
python -m torch.distributed.launch --nproc_per_node=4 run_seq2seq.py \
    --model_type layoutlm \
    --model_name_or_path layoutlm-base-uncased \
    --train_folder /path/to/ReadingBank/train \
    --output_dir /path/to/output/LayoutReader/layoutlm \
    --do_lower_case \
    --fp16 \
    --fp16_opt_level O2 \
    --max_source_seq_length 513 \
    --max_target_seq_length 511 \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --learning_rate 7e-5 \
    --num_warmup_steps 500 \
    --num_training_steps 75000 \
    --cache_dir /path/to/output/LayoutReader/cache \
    --label_smoothing 0.1 \
    --save_steps 5000 \
    --cached_train_features_file /path/to/ReadingBank/features_train.pt
~~~
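When adjusting these flags for a different GPU budget, keep the effective batch size in mind; it is the product of the per-GPU batch size, the number of processes, and the accumulation steps. A quick check of what the flags above imply (the variable names below just mirror the flags):

```python
# Effective batch size implied by the training command above:
per_gpu_train_batch_size = 2     # --per_gpu_train_batch_size
num_gpus = 4                     # --nproc_per_node=4
gradient_accumulation_steps = 1  # --gradient_accumulation_steps

effective_batch_size = (per_gpu_train_batch_size
                        * num_gpus
                        * gradient_accumulation_steps)
print(effective_batch_size)  # 8
```

If you train on fewer GPUs, raising `--gradient_accumulation_steps` proportionally is a common way to keep this product (and thus the learning-rate schedule's behavior) comparable.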
4. Decoding
~~~
export CUDA_VISIBLE_DEVICES=0
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
python decode_seq2seq.py --fp16 \
    --model_type layoutlm \
    --tokenizer_name bert-base-uncased \
    --input_folder /path/to/ReadingBank/test \
    --cached_feature_file /path/to/ReadingBank/features_test.pt \
    --output_file /path/to/output/LayoutReader/layoutlm/output.txt \
    --split test \
    --do_lower_case \
    --model_path /path/to/output/LayoutReader/layoutlm/ckpt-75000 \
    --cache_dir /path/to/output/LayoutReader/cache \
    --max_seq_length 1024 \
    --max_tgt_length 511 \
    --batch_size 32 \
    --beam_size 1 \
    --length_penalty 0 \
    --forbid_duplicate_ngrams \
    --mode s2s \
    --forbid_ignore_word "."
~~~
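The tables that follow report ARD (Average Relative Distance) alongside page-level BLEU. As a rough intuition for the metric, the sketch below computes the mean absolute difference between each token's position in the predicted order and its position in the gold order; this is a simplified reading, and the paper's exact ARD definition (which also handles tokens missing from the prediction) may differ in detail.

```python
# Hedged sketch of an ARD-style metric: average absolute displacement
# of each token between the predicted and gold reading orders.
# Assumes both sequences are permutations of the same token ids; the
# paper's definition additionally handles missing predictions.
def ard(predicted, gold):
    position = {tok: i for i, tok in enumerate(predicted)}
    return sum(abs(position[tok] - i)
               for i, tok in enumerate(gold)) / len(gold)

print(ard([0, 1, 2], [0, 1, 2]))  # 0.0 -- a perfect order scores zero
```

Under this reading, lower ARD is better: a prediction that places every token exactly where the gold order does scores 0, and large values mean tokens land far from their correct positions.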
## Results

Our released [pre-trained model](https://layoutlm.blob.core.windows.net/readingbank/dataset/layoutreader-base-readingbank.zip) achieves a 98.2% average page-level BLEU score. Detailed results are reported as follows:

* Evaluation results of LayoutReader on the reading order detection task, where the source side of the training/testing data is in left-to-right and top-to-bottom order:
| Method                     | Encoder                | Avg. Page-level BLEU ↑ | ARD ↓ |
| -------------------------- | ---------------------- | ---------------------- | ----- |
| Heuristic Method           | -                      | 0.6972                 | 8.46  |
| LayoutReader (text only)   | BERT                   | 0.8510                 | 12.08 |
| LayoutReader (text only)   | UniLM                  | 0.8765                 | 10.65 |
| LayoutReader (layout only) | LayoutLM (layout only) | 0.9732                 | 2.31  |
| LayoutReader               | LayoutLM               | 0.9819                 | 1.75  |
* Input order study with left-to-right and top-to-bottom inputs in evaluation, where r is the proportion of shuffled samples in training:

| Method                          | Avg. Page-level BLEU ↑ | Avg. Page-level BLEU ↑ | Avg. Page-level BLEU ↑ | ARD ↓  | ARD ↓ | ARD ↓ |
|---------------------------------|------------------------|------------------------|------------------------|--------|-------|-------|
|                                 | r=100%                 | r=50%                  | r=0%                   | r=100% | r=50% | r=0%  |
| LayoutReader (text only, BERT)  | 0.3355                 | 0.8397                 | 0.8510                 | 77.97  | 15.62 | 12.08 |
| LayoutReader (text only, UniLM) | 0.3440                 | 0.8588                 | 0.8765                 | 78.67  | 13.65 | 10.65 |
| LayoutReader (layout only)      | 0.9701                 | 0.9729                 | 0.9732                 | 2.85   | 2.61  | 2.31  |
| LayoutReader                    | 0.9765                 | 0.9788                 | 0.9819                 | 2.50   | 2.24  | 1.75  |
* Input order study with token-shuffled inputs in evaluation, where r is the proportion of shuffled samples in training:

| Method                          | Avg. Page-level BLEU ↑ | Avg. Page-level BLEU ↑ | Avg. Page-level BLEU ↑ | ARD ↓  | ARD ↓ | ARD ↓  |
|---------------------------------|------------------------|------------------------|------------------------|--------|-------|--------|
|                                 | r=100%                 | r=50%                  | r=0%                   | r=100% | r=50% | r=0%   |
| LayoutReader (text only, BERT)  | 0.3085                 | 0.2730                 | 0.1711                 | 78.69  | 85.44 | 67.96  |
| LayoutReader (text only, UniLM) | 0.3119                 | 0.2855                 | 0.1728                 | 80.00  | 85.60 | 71.13  |
| LayoutReader (layout only)      | 0.9718                 | 0.9714                 | 0.1331                 | 2.72   | 2.82  | 105.40 |
| LayoutReader                    | 0.9772                 | 0.9770                 | 0.1783                 | 2.48   | 2.46  | 72.94  |
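The r parameter in both studies controls a training-time augmentation: with probability r, a sample's source tokens are shuffled rather than kept in left-to-right, top-to-bottom order. A minimal sketch of that idea (the `maybe_shuffle` helper is hypothetical; the released code implements this inside its data pipeline):

```python
import random

# With probability r, shuffle the source-side token order of a sample;
# otherwise keep the original (left-to-right, top-to-bottom) order.
def maybe_shuffle(tokens, r, rng=random):
    tokens = list(tokens)  # copy so the caller's list is untouched
    if rng.random() < r:
        rng.shuffle(tokens)
    return tokens

# r=0 never shuffles; r=1 always does.
print(maybe_shuffle(["a", "b", "c"], r=0.0))  # ['a', 'b', 'c']
```

The tables show why this matters: models trained with r=0% collapse on shuffled evaluation inputs, while training with some shuffling (r=50% or 100%) makes the layout-aware models robust to input order at almost no cost on ordered inputs.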
## Citation

If you find LayoutReader helpful, please cite us:
```
@misc{wang2021layoutreader,
    title={LayoutReader: Pre-training of Text and Layout for Reading Order Detection},
    author={Zilong Wang and Yiheng Xu and Lei Cui and Jingbo Shang and Furu Wei},
    year={2021},
    eprint={2108.11591},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
## License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the [transformers](https://github.com/huggingface/transformers) and [s2s-ft](../s2s-ft) projects.

[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct)

## Contact

For help or issues using LayoutReader, please submit a GitHub issue.

For other communications related to LayoutLM, please contact Lei Cui (`lecu@microsoft.com`) and Furu Wei (`fuwei@microsoft.com`).