Charles Kabui committed on
Commit 2d81b98 · 2 Parent(s): 399308e a989db7

Add 'model/layout-model-training/' from commit 'b9fad076596272e427612d5e848da1ba8ea06b97'


git-subtree-dir: model/layout-model-training
git-subtree-mainline: b404f5c2f60d251e639f628f3e66efcdd1357b99
git-subtree-split: b9fad076596272e427612d5e848da1ba8ea06b97

model/layout-model-training/.gitignore ADDED
@@ -0,0 +1,133 @@
+ # folders
+ data
+ data/
+ credential
+ credential/
+ model
+ model/
+ result
+ result*/
+ outputs/
+
+ # Mac Finder Configurations
+ .DS_Store
+
+ # IDEA configurations
+ .idea/
+
+ # IPython checkpoints
+ .ipynb_checkpoints/
+ log
+
+ # Visual Studio Code
+ .vscode/
+
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ .hypothesis/
+ .pytest_cache/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ .python-version
+
+ # celery beat schedule file
+ celerybeat-schedule
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
model/layout-model-training/README.md ADDED
@@ -0,0 +1,38 @@
+ # Scripts for training Layout Detection Models using Detectron2
+
+ ## Usage
+
+ ### Directory Structure
+
+ - In `tools/`, we provide a series of handy scripts for converting data formats and training the models.
+ - In `scripts/`, we list the specific commands for running the code on each supported dataset.
+ - `configs/` contains the configurations for the different deep learning models, organized by dataset.
+
+ ### How to train the models?
+
+ - Get the dataset and annotations -- if you are not sure how, feel free to check [this tutorial](https://github.com/Layout-Parser/layout-parser/tree/main/examples/Customizing%20Layout%20Models%20with%20Label%20Studio%20Annotation).
+ - Duplicate and modify the config files and training scripts (see the example sketch after this file):
+   - For example, you might want to copy [`configs/prima/fast_rcnn_R_50_FPN_3x`](configs/prima/fast_rcnn_R_50_FPN_3x.yaml) to `configs/<your-dataset-name>/fast_rcnn_R_50_FPN_3x`, and you can create your own `scripts/train_<your-dataset-name>.sh` based on [`scripts/train_prima.sh`](scripts/train_prima.sh).
+   - You'll modify the `--dataset_name`, `--json_annotation_train`, `--image_path_train`, `--json_annotation_val`, `--image_path_val`, and `--config-file` args appropriately.
+ - If you have a dataset with segmentation masks, you can try to train with the [`mask_rcnn` model](configs/prima/mask_rcnn_R_50_FPN_3x.yaml); otherwise you might want to start with the [`fast_rcnn` model](configs/prima/fast_rcnn_R_50_FPN_3x.yaml).
+ - If you see the error `AttributeError: Cannot find field 'gt_masks' in the given Instances!` during training, your annotations contain no segmentation masks, so you should not use the `mask_rcnn` config -- switch to the `fast_rcnn` config instead.
+
+ ## Supported Datasets
+
+ - Prima Layout Analysis Dataset: [`scripts/train_prima.sh`](https://github.com/Layout-Parser/layout-model-training/blob/master/scripts/train_prima.sh)
+   - You will need to download the dataset from the [official website](https://www.primaresearch.org/dataset/) and put it in the `data/prima` folder.
+   - As the original dataset is stored in the [PAGE format](https://www.primaresearch.org/tools/PAGEViewer), the script will use [`tools/convert_prima_to_coco.py`](https://github.com/Layout-Parser/layout-model-training/blob/master/tools/convert_prima_to_coco.py) to convert it to COCO format.
+   - The final dataset folder structure should look like:
+     ```bash
+     data/
+     └── prima/
+         ├── Images/
+         ├── XML/
+         ├── License.txt
+         └── annotations*.json
+     ```
+
+ ## Reference
+
+ - **[cocosplit](https://github.com/akarazniewicz/cocosplit)**: a script that splits COCO annotations into training and test sets.
+ - **[Detectron2](https://github.com/facebookresearch/detectron2)**: Facebook AI Research's next-generation software system that implements state-of-the-art object detection algorithms.
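
To make the duplication step above concrete, here is a minimal sketch of such a custom training script, modeled on `scripts/train_prima.sh`. The dataset name `your-dataset` and all of its paths are placeholders, not files that exist in this commit; the flags come from `tools/train_net.py` and the trailing key-value pairs are standard Detectron2 config overrides.

```bash
#!/bin/bash
# Hypothetical scripts/train_your-dataset.sh -- substitute your own names and paths.

cd ../tools

python train_net.py \
    --dataset_name your-dataset \
    --json_annotation_train ../data/your-dataset/annotations-train.json \
    --image_path_train ../data/your-dataset/Images \
    --json_annotation_val ../data/your-dataset/annotations-val.json \
    --image_path_val ../data/your-dataset/Images \
    --config-file ../configs/your-dataset/fast_rcnn_R_50_FPN_3x.yaml \
    OUTPUT_DIR ../outputs/your-dataset/fast_rcnn_R_50_FPN_3x/ \
    SOLVER.IMS_PER_BATCH 2
```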
model/layout-model-training/configs/prima/fast_rcnn_R_50_FPN_3x.yaml ADDED
@@ -0,0 +1,307 @@
+ CUDNN_BENCHMARK: false
+ DATALOADER:
+   ASPECT_RATIO_GROUPING: true
+   FILTER_EMPTY_ANNOTATIONS: true
+   NUM_WORKERS: 4
+   REPEAT_THRESHOLD: 0.0
+   SAMPLER_TRAIN: TrainingSampler
+ DATASETS:
+   PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
+   PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
+   PROPOSAL_FILES_TEST: []
+   PROPOSAL_FILES_TRAIN: []
+   TEST: []
+   TRAIN: []
+ GLOBAL:
+   HACK: 1.0
+ INPUT:
+   CROP:
+     ENABLED: false
+     SIZE:
+     - 0.9
+     - 0.9
+     TYPE: relative_range
+   FORMAT: BGR
+   MASK_FORMAT: polygon
+   MAX_SIZE_TEST: 1333
+   MAX_SIZE_TRAIN: 1333
+   MIN_SIZE_TEST: 800
+   MIN_SIZE_TRAIN:
+   - 640
+   - 672
+   - 704
+   - 736
+   - 768
+   - 800
+   MIN_SIZE_TRAIN_SAMPLING: choice
+ MODEL:
+   ANCHOR_GENERATOR:
+     ANGLES:
+     - - -90
+       - 0
+       - 90
+     ASPECT_RATIOS:
+     - - 0.5
+       - 1.0
+       - 2.0
+     NAME: DefaultAnchorGenerator
+     OFFSET: 0.0
+     SIZES:
+     - - 32
+     - - 64
+     - - 128
+     - - 256
+     - - 512
+   BACKBONE:
+     FREEZE_AT: 2
+     NAME: build_resnet_fpn_backbone
+   DEVICE: cuda
+   FPN:
+     FUSE_TYPE: sum
+     IN_FEATURES:
+     - res2
+     - res3
+     - res4
+     - res5
+     NORM: ''
+     OUT_CHANNELS: 256
+   KEYPOINT_ON: false
+   LOAD_PROPOSALS: false
+   MASK_ON: false
+   META_ARCHITECTURE: GeneralizedRCNN
+   PANOPTIC_FPN:
+     COMBINE:
+       ENABLED: true
+       INSTANCES_CONFIDENCE_THRESH: 0.5
+       OVERLAP_THRESH: 0.5
+       STUFF_AREA_LIMIT: 4096
+     INSTANCE_LOSS_WEIGHT: 1.0
+   PIXEL_MEAN:
+   - 103.53
+   - 116.28
+   - 123.675
+   PIXEL_STD:
+   - 1.0
+   - 1.0
+   - 1.0
+   PROPOSAL_GENERATOR:
+     MIN_SIZE: 0
+     NAME: RPN
+   RESNETS:
+     DEFORM_MODULATED: false
+     DEFORM_NUM_GROUPS: 1
+     DEFORM_ON_PER_STAGE:
+     - false
+     - false
+     - false
+     - false
+     DEPTH: 50
+     NORM: FrozenBN
+     NUM_GROUPS: 1
+     OUT_FEATURES:
+     - res2
+     - res3
+     - res4
+     - res5
+     RES2_OUT_CHANNELS: 256
+     RES5_DILATION: 1
+     STEM_OUT_CHANNELS: 64
+     STRIDE_IN_1X1: true
+     WIDTH_PER_GROUP: 64
+   RETINANET:
+     BBOX_REG_WEIGHTS:
+     - 1.0
+     - 1.0
+     - 1.0
+     - 1.0
+     FOCAL_LOSS_ALPHA: 0.25
+     FOCAL_LOSS_GAMMA: 2.0
+     IN_FEATURES:
+     - p3
+     - p4
+     - p5
+     - p6
+     - p7
+     IOU_LABELS:
+     - 0
+     - -1
+     - 1
+     IOU_THRESHOLDS:
+     - 0.4
+     - 0.5
+     NMS_THRESH_TEST: 0.5
+     NUM_CLASSES: 80
+     NUM_CONVS: 4
+     PRIOR_PROB: 0.01
+     SCORE_THRESH_TEST: 0.05
+     SMOOTH_L1_LOSS_BETA: 0.1
+     TOPK_CANDIDATES_TEST: 1000
+   ROI_BOX_CASCADE_HEAD:
+     BBOX_REG_WEIGHTS:
+     - - 10.0
+       - 10.0
+       - 5.0
+       - 5.0
+     - - 20.0
+       - 20.0
+       - 10.0
+       - 10.0
+     - - 30.0
+       - 30.0
+       - 15.0
+       - 15.0
+     IOUS:
+     - 0.5
+     - 0.6
+     - 0.7
+   ROI_BOX_HEAD:
+     BBOX_REG_WEIGHTS:
+     - 10.0
+     - 10.0
+     - 5.0
+     - 5.0
+     CLS_AGNOSTIC_BBOX_REG: false
+     CONV_DIM: 256
+     FC_DIM: 1024
+     NAME: FastRCNNConvFCHead
+     NORM: ''
+     NUM_CONV: 0
+     NUM_FC: 2
+     POOLER_RESOLUTION: 7
+     POOLER_SAMPLING_RATIO: 0
+     POOLER_TYPE: ROIAlignV2
+     SMOOTH_L1_BETA: 0.0
+     TRAIN_ON_PRED_BOXES: false
+   ROI_HEADS:
+     BATCH_SIZE_PER_IMAGE: 512
+     IN_FEATURES:
+     - p2
+     - p3
+     - p4
+     - p5
+     IOU_LABELS:
+     - 0
+     - 1
+     IOU_THRESHOLDS:
+     - 0.5
+     NAME: StandardROIHeads
+     NMS_THRESH_TEST: 0.5
+     NUM_CLASSES: 80
+     POSITIVE_FRACTION: 0.25
+     PROPOSAL_APPEND_GT: true
+     SCORE_THRESH_TEST: 0.05
+   ROI_KEYPOINT_HEAD:
+     CONV_DIMS:
+     - 512
+     - 512
+     - 512
+     - 512
+     - 512
+     - 512
+     - 512
+     - 512
+     LOSS_WEIGHT: 1.0
+     MIN_KEYPOINTS_PER_IMAGE: 1
+     NAME: KRCNNConvDeconvUpsampleHead
+     NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
+     NUM_KEYPOINTS: 17
+     POOLER_RESOLUTION: 14
+     POOLER_SAMPLING_RATIO: 0
+     POOLER_TYPE: ROIAlignV2
+   ROI_MASK_HEAD:
+     CLS_AGNOSTIC_MASK: false
+     CONV_DIM: 256
+     NAME: MaskRCNNConvUpsampleHead
+     NORM: ''
+     NUM_CONV: 4
+     POOLER_RESOLUTION: 14
+     POOLER_SAMPLING_RATIO: 0
+     POOLER_TYPE: ROIAlignV2
+   RPN:
+     BATCH_SIZE_PER_IMAGE: 256
+     BBOX_REG_WEIGHTS:
+     - 1.0
+     - 1.0
+     - 1.0
+     - 1.0
+     BOUNDARY_THRESH: -1
+     HEAD_NAME: StandardRPNHead
+     IN_FEATURES:
+     - p2
+     - p3
+     - p4
+     - p5
+     - p6
+     IOU_LABELS:
+     - 0
+     - -1
+     - 1
+     IOU_THRESHOLDS:
+     - 0.3
+     - 0.7
+     LOSS_WEIGHT: 1.0
+     NMS_THRESH: 0.7
+     POSITIVE_FRACTION: 0.5
+     POST_NMS_TOPK_TEST: 1000
+     POST_NMS_TOPK_TRAIN: 1000
+     PRE_NMS_TOPK_TEST: 1000
+     PRE_NMS_TOPK_TRAIN: 2000
+     SMOOTH_L1_BETA: 0.0
+   SEM_SEG_HEAD:
+     COMMON_STRIDE: 4
+     CONVS_DIM: 128
+     IGNORE_VALUE: 255
+     IN_FEATURES:
+     - p2
+     - p3
+     - p4
+     - p5
+     LOSS_WEIGHT: 1.0
+     NAME: SemSegFPNHead
+     NORM: GN
+     NUM_CLASSES: 54
+   WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
+ OUTPUT_DIR: ./output
+ SEED: -1
+ SOLVER:
+   BASE_LR: 0.02
+   BIAS_LR_FACTOR: 1.0
+   CHECKPOINT_PERIOD: 20000
+   GAMMA: 0.1
+   IMS_PER_BATCH: 16
+   LR_SCHEDULER_NAME: WarmupMultiStepLR
+   MAX_ITER: 60000
+   MOMENTUM: 0.9
+   STEPS:
+   - 210000
+   - 250000
+   WARMUP_FACTOR: 0.001
+   WARMUP_ITERS: 1000
+   WARMUP_METHOD: linear
+   WEIGHT_DECAY: 0.0001
+   WEIGHT_DECAY_BIAS: 0.0001
+   WEIGHT_DECAY_NORM: 0.0
+ TEST:
+   AUG:
+     ENABLED: false
+     FLIP: true
+     MAX_SIZE: 4000
+     MIN_SIZES:
+     - 400
+     - 500
+     - 600
+     - 700
+     - 800
+     - 900
+     - 1000
+     - 1100
+     - 1200
+   DETECTIONS_PER_IMAGE: 100
+   EVAL_PERIOD: 0
+   EXPECTED_RESULTS: []
+   KEYPOINT_OKS_SIGMAS: []
+   PRECISE_BN:
+     ENABLED: false
+     NUM_ITER: 200
+ VERSION: 2
+ VIS_PERIOD: 0
model/layout-model-training/configs/prima/mask_rcnn_R_50_FPN_3x.yaml ADDED
@@ -0,0 +1,307 @@
+ CUDNN_BENCHMARK: false
+ DATALOADER:
+   ASPECT_RATIO_GROUPING: true
+   FILTER_EMPTY_ANNOTATIONS: true
+   NUM_WORKERS: 4
+   REPEAT_THRESHOLD: 0.0
+   SAMPLER_TRAIN: TrainingSampler
+ DATASETS:
+   PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
+   PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
+   PROPOSAL_FILES_TEST: []
+   PROPOSAL_FILES_TRAIN: []
+   TEST: []
+   TRAIN: []
+ GLOBAL:
+   HACK: 1.0
+ INPUT:
+   CROP:
+     ENABLED: false
+     SIZE:
+     - 0.9
+     - 0.9
+     TYPE: relative_range
+   FORMAT: BGR
+   MASK_FORMAT: polygon
+   MAX_SIZE_TEST: 1333
+   MAX_SIZE_TRAIN: 1333
+   MIN_SIZE_TEST: 800
+   MIN_SIZE_TRAIN:
+   - 640
+   - 672
+   - 704
+   - 736
+   - 768
+   - 800
+   MIN_SIZE_TRAIN_SAMPLING: choice
+ MODEL:
+   ANCHOR_GENERATOR:
+     ANGLES:
+     - - -90
+       - 0
+       - 90
+     ASPECT_RATIOS:
+     - - 0.5
+       - 1.0
+       - 2.0
+     NAME: DefaultAnchorGenerator
+     OFFSET: 0.0
+     SIZES:
+     - - 32
+     - - 64
+     - - 128
+     - - 256
+     - - 512
+   BACKBONE:
+     FREEZE_AT: 2
+     NAME: build_resnet_fpn_backbone
+   DEVICE: cuda
+   FPN:
+     FUSE_TYPE: sum
+     IN_FEATURES:
+     - res2
+     - res3
+     - res4
+     - res5
+     NORM: ''
+     OUT_CHANNELS: 256
+   KEYPOINT_ON: false
+   LOAD_PROPOSALS: false
+   MASK_ON: true
+   META_ARCHITECTURE: GeneralizedRCNN
+   PANOPTIC_FPN:
+     COMBINE:
+       ENABLED: true
+       INSTANCES_CONFIDENCE_THRESH: 0.5
+       OVERLAP_THRESH: 0.5
+       STUFF_AREA_LIMIT: 4096
+     INSTANCE_LOSS_WEIGHT: 1.0
+   PIXEL_MEAN:
+   - 103.53
+   - 116.28
+   - 123.675
+   PIXEL_STD:
+   - 1.0
+   - 1.0
+   - 1.0
+   PROPOSAL_GENERATOR:
+     MIN_SIZE: 0
+     NAME: RPN
+   RESNETS:
+     DEFORM_MODULATED: false
+     DEFORM_NUM_GROUPS: 1
+     DEFORM_ON_PER_STAGE:
+     - false
+     - false
+     - false
+     - false
+     DEPTH: 50
+     NORM: FrozenBN
+     NUM_GROUPS: 1
+     OUT_FEATURES:
+     - res2
+     - res3
+     - res4
+     - res5
+     RES2_OUT_CHANNELS: 256
+     RES5_DILATION: 1
+     STEM_OUT_CHANNELS: 64
+     STRIDE_IN_1X1: true
+     WIDTH_PER_GROUP: 64
+   RETINANET:
+     BBOX_REG_WEIGHTS:
+     - 1.0
+     - 1.0
+     - 1.0
+     - 1.0
+     FOCAL_LOSS_ALPHA: 0.25
+     FOCAL_LOSS_GAMMA: 2.0
+     IN_FEATURES:
+     - p3
+     - p4
+     - p5
+     - p6
+     - p7
+     IOU_LABELS:
+     - 0
+     - -1
+     - 1
+     IOU_THRESHOLDS:
+     - 0.4
+     - 0.5
+     NMS_THRESH_TEST: 0.5
+     NUM_CLASSES: 80
+     NUM_CONVS: 4
+     PRIOR_PROB: 0.01
+     SCORE_THRESH_TEST: 0.05
+     SMOOTH_L1_LOSS_BETA: 0.1
+     TOPK_CANDIDATES_TEST: 1000
+   ROI_BOX_CASCADE_HEAD:
+     BBOX_REG_WEIGHTS:
+     - - 10.0
+       - 10.0
+       - 5.0
+       - 5.0
+     - - 20.0
+       - 20.0
+       - 10.0
+       - 10.0
+     - - 30.0
+       - 30.0
+       - 15.0
+       - 15.0
+     IOUS:
+     - 0.5
+     - 0.6
+     - 0.7
+   ROI_BOX_HEAD:
+     BBOX_REG_WEIGHTS:
+     - 10.0
+     - 10.0
+     - 5.0
+     - 5.0
+     CLS_AGNOSTIC_BBOX_REG: false
+     CONV_DIM: 256
+     FC_DIM: 1024
+     NAME: FastRCNNConvFCHead
+     NORM: ''
+     NUM_CONV: 0
+     NUM_FC: 2
+     POOLER_RESOLUTION: 7
+     POOLER_SAMPLING_RATIO: 0
+     POOLER_TYPE: ROIAlignV2
+     SMOOTH_L1_BETA: 0.0
+     TRAIN_ON_PRED_BOXES: false
+   ROI_HEADS:
+     BATCH_SIZE_PER_IMAGE: 512
+     IN_FEATURES:
+     - p2
+     - p3
+     - p4
+     - p5
+     IOU_LABELS:
+     - 0
+     - 1
+     IOU_THRESHOLDS:
+     - 0.5
+     NAME: StandardROIHeads
+     NMS_THRESH_TEST: 0.5
+     NUM_CLASSES: 80
+     POSITIVE_FRACTION: 0.25
+     PROPOSAL_APPEND_GT: true
+     SCORE_THRESH_TEST: 0.05
+   ROI_KEYPOINT_HEAD:
+     CONV_DIMS:
+     - 512
+     - 512
+     - 512
+     - 512
+     - 512
+     - 512
+     - 512
+     - 512
+     LOSS_WEIGHT: 1.0
+     MIN_KEYPOINTS_PER_IMAGE: 1
+     NAME: KRCNNConvDeconvUpsampleHead
+     NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
+     NUM_KEYPOINTS: 17
+     POOLER_RESOLUTION: 14
+     POOLER_SAMPLING_RATIO: 0
+     POOLER_TYPE: ROIAlignV2
+   ROI_MASK_HEAD:
+     CLS_AGNOSTIC_MASK: false
+     CONV_DIM: 256
+     NAME: MaskRCNNConvUpsampleHead
+     NORM: ''
+     NUM_CONV: 4
+     POOLER_RESOLUTION: 14
+     POOLER_SAMPLING_RATIO: 0
+     POOLER_TYPE: ROIAlignV2
+   RPN:
+     BATCH_SIZE_PER_IMAGE: 256
+     BBOX_REG_WEIGHTS:
+     - 1.0
+     - 1.0
+     - 1.0
+     - 1.0
+     BOUNDARY_THRESH: -1
+     HEAD_NAME: StandardRPNHead
+     IN_FEATURES:
+     - p2
+     - p3
+     - p4
+     - p5
+     - p6
+     IOU_LABELS:
+     - 0
+     - -1
+     - 1
+     IOU_THRESHOLDS:
+     - 0.3
+     - 0.7
+     LOSS_WEIGHT: 1.0
+     NMS_THRESH: 0.7
+     POSITIVE_FRACTION: 0.5
+     POST_NMS_TOPK_TEST: 1000
+     POST_NMS_TOPK_TRAIN: 1000
+     PRE_NMS_TOPK_TEST: 1000
+     PRE_NMS_TOPK_TRAIN: 2000
+     SMOOTH_L1_BETA: 0.0
+   SEM_SEG_HEAD:
+     COMMON_STRIDE: 4
+     CONVS_DIM: 128
+     IGNORE_VALUE: 255
+     IN_FEATURES:
+     - p2
+     - p3
+     - p4
+     - p5
+     LOSS_WEIGHT: 1.0
+     NAME: SemSegFPNHead
+     NORM: GN
+     NUM_CLASSES: 54
+   WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
+ OUTPUT_DIR: ./output
+ SEED: -1
+ SOLVER:
+   BASE_LR: 0.02
+   BIAS_LR_FACTOR: 1.0
+   CHECKPOINT_PERIOD: 20000
+   GAMMA: 0.1
+   IMS_PER_BATCH: 16
+   LR_SCHEDULER_NAME: WarmupMultiStepLR
+   MAX_ITER: 60000
+   MOMENTUM: 0.9
+   STEPS:
+   - 210000
+   - 250000
+   WARMUP_FACTOR: 0.001
+   WARMUP_ITERS: 1000
+   WARMUP_METHOD: linear
+   WEIGHT_DECAY: 0.0001
+   WEIGHT_DECAY_BIAS: 0.0001
+   WEIGHT_DECAY_NORM: 0.0
+ TEST:
+   AUG:
+     ENABLED: false
+     FLIP: true
+     MAX_SIZE: 4000
+     MIN_SIZES:
+     - 400
+     - 500
+     - 600
+     - 700
+     - 800
+     - 900
+     - 1000
+     - 1100
+     - 1200
+   DETECTIONS_PER_IMAGE: 100
+   EVAL_PERIOD: 0
+   EXPECTED_RESULTS: []
+   KEYPOINT_OKS_SIGMAS: []
+   PRECISE_BN:
+     ENABLED: false
+     NUM_ITER: 200
+ VERSION: 2
+ VIS_PERIOD: 0
model/layout-model-training/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ layoutparser
+ funcy
+ bs4
+ scikit-learn
+ imagesize
+ tqdm
model/layout-model-training/scripts/train_prima.sh ADDED
@@ -0,0 +1,17 @@
+ #!/bin/bash
+
+ cd ../tools
+
+ python convert_prima_to_coco.py \
+     --prima_datapath ../data/prima \
+     --anno_savepath ../data/prima/annotations.json
+
+ python train_net.py \
+     --dataset_name prima-layout \
+     --json_annotation_train ../data/prima/annotations-train.json \
+     --image_path_train ../data/prima/Images \
+     --json_annotation_val ../data/prima/annotations-val.json \
+     --image_path_val ../data/prima/Images \
+     --config-file ../configs/prima/mask_rcnn_R_50_FPN_3x.yaml \
+     OUTPUT_DIR ../outputs/prima/mask_rcnn_R_50_FPN_3x/ \
+     SOLVER.IMS_PER_BATCH 2
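
Note that the script begins with `cd ../tools`, so it expects to be launched from inside the `scripts/` directory; a typical invocation would be:

```bash
cd model/layout-model-training/scripts
bash train_prima.sh
```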
model/layout-model-training/tools/convert_prima_to_coco.py ADDED
@@ -0,0 +1,225 @@
+ import os, re, json
+ import imagesize
+ from glob import glob
+ from bs4 import BeautifulSoup
+ import numpy as np
+ from PIL import Image
+ import argparse
+ from tqdm import tqdm
+ import sys
+ sys.path.append('..')
+ from utils import cocosplit
+
+ class NpEncoder(json.JSONEncoder):
+     def default(self, obj):
+         if isinstance(obj, np.integer):
+             return int(obj)
+         elif isinstance(obj, np.floating):
+             return float(obj)
+         elif isinstance(obj, np.ndarray):
+             return obj.tolist()
+         else:
+             return super(NpEncoder, self).default(obj)
+
+ def cvt_coords_to_array(obj):
+     return np.array(
+         [(float(pt['x']), float(pt['y']))
+          for pt in obj.find_all("Point")]
+     )
+
+ def cal_polyarea(points):
+     # Shoelace formula for the area of a simple polygon
+     x = points[:, 0]
+     y = points[:, 1]
+     return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
+
+ def _create_category(schema=0):
+
+     if schema == 0:
+
+         categories = \
+             [{"supercategory": "layout", "id": 0, "name": "Background"},
+              {"supercategory": "layout", "id": 1, "name": "TextRegion"},
+              {"supercategory": "layout", "id": 2, "name": "ImageRegion"},
+              {"supercategory": "layout", "id": 3, "name": "TableRegion"},
+              {"supercategory": "layout", "id": 4, "name": "MathsRegion"},
+              {"supercategory": "layout", "id": 5, "name": "SeparatorRegion"},
+              {"supercategory": "layout", "id": 6, "name": "OtherRegion"}]
+
+         find_categories = lambda name: \
+             [val["id"] for val in categories if val['name'] == name][0]
+
+         conversion = \
+             {
+                 'TextRegion':        find_categories("TextRegion"),
+                 'TableRegion':       find_categories("TableRegion"),
+                 'MathsRegion':       find_categories("MathsRegion"),
+                 'ChartRegion':       find_categories("ImageRegion"),
+                 'GraphicRegion':     find_categories("ImageRegion"),
+                 'ImageRegion':       find_categories("ImageRegion"),
+                 'LineDrawingRegion': find_categories("OtherRegion"),
+                 'SeparatorRegion':   find_categories("SeparatorRegion"),
+                 'NoiseRegion':       find_categories("OtherRegion"),
+                 'FrameRegion':       find_categories("OtherRegion"),
+             }
+
+         return categories, conversion
+
+ _categories, _categories_conversion = _create_category(schema=0)
+
+ _info = {
+     "description": "PRIMA Layout Analysis Dataset",
+     "url": "https://www.primaresearch.org/datasets/Layout_Analysis",
+     "version": "1.0",
+     "year": 2010,
+     "contributor": "PRIMA Research",
+     "date_created": "2020/09/01",
+ }
+
+ def _load_soup(filename):
+     with open(filename, "r") as fp:
+         soup = BeautifulSoup(fp.read(), 'xml')
+     return soup
+
+ def _image_template(image_id, image_path):
+     width, height = imagesize.get(image_path)
+     return {
+         "file_name": os.path.basename(image_path),
+         "height": height,
+         "width": width,
+         "id": int(image_id)
+     }
+
+ def _anno_template(anno_id, image_id, pts, obj_tag):
+     x_1, x_2 = pts[:, 0].min(), pts[:, 0].max()
+     y_1, y_2 = pts[:, 1].min(), pts[:, 1].max()
+     height = y_2 - y_1
+     width = x_2 - x_1
+     return {
+         "segmentation": [pts.flatten().tolist()],
+         "area": cal_polyarea(pts),
+         "iscrowd": 0,
+         "image_id": image_id,
+         "bbox": [x_1, y_1, width, height],
+         "category_id": _categories_conversion[obj_tag],
+         "id": anno_id
+     }
+
+ class PRIMADataset():
+
+     def __init__(self, base_path, anno_path='XML',
+                  image_path='Images'):
+         self.base_path = base_path
+         self.anno_path = os.path.join(base_path, anno_path)
+         self.image_path = os.path.join(base_path, image_path)
+         self._ids = self.find_all_image_ids()
+
+     def __len__(self):
+         return len(self._ids)
+
+     def __getitem__(self, idx):
+         return self.load_image_and_annotation(idx)
+
+     def find_all_annotation_files(self):
+         return glob(os.path.join(self.anno_path, '*.xml'))
+
+     def find_all_image_ids(self):
+         replacer = lambda s: os.path.basename(s).replace('pc-', '').replace('.xml', '')
+         return [replacer(s) for s in self.find_all_annotation_files()]
+
+     def load_image_and_annotation(self, idx):
+         image_id = self._ids[idx]
+         image_path = os.path.join(self.image_path, f'{image_id}.tif')
+         image = Image.open(image_path)
+         anno = self.load_annotation(idx)
+         return image, anno
+
+     def load_annotation(self, idx):
+         image_id = self._ids[idx]
+         anno_path = os.path.join(self.anno_path, f'pc-{image_id}.xml')
+         # A dirty hack to load files with or without the pc- prefix simultaneously
+         if not os.path.exists(anno_path):
+             anno_path = os.path.join(self.anno_path, f'{image_id}.xml')
+         assert os.path.exists(anno_path), "Invalid path"
+         anno = _load_soup(anno_path)
+         return anno
+
+     def convert_to_COCO(self, save_path):
+         all_image_infos = []
+         all_anno_infos = []
+         anno_id = 0
+
+         for idx, image_id in enumerate(tqdm(self._ids)):
+             # We use the idx as the image id
+             image_path = os.path.join(self.image_path, f'{image_id}.tif')
+             image_info = _image_template(idx, image_path)
+             all_image_infos.append(image_info)
+
+             anno = self.load_annotation(idx)
+
+             for item in anno.find_all(re.compile(".*Region")):
+                 pts = cvt_coords_to_array(item.Coords)
+                 if 0 not in pts.shape:
+                     # Sometimes there will be polygons with fewer
+                     # than 4 points, and they could not be appropriately
+                     # handled by the COCO format. So we just drop them.
+                     if pts.shape[0] >= 4:
+                         anno_info = _anno_template(anno_id, idx, pts, item.name)
+                         all_anno_infos.append(anno_info)
+                         anno_id += 1
+
+         final_annotation = {
+             "info": _info,
+             "licenses": [],
+             "images": all_image_infos,
+             "annotations": all_anno_infos,
+             "categories": _categories}
+
+         with open(save_path, 'w') as fp:
+             json.dump(final_annotation, fp, cls=NpEncoder)
+
+         return final_annotation
+
+
+ parser = argparse.ArgumentParser()
+
+ parser.add_argument('--prima_datapath', type=str, default='./data/prima', help='the path to the prima data folders')
+ parser.add_argument('--anno_savepath', type=str, default='./annotations.json', help='the path to save the new annotations')
+
+
+ if __name__ == "__main__":
+     args = parser.parse_args()
+
+     print("Start running the conversion script")
+
+     print(f"Loading the information from the path {args.prima_datapath}")
+     dataset = PRIMADataset(args.prima_datapath)
+
+     print(f"Saving the annotation to {args.anno_savepath}")
+     res = dataset.convert_to_COCO(args.anno_savepath)
+
+     cocosplit.main(
+         args.anno_savepath,
+         split_ratio=0.8,
+         having_annotations=True,
+         train_save_path=args.anno_savepath.replace('.json', '-train.json'),
+         test_save_path=args.anno_savepath.replace('.json', '-val.json'),
+         random_state=24)
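
For reference, the converter can also be run on its own; this is the same invocation that `scripts/train_prima.sh` uses, and because the script ends by calling `cocosplit.main` with `split_ratio=0.8`, it emits the train/val splits alongside the full annotation file:

```bash
cd model/layout-model-training/tools
python convert_prima_to_coco.py \
    --prima_datapath ../data/prima \
    --anno_savepath ../data/prima/annotations.json
# Writes annotations.json, annotations-train.json, and annotations-val.json
# into ../data/prima/.
```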
model/layout-model-training/tools/train_net.py ADDED
@@ -0,0 +1,229 @@
+ """
+ The script is based on https://github.com/facebookresearch/detectron2/blob/master/tools/train_net.py.
+ """
+
+ import logging
+ import os
+ import json
+ from collections import OrderedDict
+ import detectron2.utils.comm as comm
+ import detectron2.data.transforms as T
+ from detectron2.checkpoint import DetectionCheckpointer
+ from detectron2.config import get_cfg
+ from detectron2.data import DatasetMapper, build_detection_train_loader
+
+ from detectron2.data.datasets import register_coco_instances
+
+ from detectron2.engine import (
+     DefaultTrainer,
+     default_argument_parser,
+     default_setup,
+     hooks,
+     launch,
+ )
+ from detectron2.evaluation import (
+     COCOEvaluator,
+     verify_results,
+ )
+ from detectron2.modeling import GeneralizedRCNNWithTTA
+ import pandas as pd
+
+
+ def get_augs(cfg):
+     """Add all the desired augmentations here. A list of available augmentations
+     can be found here:
+     https://detectron2.readthedocs.io/en/latest/modules/data_transforms.html
+     """
+     augs = [
+         T.ResizeShortestEdge(
+             cfg.INPUT.MIN_SIZE_TRAIN,
+             cfg.INPUT.MAX_SIZE_TRAIN,
+             cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING,
+         )
+     ]
+     if cfg.INPUT.CROP.ENABLED:
+         augs.append(
+             T.RandomCrop_CategoryAreaConstraint(
+                 cfg.INPUT.CROP.TYPE,
+                 cfg.INPUT.CROP.SIZE,
+                 cfg.INPUT.CROP.SINGLE_CATEGORY_MAX_AREA,
+                 cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE,
+             )
+         )
+     horizontal_flip: bool = cfg.INPUT.RANDOM_FLIP == "horizontal"
+     augs.append(T.RandomFlip(horizontal=horizontal_flip, vertical=not horizontal_flip))
+     # Rotate the image between -90 and 0 degrees clockwise around the centre
+     augs.append(T.RandomRotation(angle=[-90.0, 0.0]))
+     return augs
+
+
+ class Trainer(DefaultTrainer):
+     """
+     We use the "DefaultTrainer" which contains pre-defined default logic for
+     standard training workflow. It may not work for you, especially if you
+     are working on a new research project. In that case you can use the cleaner
+     "SimpleTrainer", or write your own training loop. You can use
+     "tools/plain_train_net.py" as an example.
+
+     Adapted from:
+     https://github.com/facebookresearch/detectron2/blob/master/projects/DeepLab/train_net.py
+     """
+
+     @classmethod
+     def build_train_loader(cls, cfg):
+         mapper = DatasetMapper(cfg, is_train=True, augmentations=get_augs(cfg))
+         return build_detection_train_loader(cfg, mapper=mapper)
+
+     @classmethod
+     def build_evaluator(cls, cfg, dataset_name, output_folder=None):
+         """
+         Returns:
+             DatasetEvaluator or None
+
+         It is not implemented by default.
+         """
+         return COCOEvaluator(dataset_name, cfg, True, output_folder)
+
+     @classmethod
+     def test_with_TTA(cls, cfg, model):
+         logger = logging.getLogger("detectron2.trainer")
+         # At the end of training, run an evaluation with TTA.
+         # Only supports some R-CNN models.
+         logger.info("Running inference with test-time augmentation ...")
+         model = GeneralizedRCNNWithTTA(cfg, model)
+         evaluators = [
+             cls.build_evaluator(
+                 cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, "inference_TTA")
+             )
+             for name in cfg.DATASETS.TEST
+         ]
+         res = cls.test(cfg, model, evaluators)
+         res = OrderedDict({k + "_TTA": v for k, v in res.items()})
+         return res
+
+     @classmethod
+     def eval_and_save(cls, cfg, model):
+         evaluators = [
+             cls.build_evaluator(
+                 cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, "inference")
+             )
+             for name in cfg.DATASETS.TEST
+         ]
+         res = cls.test(cfg, model, evaluators)
+         pd.DataFrame(res).to_csv(os.path.join(cfg.OUTPUT_DIR, "eval.csv"))
+         return res
+
+
+ def setup(args):
+     """
+     Create configs and perform basic setups.
+     """
+     cfg = get_cfg()
+
+     if args.config_file != "":
+         cfg.merge_from_file(args.config_file)
+     cfg.merge_from_list(args.opts)
+
+     # Set the number of classes from the training annotations
+     with open(args.json_annotation_train, "r") as fp:
+         anno_file = json.load(fp)
+
+     cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(anno_file["categories"])
+     del anno_file
+
+     cfg.DATASETS.TRAIN = (f"{args.dataset_name}-train",)
+     cfg.DATASETS.TEST = (f"{args.dataset_name}-val",)
+     cfg.freeze()
+     default_setup(cfg, args)
+     return cfg
+
+
+ def main(args):
+     # Register Datasets
+     register_coco_instances(
+         f"{args.dataset_name}-train",
+         {},
+         args.json_annotation_train,
+         args.image_path_train,
+     )
+
+     register_coco_instances(
+         f"{args.dataset_name}-val",
+         {},
+         args.json_annotation_val,
+         args.image_path_val
+     )
+     cfg = setup(args)
+
+     if args.eval_only:
+         model = Trainer.build_model(cfg)
+         DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
+             cfg.MODEL.WEIGHTS, resume=args.resume
+         )
+         res = Trainer.test(cfg, model)
+
+         if cfg.TEST.AUG.ENABLED:
+             res.update(Trainer.test_with_TTA(cfg, model))
+         if comm.is_main_process():
+             verify_results(cfg, res)
+
+         # Save the evaluation results
+         pd.DataFrame(res).to_csv(f"{cfg.OUTPUT_DIR}/eval.csv")
+         return res
+
+     # Ensure that the output directory exists
+     os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
+
+     """
+     If you'd like to do anything fancier than the standard training logic,
+     consider writing your own training loop (see plain_train_net.py) or
+     subclassing the trainer.
+     """
+     trainer = Trainer(cfg)
+     trainer.resume_or_load(resume=args.resume)
+     trainer.register_hooks(
+         [hooks.EvalHook(0, lambda: trainer.eval_and_save(cfg, trainer.model))]
+     )
+     if cfg.TEST.AUG.ENABLED:
+         trainer.register_hooks(
+             [hooks.EvalHook(0, lambda: trainer.test_with_TTA(cfg, trainer.model))]
+         )
+     return trainer.train()
+
+
+ if __name__ == "__main__":
+     parser = default_argument_parser()
+
+     # Extra configurations for dataset names and paths
+     parser.add_argument(
+         "--dataset_name",
+         help="The Dataset Name")
+     parser.add_argument(
+         "--json_annotation_train",
+         help="The path to the training set JSON annotation",
+     )
+     parser.add_argument(
+         "--image_path_train",
+         help="The path to the training set image folder",
+     )
+     parser.add_argument(
+         "--json_annotation_val",
+         help="The path to the validation set JSON annotation",
+     )
+     parser.add_argument(
+         "--image_path_val",
+         help="The path to the validation set image folder",
+     )
+     args = parser.parse_args()
+     print("Command Line Args:", args)
+
+     # Dataset registration is moved into the main function to support multi-GPU
+     # training; see https://github.com/facebookresearch/detectron2/issues/253#issuecomment-554216517
+
+     launch(
+         main,
+         args.num_gpus,
+         num_machines=args.num_machines,
+         machine_rank=args.machine_rank,
+         dist_url=args.dist_url,
+         args=(args,),
+     )
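
Since the script builds its parser with Detectron2's `default_argument_parser`, it also accepts the standard flags such as `--eval-only`, `--resume`, and `--num-gpus`. Here is a sketch of evaluating an already-trained model; the checkpoint path is a placeholder, and `--json_annotation_train` is still required because `setup()` reads it to count the categories:

```bash
python train_net.py \
    --dataset_name prima-layout \
    --json_annotation_train ../data/prima/annotations-train.json \
    --image_path_train ../data/prima/Images \
    --json_annotation_val ../data/prima/annotations-val.json \
    --image_path_val ../data/prima/Images \
    --config-file ../configs/prima/mask_rcnn_R_50_FPN_3x.yaml \
    --eval-only \
    MODEL.WEIGHTS ../outputs/prima/mask_rcnn_R_50_FPN_3x/model_final.pth
```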
model/layout-model-training/utils/__init__.py ADDED
File without changes
model/layout-model-training/utils/cocosplit.py ADDED
@@ -0,0 +1,112 @@
+ # Modified based on https://github.com/akarazniewicz/cocosplit/blob/master/cocosplit.py
+
+ import json
+ import argparse
+ import funcy
+ from sklearn.model_selection import train_test_split
+
+ parser = argparse.ArgumentParser(
+     description="Splits COCO annotations file into training and test sets."
+ )
+ parser.add_argument(
+     "--annotation-path",
+     metavar="coco_annotations",
+     type=str,
+     help="Path to COCO annotations file.",
+ )
+ parser.add_argument(
+     "--train", type=str, help="Where to store COCO training annotations"
+ )
+ parser.add_argument("--test", type=str, help="Where to store COCO test annotations")
+ parser.add_argument(
+     "--split-ratio",
+     dest="split_ratio",
+     type=float,
+     required=True,
+     help="A percentage of a split; a number in (0, 1)",
+ )
+ parser.add_argument(
+     "--having-annotations",
+     dest="having_annotations",
+     action="store_true",
+     help="Ignore all images without annotations. Keep only those with at least one annotation",
+ )
+
+
+ def save_coco(file, tagged_data):
+     with open(file, "wt", encoding="UTF-8") as coco:
+         json.dump(tagged_data, coco, indent=2, sort_keys=True)
+
+
+ def filter_annotations(annotations, images):
+     image_ids = funcy.lmap(lambda i: int(i["id"]), images)
+     return funcy.lfilter(lambda a: int(a["image_id"]) in image_ids, annotations)
+
+
+ def main(
+     annotation_path,
+     split_ratio,
+     having_annotations,
+     train_save_path,
+     test_save_path,
+     random_state=None,
+ ):
+
+     with open(annotation_path, "rt", encoding="UTF-8") as annotations:
+         coco = json.load(annotations)
+
+     images = coco["images"]
+     annotations = coco["annotations"]
+
+     ids_with_annotations = funcy.lmap(lambda a: int(a["image_id"]), annotations)
+
+     # Images with annotations
+     img_ann = funcy.lremove(lambda i: i["id"] not in ids_with_annotations, images)
+     tr_ann, ts_ann = train_test_split(
+         img_ann, train_size=split_ratio, random_state=random_state
+     )
+
+     img_wo_ann = funcy.lremove(lambda i: i["id"] in ids_with_annotations, images)
+     if len(img_wo_ann) > 0:
+         tr_wo_ann, ts_wo_ann = train_test_split(
+             img_wo_ann, train_size=split_ratio, random_state=random_state
+         )
+     else:
+         tr_wo_ann, ts_wo_ann = [], []  # Images without annotations
+
+     if having_annotations:
+         tr, ts = tr_ann, ts_ann
+
+     else:
+         # Merging the 2 image lists (i.e. with and without annotation)
+         tr_ann.extend(tr_wo_ann)
+         ts_ann.extend(ts_wo_ann)
+
+         tr, ts = tr_ann, ts_ann
+
+     # Train Data
+     coco.update({"images": tr, "annotations": filter_annotations(annotations, tr)})
+     save_coco(train_save_path, coco)
+
+     # Test Data
+     coco.update({"images": ts, "annotations": filter_annotations(annotations, ts)})
+     save_coco(test_save_path, coco)
+
+     print(
+         "Saved {} entries in {} and {} in {}".format(
+             len(tr), train_save_path, len(ts), test_save_path
+         )
+     )
+
+
+ if __name__ == "__main__":
+     args = parser.parse_args()
+
+     main(
+         args.annotation_path,
+         args.split_ratio,
+         args.having_annotations,
+         args.train,
+         args.test,
+         random_state=24,
+     )
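
Besides being imported by `convert_prima_to_coco.py`, the splitter has its own CLI; the flags below mirror its argparse definitions (the paths are placeholders):

```bash
python cocosplit.py \
    --annotation-path ../data/prima/annotations.json \
    --train ../data/prima/annotations-train.json \
    --test ../data/prima/annotations-val.json \
    --split-ratio 0.8 \
    --having-annotations
```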