loliipopshock committed on Jul 18, 2020

Commit

4d61931

1 Parent(s): 9aafcc3

Add documentation for prima training

Files changed (1)

README.md CHANGED

@@ -6,6 +6,21 @@
 - `scripts/` lists the specific commands for running the code that processes a given dataset.
 - `configs/` contains the configurations for the different deep learning models, organized by dataset.
+## Supported Datasets
+- Prima Layout Analysis Dataset [`scripts/train_prima.sh`](https://github.com/Layout-Parser/layout-model-training/blob/master/scripts/train_prima.sh)
+    - You will need to download the dataset from the [official website](https://www.primaresearch.org/dataset/) and put it in the `data/prima` folder.
+    - As the original dataset is stored in the [PAGE format](https://www.primaresearch.org/tools/PAGEViewer), the script will use [`tools/convert_prima_to_coco.py`](https://github.com/Layout-Parser/layout-model-training/blob/master/tools/convert_prima_to_coco.py) to convert it to COCO format.
+    - The final dataset folder structure should look like:
+        ```bash
+        data/
+        └── prima/
+            ├── Images/
+            ├── XML/
+            ├── License.txt
+            └── annotations*.json
+        ```
 ## Reference
 - **[cocosplit](https://github.com/akarazniewicz/cocosplit)**: A script that splits COCO annotations into train and test sets.
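
The folder layout described under "Supported Datasets" can be checked before training with a small script. The following is only a minimal sketch and not part of the repository: the function name `missing_prima_entries` is hypothetical, and it assumes that `annotations*.json` is written by the conversion step, so that entry may be absent until the training script has run.

```python
from pathlib import Path


def missing_prima_entries(root="data/prima"):
    """Return the expected entries that are absent under `root`.

    Hypothetical helper: the expected names are taken from the folder
    structure shown in the README diff, nothing else is assumed.
    """
    base = Path(root)
    missing = []
    # Images/ and XML/ come with the downloaded dataset.
    for name in ("Images", "XML"):
        if not (base / name).is_dir():
            missing.append(name + "/")
    if not (base / "License.txt").is_file():
        missing.append("License.txt")
    # Assumption: annotations*.json is produced by the PAGE-to-COCO
    # conversion, so it may be missing before that step has run.
    if not any(base.glob("annotations*.json")):
        missing.append("annotations*.json")
    return missing


if __name__ == "__main__":
    gaps = missing_prima_entries()
    print("layout OK" if not gaps else "missing: " + ", ".join(gaps))
```

Run from the repository root, this prints either `layout OK` or the list of entries that still need to be downloaded or generated.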