# Beyond Document Page Classification

We release the benchmarking code together with the proposed datasets:

* https://huggingface.co/datasets/bdpc/rvl_cdip_mp
* https://huggingface.co/datasets/bdpc/rvl_cdip_n_mp

For consistency, the benchmarking code is also hosted on Hugging Face as an anonymous model repository, which can be cloned.
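
Because the dataset pages listed above are ordinary Hugging Face repositories, they can also be fetched with git. The sketch below only prints the clone commands instead of running them, since the downloads may be large. `hf_dataset_url` is a hypothetical helper introduced for illustration, and Git LFS is assumed to be installed for the large files.

```sh
# Hypothetical helper (not part of the repository): build the URL of a
# Hugging Face dataset repository from its "owner/name" identifier.
hf_dataset_url() {
  printf 'https://huggingface.co/datasets/%s\n' "$1"
}

# Print one clone command per dataset; pipe the output to sh to execute them.
for repo in bdpc/rvl_cdip_mp bdpc/rvl_cdip_n_mp; do
  echo "git clone $(hf_dataset_url "$repo")"
done
```

Printing the commands first makes it easy to review them before any data is transferred.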

## Installation

The scripts require [Python >= 3.8](https://www.python.org/downloads/release/python-380/) to run.
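
To confirm that the local interpreter meets this requirement, a small check along the following lines may help. It is a minimal sketch: `python_version_ok` is a hypothetical helper, and the script assumes the interpreter is available as `python3` on the `PATH`.

```sh
# Hypothetical helper: succeed if a "major.minor[.patch]" version is >= 3.8.
python_version_ok() {
  major=${1%%.*}     # text before the first dot
  rest=${1#*.}       # text after the first dot
  minor=${rest%%.*}  # text before the next dot, if any
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}

# Assumption: the interpreter is called python3 and is on the PATH.
if command -v python3 >/dev/null 2>&1; then
  version=$(python3 -c 'import platform; print(platform.python_version())')
  if python_version_ok "$version"; then
    echo "Python $version meets the >= 3.8 requirement"
  else
    echo "Python $version is too old; 3.8 or newer is required" >&2
  fi
else
  echo "python3 not found on PATH" >&2
fi
```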

We will create a fresh virtual environment in which to install all required packages. The `mkvirtualenv` command is provided by virtualenvwrapper.

```sh
mkvirtualenv -p /usr/bin/python3 BYD
```

Using Poetry and the provided `pyproject.toml`, we then install all required packages:

```sh
workon BYD
pip3 install poetry
poetry install
```

## Experiments

To replicate all experimental results from the paper, run `experiments.sh`:

```sh
./experiments.sh
```
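
For long runs it can be convenient to keep the output in a log file. The wrapper below is a sketch only: `run_logged` and the default log name `experiments.log` are hypothetical and not part of the repository.

```sh
# Hypothetical wrapper: run an executable script and capture both stdout
# and stderr in a log file. Returns the script's own exit status.
run_logged() {
  script=$1
  log=${2:-experiments.log}  # assumed default log name
  if [ ! -x "$script" ]; then
    echo "not found or not executable: $script" >&2
    return 1
  fi
  "$script" >"$log" 2>&1
}
```

With this defined, `run_logged ./experiments.sh` would write everything to `experiments.log` in the current directory. Pass the script with an explicit path such as `./experiments.sh` so the shell does not search the `PATH` for it.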