Image Difference Segmentation

For the main repository and code, please refer to the GitHub Repo.

This project enables the creation of large binary segmentation datasets by exploiting image differences. Certain domains, such as comic books and manga, lend themselves particularly well to this approach. Creating a dataset and training a segmentation model involves two manual steps (outside of the code in this repository):

  1. Finding and sorting suitable data. Ideally, your data should have two or more classes in which the only difference between the classes is the subject to be segmented. An example would be an English page from a comic and the French version of the same page.

  2. Manually creating segmentation masks for a small number of image differences. Using a pretrained DiffNet requires only 20-50 new masks; re-training DiffNet from scratch requires 100-200 masks. The simple-masker tool was written for quickly generating binary segmentation masks.
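The core idea behind the approach, a pixel-wise difference between two aligned variants of the same page, can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual pipeline; the `diff_mask` function and its threshold value are hypothetical:

```python
import numpy as np

def diff_mask(img_a: np.ndarray, img_b: np.ndarray, threshold: int = 30) -> np.ndarray:
    """Binary mask of pixels that differ between two aligned image variants.

    img_a, img_b: uint8 arrays of identical shape (H, W) or (H, W, C),
    e.g. the English and French versions of the same comic page.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("image variants must be aligned and equally sized")
    # Absolute per-pixel difference; use a wider dtype to avoid uint8 wrap-around.
    diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16))
    if diff.ndim == 3:
        diff = diff.max(axis=-1)  # a change in any channel counts
    return (diff > threshold).astype(np.uint8)

# Toy example: two "pages" that differ only in a small region (the lettering).
page_en = np.full((8, 8), 200, dtype=np.uint8)
page_fr = page_en.copy()
page_fr[2:4, 2:6] = 20  # the re-lettered speech bubble
mask = diff_mask(page_en, page_fr)
print(mask.sum())  # 8 pixels flagged as the subject to segment
```

In practice the masks that must be hand-made (step 2) correct for cases where this raw difference is noisy, for example from compression artifacts or imperfect page alignment.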

Prerequisites

The following must be available on your system:

  • Python 3.6+
  • An accompanying Pip installation
  • Python and Pip must be accessible from the command line
  • A CUDA-capable NVIDIA GPU (6 GB+ of VRAM likely needed)
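The prerequisites above can be sanity-checked from the command line (command names may differ on your system, e.g. `python` vs. `python3`):

```shell
# Verify Python 3.6+ and pip are installed and on the PATH.
python3 --version
python3 -m pip --version

# Verify an NVIDIA GPU is visible (requires the NVIDIA driver to be installed).
command -v nvidia-smi >/dev/null && nvidia-smi || echo "nvidia-smi not found"
```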

Using a Pretrained Model

Downloading the Weights File

Weights for this project are hosted on HuggingFace in the weights directory. Currently, a DiffNet instance trained on text differences is provided. To use this model, download it and move it into the weights directory of your local copy of this repository.

Using Pretrained Weights

Pretrained weights can be used with both batch_process.py and evaluate.py. For both scripts, pass the path to your weights file via the --weights_path CLI argument.
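For example, assuming the downloaded weights were saved as weights/diffnet.pth (a hypothetical filename; use whatever you downloaded from the HuggingFace weights directory), the invocation would look like this. Only --weights_path is documented here; any other arguments are script-specific:

```shell
# Batch-process images with the pretrained model.
python batch_process.py --weights_path weights/diffnet.pth

# Evaluate the pretrained model.
python evaluate.py --weights_path weights/diffnet.pth
```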

License

MIT
