hasibzunair committed on
Commit
1803579
1 Parent(s): 20deb15
This view is limited to 50 files because it contains too many changes. See raw diff
.DS_Store ADDED
Binary file (6.15 kB). View file
 
README 2.md ADDED
@@ -0,0 +1,169 @@
1
+ # Peekaboo
2
+
3
+ **Concordia University**
4
+
5
+ Hasib Zunair, A. Ben Hamza
6
+
7
+ [[`Paper`](https://arxiv.org/abs/2407.17628)] [[`Project`](https://hasibzunair.github.io/peekaboo/)] [[`Demo`](#4-demo)] [[`BibTeX`](#5-citation)]
8
+
9
+ This is the official code for our **BMVC 2024 paper**:<br>
10
+ [PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization](https://arxiv.org/abs/2407.17628)
11
+ <br>
12
+
13
+ ![MSL Design](./media/figure.jpg)
14
+
15
+ We aim to explicitly model contextual relationships among pixels through image masking for unsupervised object localization. In a self-supervised procedure (i.e., a pretext task) without any additional training (i.e., no downstream task), context-based representation learning is done both at the pixel level, by making predictions on masked images, and at the shape level, by matching the predictions for the masked input to those for the unmasked one.
16
+
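+ The sketch below illustrates this objective. It is a minimal, illustrative sketch rather than the repository's training loop (see `train.py` for the real implementation): `model`, `image`, `scribble_mask`, and `pseudo_mask` are placeholder names, and the pseudo mask is assumed to come from a self-supervised source (e.g., frozen DINO features), not from human labels.
+
+ ```python
+ # Illustrative sketch of the Peekaboo objective described above (placeholder names).
+ import torch
+ import torch.nn.functional as F
+
+ def peekaboo_losses(model, image, scribble_mask, pseudo_mask):
+     # Pixel-level: predict the object mask from the *masked* image
+     masked_image = image * scribble_mask                  # hide parts of the image
+     pred_masked = model(masked_image)                     # logits, shape [B, 1, H, W]
+     pixel_loss = F.binary_cross_entropy_with_logits(pred_masked, pseudo_mask)
+
+     # Shape-level: the masked prediction should match the unmasked prediction
+     with torch.no_grad():
+         pred_full = model(image)                          # prediction on the unmasked image
+     shape_loss = F.binary_cross_entropy_with_logits(pred_masked, torch.sigmoid(pred_full))
+
+     return pixel_loss + shape_loss
+ ```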
17
+ ## 1. Specification of dependencies
18
+
19
+ This code requires Python 3.8 and CUDA 11.2. Clone the project repository, then create and activate the following conda environment.
20
+
21
+ ```bash
22
+ # clone repo
23
+ git clone https://github.com/hasibzunair/peekaboo
24
+ cd peekaboo
25
+ # create env
26
+ conda update conda
27
+ conda env create -f environment.yml
28
+ conda activate peekaboo
29
+ ```
30
+
31
+ Alternatively, you can create a fresh environment and install the project requirements inside it:
32
+
33
+ ```bash
34
+ # clone repo
35
+ git clone https://github.com/hasibzunair/peekaboo
36
+ cd peekaboo
37
+ # create fresh env
38
+ conda create -n peekaboo python=3.8
39
+ conda activate peekaboo
40
+ # example of pytorch installation
41
+ pip install torch==1.8.1 torchvision==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
42
+ pip install pycocotools
43
+ # install dependencies
44
+ pip install -r requirements.txt
45
+ ```
46
+
47
+ Then, install [DINO](https://arxiv.org/pdf/2104.14294.pdf) using the following commands:
48
+
49
+ ```bash
50
+ git clone https://github.com/facebookresearch/dino.git
51
+ cd dino;
52
+ touch __init__.py
53
+ echo -e "import sys\nfrom os.path import dirname, join\nsys.path.insert(0, join(dirname(__file__), '.'))" >> __init__.py; cd ../;
54
+ ```
55
+
56
+ ## 2a. Training code
57
+
58
+ ### Dataset details
59
+
60
+ We train Peekaboo using only the images of the [DUTS-TR](http://saliencydetection.net/duts/) dataset, without any labels, since Peekaboo is self-supervised. Download it, create a directory named `datasets_local` inside the project folder, and place the dataset there.
61
+
62
+ We evaluate on two tasks: unsupervised saliency detection and single object discovery. Since our method is used in an unsupervised setting, it does not require training or fine-tuning on the datasets we evaluate on.
63
+
64
+ #### Unsupervised Saliency Detection
65
+
66
+ We use the following datasets:
67
+
68
+ - [DUT-OMRON](http://saliencydetection.net/dut-omron/)
69
+ - [DUTS-TEST](http://saliencydetection.net/duts/)
70
+ - [ECSSD](https://www.cse.cuhk.edu.hk/leojia/projects/hsaliency/dataset.html)
71
+
72
+ Download the datasets and keep them in `datasets_local`.
73
+
74
+ #### Single Object Discovery
75
+
76
+ For single object discovery, we follow the framework used in [LOST](https://github.com/valeoai/LOST). Download the datasets and put them in the folder `datasets_local`.
77
+
78
+ - [VOC07](http://host.robots.ox.ac.uk/pascal/VOC/)
79
+ - [VOC12](http://host.robots.ox.ac.uk/pascal/VOC/)
80
+ - [COCO20k](https://cocodataset.org/#home)
81
+
82
+ Finally, download the masks of random streaks and holes of arbitrary shapes from [SCRIBBLES.zip](https://github.com/hasibzunair/masksup-segmentation/releases/download/v1.0/SCRIBBLES.zip) and put them inside the `datasets_local` folder (the data loaders look for them under `$DATASET_DIR/SCRIBBLES`).
83
+
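+ For reference, the data loaders (see `datasets/datasets.py`) expect a layout roughly like the one below inside `datasets_local`; exact sub-folder names inside the VOC and COCO downloads may differ slightly.
+
+ ```
+ datasets_local/
+ ├── DUTS-TR/        # DUTS-TR-Image, DUTS-TR-Mask
+ ├── DUTS-TE/        # DUTS-TE-Image, DUTS-TE-Mask
+ ├── ECSSD/          # images, ground_truth_mask
+ ├── DUT-OMRON/      # DUT-OMRON-image, pixelwiseGT-new-PNG
+ ├── VOC2007/
+ ├── VOC2012/
+ ├── COCO/           # images/, annotations/
+ └── SCRIBBLES/      # masks from SCRIBBLES.zip
+ ```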
84
+ ### DUTS-TR training
85
+
86
+ ```bash
87
+ export DATASET_DIR=datasets_local # root directory of training and evaluation datasets
88
+
89
+ python train.py --exp-name peekaboo --dataset-dir $DATASET_DIR
90
+ ```
91
+
92
+ View TensorBoard training logs by running: `tensorboard --logdir=outputs`.
93
+
94
+ ## 2b. Evaluation code
95
+
96
+ After training, the model checkpoint and logs are saved under `outputs/peekaboo-DUTS-TR-vit_small8`. Set the model path for evaluation:
97
+
98
+ ```bash
99
+ export MODEL="outputs/peekaboo-DUTS-TR-vit_small8/decoder_weights_niter500.pt"
100
+ ```
101
+
102
+ ### Unsupervised saliency detection eval
103
+
104
+ ```bash
105
+ # run evaluation
106
+ source evaluate_saliency.sh $MODEL $DATASET_DIR single
107
+ source evaluate_saliency.sh $MODEL $DATASET_DIR multi
108
+ ```
109
+
110
+ ### Single object discovery eval
111
+
112
+ ```bash
113
+ # run evaluation
114
+ source evaluate_uod.sh $MODEL $DATASET_DIR
115
+ ```
116
+
117
+ All experiments are conducted on a single NVIDIA 3080Ti GPU. For additional implementation details and results, please refer to the supplementary materials section in the paper.
118
+
119
+ ## 3. Pre-trained models
120
+
121
+ We provide pretrained models in [./data/weights/](./data/weights/) for reproducibility. Here are the main results of Peekaboo on the single object discovery task. For results on the unsupervised saliency detection task, we refer readers to our paper!
122
+
123
+ |Dataset | Backbone | CorLoc (%) | Download |
124
+ | ---------- | ------- | ------ | -------- |
125
+ | VOC07 | ViT-S/8 | 72.7 | [download](./data/weights/peekaboo_decoder_weights_niter500.pt) |
126
+ | VOC12 | ViT-S/8 | 75.9 | [download](./data/weights/peekaboo_decoder_weights_niter500.pt) |
127
+ | COCO20K | ViT-S/8 | 64.0 | [download](./data/weights/peekaboo_decoder_weights_niter500.pt) |
128
+
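+ As a quick check that the weights load, the released decoder can be instantiated with the same calls the demo app (`app.py`) uses. This is a minimal sketch and assumes it is run from the repository root with the environment from Section 1 activated:
+
+ ```python
+ # Load a pretrained Peekaboo decoder (mirrors the setup in app.py)
+ from model import PeekabooModel
+ from misc import load_config
+
+ config, _ = load_config("configs/peekaboo_DUTS-TR.yaml")
+ model = PeekabooModel(
+     vit_model=config.model["pre_training"],     # dino
+     vit_arch=config.model["arch"],              # vit_small
+     vit_patch_size=config.model["patch_size"],  # 8
+     enc_type_feats=config.peekaboo["feats"],    # "k"
+ )
+ model.decoder_load_weights("data/weights/peekaboo_decoder_weights_niter500.pt")
+ model.eval()
+ ```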
129
+ ## 4. Demo
130
+
131
+ We provide prediction demos of our models. The following command runs our method on a single image and visualizes the result.
132
+
133
+ ```bash
134
+ # infer on one image
135
+ python demo.py
136
+ ```
137
+
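+ A browser-based demo is also included: `app.py` wraps the same model in a Gradio interface, loading `configs/peekaboo_DUTS-TR.yaml` and `data/weights/peekaboo_decoder_weights_niter500.pt` by default. Assuming the environment above is set up, it can be launched with `python app.py`.
+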
138
+ ## 5. Citation
139
+
140
+ ```bibtex
141
+ @inproceedings{zunair2024peekaboo,
142
+ title={PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization},
143
+ author={Zunair, Hasib and Hamza, A Ben},
144
+ booktitle={Proc. British Machine Vision Conference},
145
+ year={2024}
146
+ }
147
+ ```
148
+
149
+ ## Project Notes
150
+
151
+ <details><summary>Click to view</summary>
152
+ <br>
153
+
154
+ **[Mar 18, 2024]** Infer on image folders.
155
+
156
+ ```bash
157
+ # infer on folder of images
158
+ python visualize_outputs.py --model-weights outputs/msl_a1.5_b1_g1_reg4-MSL-DUTS-TR-vit_small8/decoder_weights_niter500.pt --img-folder ./datasets_local/DUTS-TR/DUTS-TR-Image/ --output-dir outputs/visualizations/msl_masks
159
+ ```
160
+
161
+ **[Nov 10, 2023]** Reproduced FOUND results.
162
+
163
+ **[Nov 10, 2023]** Added project notes section.
164
+
165
+ </details>
166
+
167
+ ## Acknowledgements
168
+
169
+ This repository was built on top of [FOUND](https://github.com/valeoai/FOUND), [SelfMask](https://github.com/NoelShin/selfmask), [TokenCut](https://github.com/YangtaoWANG95/TokenCut) and [LOST](https://github.com/valeoai/LOST). Consider acknowledging these projects.
__init__.py ADDED
@@ -0,0 +1,4 @@
1
+ import sys
2
+ from os.path import dirname, join
3
+
4
+ sys.path.insert(0, join(dirname(__file__), "."))
app.py ADDED
@@ -0,0 +1,119 @@
1
+ import os
2
+ import torch
3
+ import argparse
4
+ import torch.nn as nn
5
+ import torch.nn.functional as F
6
+ import matplotlib.pyplot as plt
7
+ import gradio as gr
8
+ import codecs
9
+ import numpy as np
10
+ import cv2
11
+
12
+ from PIL import Image
13
+ from model import PeekabooModel
14
+ from misc import load_config
15
+ from torchvision import transforms as T
16
+
17
+ NORMALIZE = T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
18
+
19
+ if __name__ == "__main__":
20
+
21
+ def inference(img_path):
22
+ # Load the image
23
+ with open(img_path, "rb") as f:
24
+ img = Image.open(f)
25
+ img = img.convert("RGB")
26
+ img_np = np.array(img)
27
+
28
+ # Preprocess
29
+ t = T.Compose([T.ToTensor(), NORMALIZE])
30
+ img_t = t(img)[None, :, :, :]
31
+ inputs = img_t.to(device)
32
+
33
+ # Forward step
34
+ print(f"Start Peekaboo prediction.")
35
+ with torch.no_grad():
36
+ preds = model(inputs, for_eval=True)
37
+ print(f"Done Peekaboo prediction.")
38
+
39
+ sigmoid = nn.Sigmoid()
40
+ h, w = img_t.shape[-2:]
41
+ preds_up = F.interpolate(
42
+ preds, scale_factor=model.vit_patch_size, mode="bilinear", align_corners=False
43
+ )[..., :h, :w]
44
+ preds_up = (sigmoid(preds_up.detach()) > 0.5).squeeze(0).float()
45
+ preds_up = preds_up.cpu().squeeze().numpy()
46
+
47
+ # Overlay predicted mask with input image
48
+ preds_up_np = (preds_up / np.max(preds_up) * 255).astype(np.uint8)
49
+ preds_up_np_3d = np.stack([preds_up_np, preds_up_np, preds_up_np], axis=-1)
50
+ combined_image = cv2.addWeighted(img_np, 0.5, preds_up_np_3d, 0.5, 0)
51
+ print(f"Output shape is {combined_image.shape}")
52
+ return combined_image
53
+
54
+ parser = argparse.ArgumentParser(
55
+ description="Evaluation of Peekaboo",
56
+ formatter_class=argparse.ArgumentDefaultsHelpFormatter,
57
+ )
58
+
59
+ parser.add_argument(
60
+ "--img-path",
61
+ type=str,
62
+ default="data/examples/VOC_000030.jpg",
63
+ help="Image path.",
64
+ )
65
+ parser.add_argument(
66
+ "--model-weights",
67
+ type=str,
68
+ default="data/weights/peekaboo_decoder_weights_niter500.pt",
69
+ )
70
+ parser.add_argument(
71
+ "--config",
72
+ type=str,
73
+ default="configs/peekaboo_DUTS-TR.yaml",
74
+ )
75
+ parser.add_argument(
76
+ "--output-dir",
77
+ type=str,
78
+ default="outputs",
79
+ )
80
+ args = parser.parse_args()
81
+
82
+ # Configuration
83
+ config, _ = load_config(args.config)
84
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
85
+
86
+ # Load the model
87
+ model = PeekabooModel(
88
+ vit_model=config.model["pre_training"],
89
+ vit_arch=config.model["arch"],
90
+ vit_patch_size=config.model["patch_size"],
91
+ enc_type_feats=config.peekaboo["feats"],
92
+ )
93
+ # Load weights
94
+ model.decoder_load_weights(args.model_weights)
95
+ model.eval()
96
+ print(f"Model {args.model_weights} loaded correctly.")
97
+
98
+ # App
99
+ title = "PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization"
100
+ description = codecs.open("./media/description.html", "r", "utf-8").read()
101
+ article = "<p style='text-align: center'><a href='https://arxiv.org/abs/2407.17628' target='_blank'>PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization</a> | <a href='https://github.com/hasibzunair/peekaboo' target='_blank'>Github</a></p>"
102
+
103
+ gr.Interface(
104
+ inference,
105
+ gr.inputs.Image(type="filepath", label="Input Image"),
106
+ gr.outputs.Image(type="numpy", label="Predicted Output"),
107
+ examples=[
108
+ "./data/examples/a.jpeg",
109
+ "./data/examples/b.jpeg",
110
+ "./data/examples/c.jpeg",
111
+ "./data/examples/d.jpeg",
112
+ "./data/examples/e.jpeg"
113
+ ],
114
+ title=title,
115
+ description=description,
116
+ article=article,
117
+ allow_flagging=False,
118
+ analytics_enabled=False,
119
+ ).launch(debug=True, enable_queue=True)
bilateral_solver.py ADDED
@@ -0,0 +1,218 @@
1
+ """
2
+ Code adapted from TokenCut: https://github.com/YangtaoWANG95/TokenCut
3
+ """
4
+
5
+ import PIL.Image as Image
6
+ import numpy as np
7
+ from scipy import ndimage
8
+ from scipy.sparse import diags, csr_matrix
9
+ from scipy.sparse.linalg import cg
10
+
11
+ RGB_TO_YUV = np.array(
12
+ [[0.299, 0.587, 0.114], [-0.168736, -0.331264, 0.5], [0.5, -0.418688, -0.081312]]
13
+ )
14
+ YUV_TO_RGB = np.array([[1.0, 0.0, 1.402], [1.0, -0.34414, -0.71414], [1.0, 1.772, 0.0]])
15
+ YUV_OFFSET = np.array([0, 128.0, 128.0]).reshape(1, 1, -1)
16
+ MAX_VAL = 255.0
17
+
18
+
19
+ def rgb2yuv(im):
20
+ return np.tensordot(im, RGB_TO_YUV, ([2], [1])) + YUV_OFFSET
21
+
22
+
23
+ def yuv2rgb(im):
24
+ return np.tensordot(im.astype(float) - YUV_OFFSET, YUV_TO_RGB, ([2], [1]))
25
+
26
+
27
+ def get_valid_idx(valid, candidates):
28
+ """Find which values are present in a list and where they are located"""
29
+ locs = np.searchsorted(valid, candidates)
30
+ # Handle edge case where the candidate is larger than all valid values
31
+ locs = np.clip(locs, 0, len(valid) - 1)
32
+ # Identify which values are actually present
33
+ valid_idx = np.flatnonzero(valid[locs] == candidates)
34
+ locs = locs[valid_idx]
35
+ return valid_idx, locs
36
+
37
+
38
+ class BilateralGrid(object):
39
+ def __init__(self, im, sigma_spatial=32, sigma_luma=8, sigma_chroma=8):
40
+ im_yuv = rgb2yuv(im)
41
+ # Compute 5-dimensional XYLUV bilateral-space coordinates
42
+ Iy, Ix = np.mgrid[: im.shape[0], : im.shape[1]]
43
+ x_coords = (Ix / sigma_spatial).astype(int)
44
+ y_coords = (Iy / sigma_spatial).astype(int)
45
+ luma_coords = (im_yuv[..., 0] / sigma_luma).astype(int)
46
+ chroma_coords = (im_yuv[..., 1:] / sigma_chroma).astype(int)
47
+ coords = np.dstack((x_coords, y_coords, luma_coords, chroma_coords))
48
+ coords_flat = coords.reshape(-1, coords.shape[-1])
49
+ self.npixels, self.dim = coords_flat.shape
50
+ # Hacky "hash vector" for coordinates,
51
+ # Requires all scaled coordinates be < MAX_VAL
52
+ self.hash_vec = MAX_VAL ** np.arange(self.dim)
53
+ # Construct S and B matrix
54
+ self._compute_factorization(coords_flat)
55
+
56
+ def _compute_factorization(self, coords_flat):
57
+ # Hash each coordinate in grid to a unique value
58
+ hashed_coords = self._hash_coords(coords_flat)
59
+ unique_hashes, unique_idx, idx = np.unique(
60
+ hashed_coords, return_index=True, return_inverse=True
61
+ )
62
+ # Identify unique set of vertices
63
+ unique_coords = coords_flat[unique_idx]
64
+ self.nvertices = len(unique_coords)
65
+ # Construct sparse splat matrix that maps from pixels to vertices
66
+ self.S = csr_matrix((np.ones(self.npixels), (idx, np.arange(self.npixels))))
67
+ # Construct sparse blur matrices.
68
+ # Note that these represent [1 0 1] blurs, excluding the central element
69
+ self.blurs = []
70
+ for d in range(self.dim):
71
+ blur = 0.0
72
+ for offset in (-1, 1):
73
+ offset_vec = np.zeros((1, self.dim))
74
+ offset_vec[:, d] = offset
75
+ neighbor_hash = self._hash_coords(unique_coords + offset_vec)
76
+ valid_coord, idx = get_valid_idx(unique_hashes, neighbor_hash)
77
+ blur = blur + csr_matrix(
78
+ (np.ones((len(valid_coord),)), (valid_coord, idx)),
79
+ shape=(self.nvertices, self.nvertices),
80
+ )
81
+ self.blurs.append(blur)
82
+
83
+ def _hash_coords(self, coord):
84
+ """Hacky function to turn a coordinate into a unique value"""
85
+ return np.dot(coord.reshape(-1, self.dim), self.hash_vec)
86
+
87
+ def splat(self, x):
88
+ return self.S.dot(x)
89
+
90
+ def slice(self, y):
91
+ return self.S.T.dot(y)
92
+
93
+ def blur(self, x):
94
+ """Blur a bilateral-space vector with a 1 2 1 kernel in each dimension"""
95
+ assert x.shape[0] == self.nvertices
96
+ out = 2 * self.dim * x
97
+ for blur in self.blurs:
98
+ out = out + blur.dot(x)
99
+ return out
100
+
101
+ def filter(self, x):
102
+ """Apply bilateral filter to an input x"""
103
+ return self.slice(self.blur(self.splat(x))) / self.slice(
104
+ self.blur(self.splat(np.ones_like(x)))
105
+ )
106
+
107
+
108
+ def bistochastize(grid, maxiter=10):
109
+ """Compute diagonal matrices to bistochastize a bilateral grid"""
110
+ m = grid.splat(np.ones(grid.npixels))
111
+ n = np.ones(grid.nvertices)
112
+ for i in range(maxiter):
113
+ n = np.sqrt(n * m / grid.blur(n))
114
+ # Correct m to satisfy the assumption of bistochastization regardless
115
+ # of how many iterations have been run.
116
+ m = n * grid.blur(n)
117
+ Dm = diags(m, 0)
118
+ Dn = diags(n, 0)
119
+ return Dn, Dm
120
+
121
+
122
+ class BilateralSolver(object):
123
+ def __init__(self, grid, params):
124
+ self.grid = grid
125
+ self.params = params
126
+ self.Dn, self.Dm = bistochastize(grid)
127
+
128
+ def solve(self, x, w):
129
+ # Check that w is a vector or a nx1 matrix
130
+ if w.ndim == 2:
131
+ assert w.shape[1] == 1
132
+ elif w.ndim == 1:
133
+ w = w.reshape(w.shape[0], 1)
134
+ A_smooth = self.Dm - self.Dn.dot(self.grid.blur(self.Dn))
135
+ w_splat = self.grid.splat(w)
136
+ A_data = diags(w_splat[:, 0], 0)
137
+ A = self.params["lam"] * A_smooth + A_data
138
+ xw = x * w
139
+ b = self.grid.splat(xw)
140
+ # Use simple Jacobi preconditioner
141
+ A_diag = np.maximum(A.diagonal(), self.params["A_diag_min"])
142
+ M = diags(1 / A_diag, 0)
143
+ # Flat initialization
144
+ y0 = self.grid.splat(xw) / w_splat
145
+ yhat = np.empty_like(y0)
146
+ for d in range(x.shape[-1]):
147
+ yhat[..., d], info = cg(
148
+ A,
149
+ b[..., d],
150
+ x0=y0[..., d],
151
+ M=M,
152
+ maxiter=self.params["cg_maxiter"],
153
+ tol=self.params["cg_tol"],
154
+ )
155
+ xhat = self.grid.slice(yhat)
156
+ return xhat
157
+
158
+
159
+ def bilateral_solver_output(
160
+ img_pth,
161
+ target,
162
+ img=None,
163
+ sigma_spatial=24,
164
+ sigma_luma=4,
165
+ sigma_chroma=4,
166
+ get_all_cc=False,
167
+ ):
168
+ if img is None:
169
+ reference = np.array(Image.open(img_pth).convert("RGB"))
170
+ else:
171
+ reference = np.array(img)
172
+
173
+ h, w = target.shape
174
+ confidence = np.ones((h, w)) * 0.999
175
+
176
+ grid_params = {
177
+ "sigma_luma": sigma_luma, # Brightness bandwidth
178
+ "sigma_chroma": sigma_chroma, # Color bandwidth
179
+ "sigma_spatial": sigma_spatial, # Spatial bandwidth
180
+ }
181
+
182
+ bs_params = {
183
+ "lam": 256, # The strength of the smoothness parameter
184
+ "A_diag_min": 1e-5, # Clamp the diagonal of the A diagonal in the Jacobi preconditioner.
185
+ "cg_tol": 1e-5, # The tolerance on the convergence in PCG
186
+ "cg_maxiter": 25, # The number of PCG iterations
187
+ }
188
+
189
+ grid = BilateralGrid(reference, **grid_params)
190
+
191
+ t = target.reshape(-1, 1).astype(np.double)
192
+ c = confidence.reshape(-1, 1).astype(np.double)
193
+
194
+ # output solver, which is a soft value
195
+ output_solver = BilateralSolver(grid, bs_params).solve(t, c).reshape((h, w))
196
+
197
+ binary_solver = ndimage.binary_fill_holes(output_solver > 0.5)
198
+ labeled, nr_objects = ndimage.label(binary_solver)
199
+
200
+ nb_pixel = [np.sum(labeled == i) for i in range(nr_objects + 1)]
201
+ pixel_order = np.argsort(nb_pixel)
202
+
203
+ if get_all_cc:
204
+ # Remove known background
205
+ pixel_descending_order = pixel_order[::-1]
206
+ # Get all connected components except the biggest one, which may be considered background; adjust here if needed
207
+ binary_solver = (
208
+ (labeled[None, :, :] == pixel_descending_order[1:, None, None])
209
+ .astype(int)
210
+ .sum(0)
211
+ )
212
+ else:
213
+ try:
214
+ binary_solver = labeled == pixel_order[-2]
215
+ except IndexError:
216
+ binary_solver = np.ones((h, w), dtype=bool)
217
+
218
+ return output_solver, binary_solver
bkg_seg.py ADDED
@@ -0,0 +1,86 @@
1
+ # Copyright 2022 - Valeo Comfort and Driving Assistance - Oriane Siméoni @ valeo.ai
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ import torch
16
+ import torch.nn.functional as F
17
+
18
+ from typing import Tuple
19
+
20
+
21
+ def compute_img_bkg_seg(
22
+ attentions,
23
+ feats,
24
+ featmap_dims,
25
+ th_bkg,
26
+ dim=64,
27
+ epsilon: float = 1e-10,
28
+ apply_weights: bool = True,
29
+ ) -> Tuple[torch.Tensor, float]:
30
+ """
31
+ inputs
32
+ - attentions: self-attention maps of shape [B, num_heads, num_tokens, num_tokens]
33
+ """
34
+
35
+ w_featmap, h_featmap = featmap_dims
36
+
37
+ nb, nh, _ = attentions.shape[:3]
38
+ # we keep only the output patch attention
39
+ att = attentions[:, :, 0, 1:].reshape(nb, nh, -1)
40
+ att = att.reshape(nb, nh, w_featmap, h_featmap)
41
+
42
+ # -----------------------------------------------
43
+ # Inspired by the sparsity-based channel weighting of each attention head in CroW (Kalantidis et al.)
44
+ threshold = torch.mean(att.reshape(nb, -1), dim=1) # Find threshold per image
45
+ Q = torch.sum(
46
+ att.reshape(nb, nh, w_featmap * h_featmap) > threshold[:, None, None], axis=2
47
+ ) / (w_featmap * h_featmap)
48
+ beta = torch.log(torch.sum(Q + epsilon, dim=1)[:, None] / (Q + epsilon))
49
+
50
+ # Weight features based on attention sparsity
51
+ descs = feats[
52
+ :,
53
+ 1:,
54
+ ]
55
+ if apply_weights:
56
+ descs = (descs.reshape(nb, -1, nh, dim) * beta[:, None, :, None]).reshape(
57
+ nb, -1, nh * dim
58
+ )
59
+ else:
60
+ descs = (descs.reshape(nb, -1, nh, dim)).reshape(nb, -1, nh * dim)
61
+
62
+ # -----------------------------------------------
63
+ # Compute cosine-similarities
64
+ descs = F.normalize(descs, dim=-1, p=2)
65
+ cos_sim = torch.bmm(descs, descs.permute(0, 2, 1))
66
+
67
+ # -----------------------------------------------
68
+ # Find pixel with least amount of attention
69
+ if apply_weights:
70
+ att = att.reshape(nb, nh, w_featmap, h_featmap) * beta[:, :, None, None]
71
+ else:
72
+ att = att.reshape(nb, nh, w_featmap, h_featmap)
73
+ id_pixel_ref = torch.argmin(torch.sum(att, axis=1).reshape(nb, -1), dim=-1)
74
+
75
+ # -----------------------------------------------
76
+ # Mask of definitely background pixels: 1 on the background
77
+ cos_sim = cos_sim.reshape(nb, -1, w_featmap * h_featmap)
78
+
79
+ bkg_mask = (
80
+ cos_sim[torch.arange(cos_sim.size(0)), id_pixel_ref, :].reshape(
81
+ nb, w_featmap, h_featmap
82
+ )
83
+ > th_bkg
84
+ ) # mask to be used to remove background
85
+
86
+ return bkg_mask.float()
configs/peekaboo_DUTS-TR.yaml ADDED
@@ -0,0 +1,32 @@
1
+ model:
2
+ arch: vit_small
3
+ patch_size: 8
4
+ pre_training: dino
5
+
6
+ peekaboo:
7
+ feats: "k"
8
+
9
+ training:
10
+ dataset: DUTS-TR
11
+ dataset_set: null
12
+
13
+ # Hyper params
14
+ seed: 0
15
+ max_iter: 500
16
+ nb_epochs: 3
17
+ batch_size: 50
18
+ lr0: 5e-2
19
+ step_lr_size: 50
20
+ step_lr_gamma: 0.95
21
+
22
+ # Augmentations
23
+ crop_size: 224
24
+ scale_range: [0.1, 3.0]
25
+ photometric_aug: gaussian_blur
26
+ proba_photometric_aug: 0.5
27
+ cropping_strategy: random_scale
28
+
29
+ evaluation:
30
+ type: saliency
31
+ datasets: [DUT-OMRON, ECSSD]
32
+ freq: 50
data/.DS_Store ADDED
Binary file (6.15 kB). View file
 
data/coco_20k_filenames.txt ADDED
The diff for this file is too large to render. See raw diff
 
data/examples/.DS_Store ADDED
Binary file (6.15 kB). View file
 
data/examples/VOC_000030.jpg ADDED
data/examples/a.jpeg ADDED
data/examples/b.jpeg ADDED
data/examples/c.jpeg ADDED
data/examples/d.jpeg ADDED
data/examples/e.jpeg ADDED
data/weights/peekaboo_decoder_weights_niter250.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8621874b7459a940f2c584ef8d618c961eac407bc616ca7a76e3c90b745a61f7
3
+ size 2795
data/weights/peekaboo_decoder_weights_niter500.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:889f87ee21ea17a828d6065e3e187521989da1e94ebecc0f5988aaacb2a0c40f
3
+ size 2795
datasets/VOC.py ADDED
@@ -0,0 +1,82 @@
1
+ import os
2
+ from typing import Optional, Tuple, Union, Dict, List
3
+
4
+ import cv2
5
+ from pycocotools.coco import COCO
6
+ import numpy as np
7
+ import torch
8
+ import torchvision
9
+ from PIL import Image, PngImagePlugin
10
+ from torch.utils.data import Dataset
11
+ from torchvision import transforms as T
12
+ from torchvision.transforms import ColorJitter, RandomApply, RandomGrayscale
13
+ from tqdm import tqdm
14
+
15
+ VOCDetectionMetadataType = Dict[str, Dict[str, Union[str, Dict[str, str], List[str]]]]
16
+
17
+
18
+ def get_voc_detection_gt(
19
+ metadata: VOCDetectionMetadataType, remove_hards: bool = False
20
+ ) -> Tuple[np.array, List[str]]:
21
+ objects = metadata["annotation"]["object"]
22
+ nb_obj = len(objects)
23
+
24
+ gt_bbxs = []
25
+ gt_clss = []
26
+ for object in range(nb_obj):
27
+ if remove_hards and (
28
+ objects[object]["truncated"] == "1" or objects[object]["difficult"] == "1"
29
+ ):
30
+ continue
31
+
32
+ gt_cls = objects[object]["name"]
33
+ gt_clss.append(gt_cls)
34
+ obj = objects[object]["bndbox"]
35
+ x1y1x2y2 = [
36
+ int(obj["xmin"]),
37
+ int(obj["ymin"]),
38
+ int(obj["xmax"]),
39
+ int(obj["ymax"]),
40
+ ]
41
+
42
+ # Original annotations are integers in the range [1, W or H]
43
+ # Assuming they mean 1-based pixel indices (inclusive),
44
+ # a box with annotation (xmin=1, xmax=W) covers the whole image.
45
+ # In coordinate space this is represented by (xmin=0, xmax=W)
46
+ x1y1x2y2[0] -= 1
47
+ x1y1x2y2[1] -= 1
48
+ gt_bbxs.append(x1y1x2y2)
49
+
50
+ return np.asarray(gt_bbxs), gt_clss
51
+
52
+
53
+ def create_gt_masks_if_voc(labels: PngImagePlugin.PngImageFile) -> Image.Image:
54
+ mask = np.array(labels)
55
+ mask_gt = (mask > 0).astype(float)
56
+ mask_gt = np.where(mask_gt != 0.0, 255, mask_gt)
57
+ mask_gt = Image.fromarray(np.uint8(mask_gt))
58
+ return mask_gt
59
+
60
+
61
+ def create_VOC_loader(img_dir, dataset_set, evaluation_type):
62
+ year = img_dir[-4:]
63
+ download = not os.path.exists(img_dir)
64
+ if evaluation_type == "uod":
65
+ loader = torchvision.datasets.VOCDetection(
66
+ img_dir,
67
+ year=year,
68
+ image_set=dataset_set,
69
+ transform=None,
70
+ download=download,
71
+ )
72
+ elif evaluation_type == "saliency":
73
+ loader = torchvision.datasets.VOCSegmentation(
74
+ img_dir,
75
+ year=year,
76
+ image_set=dataset_set,
77
+ transform=None,
78
+ download=download,
79
+ )
80
+ else:
81
+ raise ValueError(f"Not implemented for {evaluation_type}.")
82
+ return loader
datasets/__init__.py ADDED
File without changes
datasets/augmentations.py ADDED
@@ -0,0 +1,70 @@
1
+ """
2
+ Code borrowed from SelfMask: https://github.com/NoelShin/selfmask
3
+ """
4
+
5
+ import numpy as np
6
+ import torch
7
+ from PIL import Image
8
+ from typing import Optional, Tuple, Union
9
+ from torchvision.transforms import ColorJitter, RandomApply, RandomGrayscale
10
+
11
+ from datasets.utils import GaussianBlur
12
+ from datasets.geometric_transforms import (
13
+ random_scale,
14
+ random_crop,
15
+ random_hflip,
16
+ )
17
+
18
+
19
+ def geometric_augmentations(
20
+ image: Image.Image,
21
+ random_scale_range: Optional[Tuple[float, float]] = None,
22
+ random_crop_size: Optional[int] = None,
23
+ random_hflip_p: Optional[float] = None,
24
+ mask: Optional[Union[Image.Image, np.ndarray, torch.Tensor]] = None,
25
+ ignore_index: Optional[int] = None,
26
+ ) -> Tuple[Image.Image, torch.Tensor]:
27
+ """Note. image and mask are assumed to be of base size, thus share a spatial shape."""
28
+ if random_scale_range is not None:
29
+ image, mask = random_scale(
30
+ image=image, random_scale_range=random_scale_range, mask=mask
31
+ )
32
+
33
+ if random_crop_size is not None:
34
+ crop_size = (random_crop_size, random_crop_size)
35
+ fill = tuple(np.array(image).mean(axis=(0, 1)).astype(np.uint8).tolist())
36
+ image, offset = random_crop(image=image, crop_size=crop_size, fill=fill)
37
+
38
+ if mask is not None:
39
+ assert ignore_index is not None
40
+ mask = random_crop(
41
+ image=mask, crop_size=crop_size, fill=ignore_index, offset=offset
42
+ )[0]
43
+
44
+ if random_hflip_p is not None:
45
+ image, mask = random_hflip(image=image, p=random_hflip_p, mask=mask)
46
+ return image, mask
47
+
48
+
49
+ def photometric_augmentations(
50
+ image: Image.Image,
51
+ random_color_jitter: bool,
52
+ random_grayscale: bool,
53
+ random_gaussian_blur: bool,
54
+ proba_photometric_aug: float,
55
+ ) -> torch.Tensor:
56
+ if random_color_jitter:
57
+ color_jitter = ColorJitter(
58
+ brightness=0.8, contrast=0.8, saturation=0.8, hue=0.2
59
+ )
60
+ image = RandomApply([color_jitter], p=proba_photometric_aug)(image)
61
+
62
+ if random_grayscale:
63
+ image = RandomGrayscale(proba_photometric_aug)(image)
64
+
65
+ if random_gaussian_blur:
66
+ w, h = image.size
67
+ image = GaussianBlur(kernel_size=int((0.1 * min(w, h) // 2 * 2) + 1))(
68
+ image, proba_photometric_aug
69
+ )
70
+ return image
datasets/datasets.py ADDED
@@ -0,0 +1,476 @@
1
+ # Code for Peekaboo
2
+ # Author: Hasib Zunair
3
+ # Modified from https://github.com/NoelShin/selfmask
4
+
5
+ """
6
+ Dataset functions for applying Normalized Cut.
7
+ """
8
+
9
+ import os
10
+ import glob
11
+ import random
12
+ from typing import Optional, Tuple, Union
13
+
14
+ from pycocotools.coco import COCO
15
+ import numpy as np
16
+ import torch
17
+ import torchvision
18
+ from PIL import Image
19
+ from torch.utils.data import Dataset
20
+ from torchvision import transforms as T
21
+
22
+ try:
23
+ from torchvision.transforms import InterpolationMode
24
+
25
+ BICUBIC = InterpolationMode.BICUBIC
26
+ except ImportError:
27
+ BICUBIC = Image.BICUBIC
28
+
29
+ from datasets.utils import unnormalize
30
+ from datasets.geometric_transforms import resize
31
+ from datasets.VOC import get_voc_detection_gt, create_gt_masks_if_voc, create_VOC_loader
32
+ from datasets.augmentations import geometric_augmentations, photometric_augmentations
33
+
34
+ from datasets.uod_datasets import UODDataset
35
+
36
+ NORMALIZE = T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
37
+
38
+
39
+ def set_dataset_dir(dataset_name, root_dir):
40
+ if dataset_name == "ECSSD":
41
+ dataset_dir = os.path.join(root_dir, "ECSSD")
42
+ img_dir = os.path.join(dataset_dir, "images")
43
+ gt_dir = os.path.join(dataset_dir, "ground_truth_mask")
44
+ scribbles_dir = os.path.join(root_dir, "SCRIBBLES")
45
+
46
+ elif dataset_name == "DUTS-TEST":
47
+ dataset_dir = os.path.join(root_dir, "DUTS-TE")
48
+ img_dir = os.path.join(dataset_dir, "DUTS-TE-Image")
49
+ gt_dir = os.path.join(dataset_dir, "DUTS-TE-Mask")
50
+ scribbles_dir = os.path.join(root_dir, "SCRIBBLES")
51
+
52
+ elif dataset_name == "DUTS-TR":
53
+ dataset_dir = os.path.join(root_dir, "DUTS-TR")
54
+ img_dir = os.path.join(dataset_dir, "DUTS-TR-Image")
55
+ gt_dir = os.path.join(dataset_dir, "DUTS-TR-Mask")
56
+ scribbles_dir = os.path.join(root_dir, "SCRIBBLES")
57
+
58
+ elif dataset_name == "DUT-OMRON":
59
+ dataset_dir = os.path.join(root_dir, "DUT-OMRON")
60
+ img_dir = os.path.join(dataset_dir, "DUT-OMRON-image")
61
+ gt_dir = os.path.join(dataset_dir, "pixelwiseGT-new-PNG")
62
+ scribbles_dir = os.path.join(root_dir, "SCRIBBLES")
63
+
64
+ elif dataset_name == "VOC07":
65
+ dataset_dir = os.path.join(root_dir, "VOC2007")
66
+ img_dir = dataset_dir
67
+ gt_dir = dataset_dir
68
+ scribbles_dir = os.path.join(root_dir, "SCRIBBLES")
69
+
70
+ elif dataset_name == "VOC12":
71
+ dataset_dir = os.path.join(root_dir, "VOC2012")
72
+ img_dir = dataset_dir
73
+ gt_dir = dataset_dir
74
+ scribbles_dir = os.path.join(root_dir, "SCRIBBLES")
75
+
76
+ elif dataset_name == "COCO17":
77
+ dataset_dir = os.path.join(root_dir, "COCO")
78
+ img_dir = dataset_dir
79
+ gt_dir = dataset_dir
80
+ scribbles_dir = os.path.join(root_dir, "SCRIBBLES")
81
+
82
+ elif dataset_name == "ImageNet":
83
+ dataset_dir = os.path.join(root_dir, "ImageNet")
84
+ img_dir = dataset_dir
85
+ gt_dir = dataset_dir
86
+
87
+ else:
88
+ raise ValueError(f"Unknown dataset {dataset_name}")
89
+
90
+ return img_dir, gt_dir, scribbles_dir
91
+
92
+
93
+ def build_dataset(
94
+ root_dir: str,
95
+ dataset_name: str,
96
+ dataset_set: Optional[str] = None,
97
+ for_eval: bool = False,
98
+ config=None,
99
+ evaluation_type="saliency", # uod,
100
+ ):
101
+ """
102
+ Build dataset
103
+ """
104
+
105
+ if evaluation_type == "saliency":
106
+ # training data loaded from here
107
+ img_dir, gt_dir, scribbles_dir = set_dataset_dir(dataset_name, root_dir)
108
+ dataset = PeekabooDataset(
109
+ name=dataset_name,
110
+ img_dir=img_dir,
111
+ gt_dir=gt_dir,
112
+ scribbles_dir=scribbles_dir,
113
+ dataset_set=dataset_set,
114
+ config=config,
115
+ for_eval=for_eval,
116
+ evaluation_type=evaluation_type,
117
+ )
118
+
119
+ elif evaluation_type == "uod":
120
+ assert dataset_name in ["VOC07", "VOC12", "COCO20k"]
121
+ dataset_set = "trainval" if dataset_name in ["VOC07", "VOC12"] else "train"
122
+ no_hards = False
123
+ dataset = UODDataset(
124
+ dataset_name,
125
+ dataset_set,
126
+ root_dir=root_dir,
127
+ remove_hards=no_hards,
128
+ )
129
+
130
+ return dataset
131
+
132
+
133
+ class PeekabooDataset(Dataset):
134
+ def __init__(
135
+ self,
136
+ name: str,
137
+ img_dir: str,
138
+ gt_dir: str,
139
+ scribbles_dir: str,
140
+ dataset_set: Optional[str] = None,
141
+ config=None,
142
+ for_eval: bool = False,
143
+ evaluation_type: str = "saliency",
144
+ ) -> None:
145
+ """
146
+ Args:
147
+ root_dir (string): Directory with all the images.
148
+ transform (callable, optional): Optional transform to be applied
149
+ on a sample.
150
+ """
151
+ self.for_eval = for_eval
152
+ self.use_aug = not for_eval
153
+ self.evaluation_type = evaluation_type
154
+
155
+ assert evaluation_type in ["saliency"]
156
+
157
+ self.name = name
158
+ self.dataset_set = dataset_set
159
+ self.img_dir = img_dir
160
+ self.gt_dir = gt_dir
161
+ self.scribbles_dir = scribbles_dir
162
+
163
+ # if VOC dataset
164
+ self.loader = None
165
+ self.cocoGt = None
166
+
167
+ self.config = config
168
+
169
+ if "VOC" in self.name:
170
+ self.loader = create_VOC_loader(self.img_dir, dataset_set, evaluation_type)
171
+
172
+ # if ImageNet dataset
173
+ elif "ImageNet" in self.name:
174
+ self.loader = torchvision.datasets.ImageNet(
175
+ self.img_dir,
176
+ split=dataset_set,
177
+ transform=None,
178
+ target_transform=None,
179
+ )
180
+
181
+ elif "COCO" in self.name:
182
+ year = int("20" + self.name[-2:])
183
+ annFile = f"/datasets_local/COCO/annotations/instances_{dataset_set}{str(year)}.json"
184
+ self.cocoGt = COCO(annFile)
185
+ self.img_ids = list(sorted(self.cocoGt.getImgIds()))
186
+ self.img_dir = f"/datasets_local/COCO/images/{dataset_set}{str(year)}/"
187
+
188
+ # Transformations
189
+ if self.for_eval:
190
+ (
191
+ full_img_transform,
192
+ no_norm_full_img_transform,
193
+ ) = self.get_init_transformation(isVOC="VOC" in name)
194
+ self.full_img_transform = full_img_transform
195
+ self.no_norm_full_img_transform = no_norm_full_img_transform
196
+
197
+ # Images
198
+ self.list_images = None
199
+ self.list_scribbles = None
200
+ if not "VOC" in self.name and not "COCO" in self.name:
201
+ self.list_images = [
202
+ os.path.join(img_dir, i) for i in sorted(os.listdir(img_dir))
203
+ ]
204
+ # get paths to scribbles; heavily-masked scribbles are used, see https://github.com/hasibzunair/msl-recognition
205
+ self.list_scribbles = sorted(glob.glob(scribbles_dir + "/*.png"))[::-1][
206
+ :1000
207
+ ] # For heavy masking [::-1]
208
+
209
+ self.ignore_index = -1
210
+ self.mean = NORMALIZE.mean
211
+ self.std = NORMALIZE.std
212
+ self.to_tensor_and_normalize = T.Compose([T.ToTensor(), NORMALIZE])
213
+ self.normalize = NORMALIZE
214
+
215
+ if config is not None and self.use_aug:
216
+ self._set_aug(config)
217
+
218
+ def get_init_transformation(self, isVOC: bool = False):
219
+ if isVOC:
220
+ t = T.Compose(
221
+ [T.PILToTensor(), T.ConvertImageDtype(torch.float), NORMALIZE]
222
+ )
223
+ t_nonorm = T.Compose([T.PILToTensor(), T.ConvertImageDtype(torch.float)])
224
+ return t, t_nonorm
225
+
226
+ else:
227
+ t = T.Compose([T.ToTensor(), NORMALIZE])
228
+ t_nonorm = T.Compose([T.ToTensor()])
229
+ return t, t_nonorm
230
+
231
+ def _set_aug(self, config):
232
+ """
233
+ Set augmentation based on config.
234
+ """
235
+
236
+ photometric_aug = config.training["photometric_aug"]
237
+
238
+ self.cropping_strategy = config.training["cropping_strategy"]
239
+ if self.cropping_strategy == "center_crop":
240
+ self.use_aug = False # default strategy, not considered to be a data aug
241
+ self.scale_range = config.training["scale_range"]
242
+ self.crop_size = config.training["crop_size"]
243
+ self.center_crop_transforms = T.Compose(
244
+ [
245
+ T.CenterCrop((self.crop_size, self.crop_size)),
246
+ T.ToTensor(),
247
+ ]
248
+ )
249
+ self.center_crop_only_transforms = T.Compose(
250
+ [T.CenterCrop((self.crop_size, self.crop_size)), T.PILToTensor()]
251
+ )
252
+
253
+ self.proba_photometric_aug = config.training["proba_photometric_aug"]
254
+
255
+ self.random_color_jitter = False
256
+ self.random_grayscale = False
257
+ self.random_gaussian_blur = False
258
+ if photometric_aug == "color_jitter":
259
+ self.random_color_jitter = True
260
+ elif photometric_aug == "grayscale":
261
+ self.random_grayscale = True
262
+ elif photometric_aug == "gaussian_blur":
263
+ self.random_gaussian_blur = True
264
+
265
+ def _preprocess_data_aug(
266
+ self,
267
+ image: Image.Image,
268
+ mask: Image.Image,
269
+ ignore_index: Optional[int] = None,
270
+ ) -> Tuple[torch.Tensor, torch.Tensor]:
271
+ """Prepare data in a proper form for either training (data augmentation) or validation."""
272
+
273
+ # resize to base size
274
+ image = resize(
275
+ image,
276
+ size=self.crop_size,
277
+ edge="shorter",
278
+ interpolation="bilinear",
279
+ )
280
+ mask = resize(
281
+ mask,
282
+ size=self.crop_size,
283
+ edge="shorter",
284
+ interpolation="bilinear",
285
+ )
286
+
287
+ if not isinstance(mask, torch.Tensor):
288
+ mask: torch.Tensor = torch.tensor(np.array(mask))
289
+
290
+ random_scale_range = None
291
+ random_crop_size = None
292
+ random_hflip_p = None
293
+ if self.cropping_strategy == "random_scale":
294
+ random_scale_range = self.scale_range
295
+ elif self.cropping_strategy == "random_crop":
296
+ random_crop_size = self.crop_size
297
+ elif self.cropping_strategy == "random_hflip":
298
+ random_hflip_p = 0.5
299
+ elif self.cropping_strategy == "random_crop_and_hflip":
300
+ random_hflip_p = 0.5
301
+ random_crop_size = self.crop_size
302
+
303
+ if random_crop_size or random_hflip_p or random_scale_range:
304
+ image, mask = geometric_augmentations(
305
+ image=image,
306
+ mask=mask,
307
+ random_scale_range=random_scale_range,
308
+ random_crop_size=random_crop_size,
309
+ ignore_index=ignore_index,
310
+ random_hflip_p=random_hflip_p,
311
+ )
312
+
313
+ if random_scale_range:
314
+ # resize to (self.crop_size, self.crop_size)
315
+ image = resize(
316
+ image,
317
+ size=self.crop_size,
318
+ interpolation="bilinear",
319
+ )
320
+ mask = resize(
321
+ mask,
322
+ size=(self.crop_size, self.crop_size),
323
+ interpolation="bilinear",
324
+ )
325
+
326
+ image = photometric_augmentations(
327
+ image,
328
+ random_color_jitter=self.random_color_jitter,
329
+ random_grayscale=self.random_grayscale,
330
+ random_gaussian_blur=self.random_gaussian_blur,
331
+ proba_photometric_aug=self.proba_photometric_aug,
332
+ )
333
+
334
+ # to tensor + normalize image
335
+ image = self.to_tensor_and_normalize(image)
336
+
337
+ return image, mask
338
+
339
+ def __len__(self) -> int:
340
+ if "VOC" in self.name:
341
+ return len(self.loader)
342
+ elif "ImageNet" in self.name:
343
+ return len(self.loader)
344
+ elif "COCO" in self.name:
345
+ return len(self.img_ids)
346
+ return len(self.list_images)
347
+
348
+ def _apply_center_crop(
349
+ self, image: Image.Image, mask: Union[Image.Image, np.ndarray, torch.Tensor]
350
+ ) -> Tuple[torch.Tensor, torch.Tensor]:
351
+ img_t = self.center_crop_transforms(image)
352
+ # need to normalize image
353
+ img_t = self.normalize(img_t)
354
+ mask_gt = self.center_crop_transforms(mask).squeeze()
355
+ return img_t, mask_gt
356
+
357
+ def _preprocess_scribble(self, img, img_size):
358
+ transform = T.Compose(
359
+ [
360
+ T.Resize(img_size, BICUBIC),
361
+ T.CenterCrop(img_size),
362
+ T.ToTensor(),
363
+ ]
364
+ )
365
+ return transform(img)
366
+
367
+ def __getitem__(self, idx, get_mask_gt=True):
368
+ if "VOC" in self.name:
369
+ img, gt_labels = self.loader[idx]
370
+ if self.evaluation_type == "uod":
371
+ gt_labels, _ = get_voc_detection_gt(gt_labels, remove_hards=False)
372
+ elif self.evaluation_type == "saliency":
373
+ mask_gt = create_gt_masks_if_voc(gt_labels)
374
+ img_path = self.loader.images[idx]
375
+
376
+ elif "ImageNet" in self.name:
377
+ img, _ = self.loader[idx]
378
+ img_path = self.loader.imgs[idx][0]
379
+ # empty mask since no gt mask, only class label
380
+ zeros = np.zeros(np.array(img).shape[:2])
381
+ mask_gt = Image.fromarray(zeros)
382
+
383
+ elif "COCO" in self.name:
384
+ img_id = self.img_ids[idx]
385
+
386
+ path = self.cocoGt.loadImgs(img_id)[0]["file_name"]
387
+ img = Image.open(os.path.join(self.img_dir, path)).convert("RGB")
388
+ _ = self.cocoGt.loadAnns(self.cocoGt.getAnnIds(img_id))
389
+ img_path = self.img_ids[idx] # What matters most is the id for eval
390
+
391
+ # empty mask since no gt mask, only class label
392
+ zeros = np.zeros(np.array(img).shape[:2])
393
+ mask_gt = Image.fromarray(zeros)
394
+
395
+ # For all others
396
+ else:
397
+ img_path = self.list_images[idx]
398
+ scribble_path = self.list_scribbles[random.randint(0, 950)]
399
+
400
+ # read image
401
+ with open(img_path, "rb") as f:
402
+ img = Image.open(f)
403
+ img = img.convert("RGB")
404
+ im_name = img_path.split("/")[-1]
405
+ mask_gt = Image.open(
406
+ os.path.join(self.gt_dir, im_name.replace(".jpg", ".png"))
407
+ ).convert("L")
408
+
409
+ if self.for_eval:
410
+ img_t = self.full_img_transform(img)
411
+ img_init = self.no_norm_full_img_transform(img)
412
+
413
+ if self.evaluation_type == "saliency":
414
+ mask_gt = torch.tensor(np.array(mask_gt)).squeeze()
415
+ mask_gt = np.array(mask_gt)
416
+ mask_gt = mask_gt == 255
417
+ mask_gt = torch.tensor(mask_gt)
418
+ else:
419
+ if self.use_aug:
420
+ img_t, mask_gt = self._preprocess_data_aug(
421
+ image=img, mask=mask_gt, ignore_index=self.ignore_index
422
+ )
423
+ mask_gt = np.array(mask_gt)
424
+ mask_gt = mask_gt == 255
425
+ mask_gt = torch.tensor(mask_gt)
426
+ else:
427
+ # no data aug
428
+ img_t, mask_gt = self._apply_center_crop(image=img, mask=mask_gt)
429
+ gt_labels = self.center_crop_only_transforms(gt_labels).squeeze()
430
+ mask_gt = np.asarray(mask_gt, np.int64)
431
+ mask_gt = mask_gt == 1
432
+ mask_gt = torch.tensor(mask_gt)
433
+
434
+ img_init = unnormalize(img_t)
435
+
436
+ if not get_mask_gt:
437
+ mask_gt = None
438
+
439
+ if self.evaluation_type == "uod":
440
+ gt_labels = torch.tensor(gt_labels)
441
+ mask_gt = gt_labels
442
+
443
+ # read scribble
444
+ with open(scribble_path, "rb") as f:
445
+ scribble = Image.open(f).convert("P")
446
+ scribble = self._preprocess_scribble(scribble, img_t.shape[1])
447
+ scribble = (scribble > 0).float() # threshold to [0,1]
448
+ scribble = torch.max(scribble) - scribble # inverted scribble
449
+
450
+ # create masked input image with scribble when training
451
+ if not self.for_eval:
452
+ masked_img_t = img_t * scribble
453
+ masked_img_init = unnormalize(masked_img_t)
454
+ else:
455
+ masked_img_t = img_t
456
+ masked_img_init = img_init
457
+
458
+ # returns the
459
+ # image, masked image, scribble,
460
+ # un-normalized image, un-normalized masked image
461
+ # ground truth mask, image path
462
+ return (
463
+ img_t,
464
+ masked_img_t,
465
+ scribble,
466
+ img_init,
467
+ masked_img_init,
468
+ mask_gt,
469
+ img_path,
470
+ )
471
+
472
+ def fullimg_mode(self):
473
+ self.val_full_image = True
474
+
475
+ def training_mode(self):
476
+ self.val_full_image = False
datasets/geometric_transforms.py ADDED
@@ -0,0 +1,160 @@
1
+ """
2
+ Code adapted from SelfMask: https://github.com/NoelShin/selfmask
3
+ """
4
+
5
+ from random import randint, random, uniform
6
+ from typing import Optional, Tuple, Union
7
+
8
+ import numpy as np
9
+ import torch
10
+ import torchvision.transforms.functional as TF
11
+ from PIL import Image
12
+ from torchvision.transforms.functional import InterpolationMode as IM
13
+
14
+
15
+ def random_crop(
16
+ image: Union[Image.Image, np.ndarray, torch.Tensor],
17
+ crop_size: Tuple[int, int], # (h, w)
18
+ fill: Union[int, Tuple[int, int, int]], # an unsigned integer or RGB,
19
+ offset: Optional[Tuple[int, int]] = None, # (top, left) coordinate of a crop
20
+ ):
21
+ assert type(crop_size) in (tuple, list) and len(crop_size) == 2
22
+
23
+ if isinstance(image, np.ndarray):
24
+ image = torch.tensor(image)
25
+ h, w = image.shape[-2:]
26
+ elif isinstance(image, Image.Image):
27
+ w, h = image.size
28
+ elif isinstance(image, torch.Tensor):
29
+ h, w = image.shape[-2:]
30
+ else:
31
+ raise TypeError(type(image))
32
+
33
+ pad_h, pad_w = max(crop_size[0] - h, 0), max(crop_size[1] - w, 0)
34
+
35
+ image = TF.pad(image, [0, 0, pad_w, pad_h], fill=fill, padding_mode="constant")
36
+
37
+ if isinstance(image, Image.Image):
38
+ w, h = image.size
39
+ else:
40
+ h, w = image.shape[-2:]
41
+
42
+ if offset is None:
43
+ offset = (randint(0, h - crop_size[0]), randint(0, w - crop_size[1]))
44
+
45
+ image = TF.crop(
46
+ image, top=offset[0], left=offset[1], height=crop_size[0], width=crop_size[1]
47
+ )
48
+ return image, offset
49
+
50
+
51
+ def compute_size(
52
+ input_size: Tuple[int, int], output_size: int, edge: str # h, w
53
+ ) -> Tuple[int, int]:
54
+ assert edge in ["shorter", "longer"]
55
+ h, w = input_size
56
+
57
+ if edge == "longer":
58
+ if w > h:
59
+ h = int(float(h) / w * output_size)
60
+ w = output_size
61
+ else:
62
+ w = int(float(w) / h * output_size)
63
+ h = output_size
64
+ assert w <= output_size and h <= output_size
65
+
66
+ else:
67
+ if w > h:
68
+ w = int(float(w) / h * output_size)
69
+ h = output_size
70
+ else:
71
+ h = int(float(h) / w * output_size)
72
+ w = output_size
73
+ assert w >= output_size and h >= output_size
74
+ return h, w
75
+
76
+
77
+ def resize(
78
+ image: Union[Image.Image, np.ndarray, torch.Tensor],
79
+ size: Union[int, Tuple[int, int]],
80
+ interpolation: str,
81
+ edge: str = "both",
82
+ ) -> Union[Image.Image, torch.Tensor]:
83
+ """
84
+ :param image: an image to be resized
85
+ :param size: a resulting image size
86
+ :param interpolation: sampling mode. ["nearest", "bilinear", "bicubic"]
87
+ :param edge: Default: "both"
88
+ No-op if a size is given as a tuple (h, w).
89
+ If set to "both", resize both height and width to the specified size.
90
+ If set to "shorter", resize the shorter edge to the specified size keeping the aspect ratio.
91
+ If set to "longer", resize the longer edge to the specified size keeping the aspect ratio.
92
+ :return: a resized image
93
+ """
94
+ assert interpolation in ["nearest", "bilinear", "bicubic"], ValueError(
95
+ interpolation
96
+ )
97
+ assert edge in ["both", "shorter", "longer"], ValueError(edge)
98
+ interpolation = {
99
+ "nearest": IM.NEAREST,
100
+ "bilinear": IM.BILINEAR,
101
+ "bicubic": IM.BICUBIC,
102
+ }[interpolation]
103
+
104
+ if type(image) == torch.Tensor:
105
+ image = image.clone().detach()
106
+ elif type(image) == np.ndarray:
107
+ image = torch.from_numpy(image)
108
+
109
+ if type(size) is tuple:
110
+ if type(image) == torch.Tensor and len(image.shape) == 2:
111
+ image = TF.resize(
112
+ image.unsqueeze(dim=0), size=size, interpolation=interpolation
113
+ ).squeeze(dim=0)
114
+ else:
115
+ image = TF.resize(image, size=size, interpolation=interpolation)
116
+
117
+ else:
118
+ if edge == "both":
119
+ image = TF.resize(image, size=[size, size], interpolation=interpolation)
120
+
121
+ else:
122
+ if isinstance(image, Image.Image):
123
+ w, h = image.size
124
+ else:
125
+ h, w = image.shape[-2:]
126
+ rh, rw = compute_size(input_size=(h, w), output_size=size, edge=edge)
127
+ image = TF.resize(image, size=[rh, rw], interpolation=interpolation)
128
+ return image
129
+
130
+
131
+ def random_scale(
132
+ image: Union[Image.Image, np.ndarray, torch.Tensor],
133
+ random_scale_range: Tuple[float, float],
134
+ mask: Optional[Union[Image.Image, np.ndarray, torch.Tensor]] = None,
135
+ ):
136
+ scale = uniform(*random_scale_range)
137
+ if isinstance(image, Image.Image):
138
+ w, h = image.size
139
+ else:
140
+ h, w = image.shape[-2:]
141
+ w_rs, h_rs = int(w * scale), int(h * scale)
142
+ image: Image.Image = resize(image, size=(h_rs, w_rs), interpolation="bilinear")
143
+ if mask is not None:
144
+ mask = resize(mask, size=(h_rs, w_rs), interpolation="nearest")
145
+ return image, mask
146
+
147
+
148
+ def random_hflip(
149
+ image: Union[Image.Image, np.ndarray, torch.Tensor],
150
+ p: float,
151
+ mask: Optional[Union[np.ndarray, torch.Tensor]] = None,
152
+ ):
153
+ assert 0.0 <= p <= 1.0, ValueError(random_hflip)
154
+
155
+ # Return a random floating point number in the range [0.0, 1.0).
156
+ if random() > p:
157
+ image = TF.hflip(image)
158
+ if mask is not None:
159
+ mask = TF.hflip(mask)
160
+ return image, mask
datasets/uod_datasets.py ADDED
@@ -0,0 +1,396 @@
1
+ # Copyright 2021 - Valeo Comfort and Driving Assistance - Oriane Siméoni @ valeo.ai
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Code adapted from previous method LOST: https://github.com/valeoai/LOST
17
+ """
18
+
19
+ import os
20
+ import math
21
+ import torch
22
+ import json
23
+ import torchvision
24
+ import numpy as np
25
+ import skimage.io
26
+
27
+ from PIL import Image
28
+ from tqdm import tqdm
29
+ from torchvision import transforms as pth_transforms
30
+
31
+ # Image transformation applied to all images
32
+ transform = pth_transforms.Compose(
33
+ [
34
+ pth_transforms.ToTensor(),
35
+ pth_transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
36
+ ]
37
+ )
38
+
39
+
40
+ class ImageDataset:
41
+ def __init__(self, image_path):
42
+
43
+ self.image_path = image_path
44
+ self.name = image_path.split("/")[-1]
45
+
46
+ # Read the image
47
+ with open(image_path, "rb") as f:
48
+ img = Image.open(f)
49
+ img = img.convert("RGB")
50
+
51
+ # Build a dataloader
52
+ img = transform(img)
53
+ self.dataloader = [[img, image_path]]
54
+
55
+ def get_image_name(self, *args, **kwargs):
56
+ return self.image_path.split("/")[-1].split(".")[0]
57
+
58
+ def load_image(self, *args, **kwargs):
59
+ return skimage.io.imread(self.image_path)
60
+
61
+
62
+ class UODDataset:
63
+ def __init__(
64
+ self,
65
+ dataset_name,
66
+ dataset_set,
67
+ root_dir,
68
+ remove_hards: bool = False,
69
+ ):
70
+ """
71
+ Build the dataloader
72
+ """
73
+
74
+ self.dataset_name = dataset_name
75
+ self.set = dataset_set
76
+ self.root_dir = root_dir
77
+
78
+ if dataset_name == "VOC07":
79
+ self.root_path = f"{root_dir}/VOC2007"
80
+ self.year = "2007"
81
+ elif dataset_name == "VOC12":
82
+ self.root_path = f"{root_dir}/VOC2012"
83
+ self.year = "2012"
84
+ elif dataset_name == "COCO20k":
85
+ self.year = "2014"
86
+ self.root_path = f"{root_dir}/COCO/images/{dataset_set}{self.year}"
87
+ self.sel20k = "data/coco_20k_filenames.txt"
88
+ # new JSON file constructed based on COCO train2014 gt
89
+ self.all_annfile = f"{root_dir}/COCO/annotations/instances_train2014.json"
90
+ self.annfile = (
91
+ f"{root_dir}/COCO/annotations/instances_train2014_sel20k.json"
92
+ )
93
+ if not os.path.exists(self.annfile):
94
+ select_coco_20k(self.sel20k, self.all_annfile)
95
+ else:
96
+ raise ValueError("Unknown dataset.")
97
+
98
+ if not os.path.exists(self.root_path):
99
+ raise ValueError("Please follow the README to setup the datasets.")
100
+
101
+ self.name = f"{self.dataset_name}_{self.set}"
102
+
103
+ # Build the dataloader
104
+ # import pdb; pdb.set_trace()
105
+
106
+ if "VOC" in dataset_name:
107
+ self.dataloader = torchvision.datasets.VOCDetection(
108
+ self.root_path,
109
+ year=self.year,
110
+ image_set=self.set,
111
+ transform=transform,
112
+ download=False,
113
+ )
114
+ elif "COCO20k" == dataset_name:
115
+ self.dataloader = torchvision.datasets.CocoDetection(
116
+ self.root_path, annFile=self.annfile, transform=transform
117
+ )
118
+ else:
119
+ raise ValueError("Unknown dataset.")
120
+
121
+ # Set hards images that are not included
122
+ self.remove_hards = remove_hards
123
+ self.hards = []
124
+ if remove_hards:
125
+ self.name += f"-nohards"
126
+ self.hards = self.get_hards()
127
+ print(f"Nb images discarded {len(self.hards)}")
128
+
129
+ def __len__(self) -> int:
130
+ return len(self.dataloader)
131
+
132
+ def load_image(self, im_name):
133
+ """
134
+ Load the image corresponding to the im_name
135
+ """
136
+ if "VOC" in self.dataset_name:
137
+ image = skimage.io.imread(
138
+ f"{self.root_dir}/VOC{self.year}/JPEGImages/{im_name}"
139
+ )
140
+ elif "COCO" in self.dataset_name:
141
+ im_path = self.path_20k[self.sel_20k.index(im_name)]
142
+ image = skimage.io.imread(f"{self.root_dir}/COCO/images/{im_path}")
143
+ else:
144
+ raise ValueError("Unknown dataset.")
145
+ return image
146
+
147
+ def get_image_name(self, inp):
148
+ """
149
+ Return the image name
150
+ """
151
+ if "VOC" in self.dataset_name:
152
+ im_name = inp["annotation"]["filename"]
153
+ elif "COCO" in self.dataset_name:
154
+ im_name = str(inp[0]["image_id"])
155
+
156
+ return im_name
157
+
158
+ def extract_gt(self, targets, im_name):
159
+ if "VOC" in self.dataset_name:
160
+ return extract_gt_VOC(targets, remove_hards=self.remove_hards)
161
+ elif "COCO" in self.dataset_name:
162
+ return extract_gt_COCO(targets, remove_iscrowd=True)
163
+ else:
164
+ raise ValueError("Unknown dataset")
165
+
166
+ def extract_classes(self):
167
+ if "VOC" in self.dataset_name:
168
+ cls_path = f"classes_{self.set}_{self.year}.txt"
169
+ elif "COCO" in self.dataset_name:
170
+ cls_path = f"classes_{self.dataset_name}_{self.set}_{self.year}.txt"
171
+
172
+ # Load if exists
173
+ if os.path.exists(cls_path):
174
+ all_classes = []
175
+ with open(cls_path, "r") as f:
176
+ for line in f:
177
+ all_classes.append(line.strip())
178
+ else:
179
+ print("Extract all classes from the dataset")
180
+ if "VOC" in self.dataset_name:
181
+ all_classes = self.extract_classes_VOC()
182
+ elif "COCO" in self.dataset_name:
183
+ all_classes = self.extract_classes_COCO()
184
+
185
+ with open(cls_path, "w") as f:
186
+ for s in all_classes:
187
+ f.write(str(s) + "\n")
188
+
189
+ return all_classes
190
+
191
+ def extract_classes_VOC(self):
192
+ all_classes = []
193
+ for im_id, inp in enumerate(tqdm(self.dataloader)):
194
+ objects = inp[1]["annotation"]["object"]
195
+
196
+ for o in range(len(objects)):
197
+ if objects[o]["name"] not in all_classes:
198
+ all_classes.append(objects[o]["name"])
199
+
200
+ return all_classes
201
+
202
+ def extract_classes_COCO(self):
203
+ all_classes = []
204
+ for im_id, inp in enumerate(tqdm(self.dataloader)):
205
+ objects = inp[1]
206
+
207
+ for o in range(len(objects)):
208
+ if objects[o]["category_id"] not in all_classes:
209
+ all_classes.append(objects[o]["category_id"])
210
+
211
+ return all_classes
212
+
213
+ def get_hards(self):
214
+ hard_path = "datasets/hard_%s_%s_%s.txt" % (
215
+ self.dataset_name,
216
+ self.set,
217
+ self.year,
218
+ )
219
+ if os.path.exists(hard_path):
220
+ hards = []
221
+ with open(hard_path, "r") as f:
222
+ for line in f:
223
+ hards.append(int(line.strip()))
224
+ else:
225
+ print("Discover hard images that should be discarded")
226
+
227
+ if "VOC" in self.dataset_name:
228
+ # set the hards
229
+ hards = discard_hard_voc(self.dataloader)
230
+
231
+ with open(hard_path, "w") as f:
232
+ for s in hards:
233
+ f.write(str(s) + "\n")
234
+
235
+ return hards
236
+
237
+
238
+ def discard_hard_voc(dataloader):
239
+ hards = []
240
+ for im_id, inp in enumerate(tqdm(dataloader)):
241
+ objects = inp[1]["annotation"]["object"]
242
+ nb_obj = len(objects)
243
+
244
+ hard = np.zeros(nb_obj)
245
+ for i, o in enumerate(range(nb_obj)):
246
+ hard[i] = (
247
+ 1
248
+ if (objects[o]["truncated"] == "1" or objects[o]["difficult"] == "1")
249
+ else 0
250
+ )
251
+
252
+ # all images with only truncated or difficult objects
253
+ if np.sum(hard) == nb_obj:
254
+ hards.append(im_id)
255
+ return hards
256
+
257
+
258
+ def extract_gt_COCO(targets, remove_iscrowd=True):
259
+ objects = targets
260
+ nb_obj = len(objects)
261
+
262
+ gt_bbxs = []
263
+ gt_clss = []
264
+ for o in range(nb_obj):
265
+ # Remove iscrowd boxes
266
+ if remove_iscrowd and objects[o]["iscrowd"] == 1:
267
+ continue
268
+ gt_cls = objects[o]["category_id"]
269
+ gt_clss.append(gt_cls)
270
+ bbx = objects[o]["bbox"]
271
+ x1y1x2y2 = [bbx[0], bbx[1], bbx[0] + bbx[2], bbx[1] + bbx[3]]
272
+ x1y1x2y2 = [int(round(x)) for x in x1y1x2y2]
273
+ gt_bbxs.append(x1y1x2y2)
274
+
275
+ return np.asarray(gt_bbxs), gt_clss
276
+
277
+
278
+ def extract_gt_VOC(targets, remove_hards=False):
279
+ objects = targets["annotation"]["object"]
280
+ nb_obj = len(objects)
281
+
282
+ gt_bbxs = []
283
+ gt_clss = []
284
+ for o in range(nb_obj):
285
+ if remove_hards and (
286
+ objects[o]["truncated"] == "1" or objects[o]["difficult"] == "1"
287
+ ):
288
+ continue
289
+ gt_cls = objects[o]["name"]
290
+ gt_clss.append(gt_cls)
291
+ obj = objects[o]["bndbox"]
292
+ x1y1x2y2 = [
293
+ int(obj["xmin"]),
294
+ int(obj["ymin"]),
295
+ int(obj["xmax"]),
296
+ int(obj["ymax"]),
297
+ ]
298
+ # Original annotations are integers in the range [1, W or H]
299
+ # Assuming they mean 1-based pixel indices (inclusive),
300
+ # a box with annotation (xmin=1, xmax=W) covers the whole image.
301
+ # In coordinate space this is represented by (xmin=0, xmax=W)
302
+ x1y1x2y2[0] -= 1
303
+ x1y1x2y2[1] -= 1
304
+ gt_bbxs.append(x1y1x2y2)
305
+
306
+ return np.asarray(gt_bbxs), gt_clss
307
+
308
+
309
+ def bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):
310
+ # https://github.com/ultralytics/yolov5/blob/develop/utils/general.py
311
+ # Returns the IoU of box1 to box2. box1 is 4, box2 is nx4
312
+ box2 = box2.T
313
+
314
+ # Get the coordinates of bounding boxes
315
+ if x1y1x2y2: # x1, y1, x2, y2 = box1
316
+ b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
317
+ b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
318
+ else: # transform from xywh to xyxy
319
+ b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
320
+ b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
321
+ b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
322
+ b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2
323
+
324
+ # Intersection area
325
+ inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * (
326
+ torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)
327
+ ).clamp(0)
328
+
329
+ # Union Area
330
+ w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
331
+ w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
332
+ union = w1 * h1 + w2 * h2 - inter + eps
333
+
334
+ iou = inter / union
335
+ if GIoU or DIoU or CIoU:
336
+ cw = torch.max(b1_x2, b2_x2) - torch.min(
337
+ b1_x1, b2_x1
338
+ ) # convex (smallest enclosing box) width
339
+ ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1) # convex height
340
+ if CIoU or DIoU: # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
341
+ c2 = cw**2 + ch**2 + eps # convex diagonal squared
342
+ rho2 = (
343
+ (b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2
344
+ + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2
345
+ ) / 4 # center distance squared
346
+ if DIoU:
347
+ return iou - rho2 / c2 # DIoU
348
+ elif (
349
+ CIoU
350
+ ): # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
351
+ v = (4 / math.pi**2) * torch.pow(
352
+ torch.atan(w2 / h2) - torch.atan(w1 / h1), 2
353
+ )
354
+ with torch.no_grad():
355
+ alpha = v / (v - iou + (1 + eps))
356
+ return iou - (rho2 / c2 + v * alpha) # CIoU
357
+ else: # GIoU https://arxiv.org/pdf/1902.09630.pdf
358
+ c_area = cw * ch + eps # convex area
359
+ return iou - (c_area - union) / c_area # GIoU
360
+ else:
361
+ return iou # IoU
362
+
363
+
364
+ def select_coco_20k(sel_file, all_annotations_file):
365
+ print("Building COCO 20k dataset.")
366
+
367
+ # load all annotations
368
+ with open(all_annotations_file, "r") as f:
369
+ train2014 = json.load(f)
370
+
371
+ # load selected images
372
+ with open(sel_file, "r") as f:
373
+ sel_20k = f.readlines()
374
+ sel_20k = [s.replace("\n", "") for s in sel_20k]
375
+ im20k = [str(int(s.split("_")[-1].split(".")[0])) for s in sel_20k]
376
+
377
+ new_anno = []
378
+ new_images = []
379
+
380
+ for i in tqdm(im20k):
381
+ new_anno.extend(
382
+ [a for a in train2014["annotations"] if a["image_id"] == int(i)]
383
+ )
384
+ new_images.extend([a for a in train2014["images"] if a["id"] == int(i)])
385
+
386
+ train2014_20k = {}
387
+ train2014_20k["images"] = new_images
388
+ train2014_20k["annotations"] = new_anno
389
+ train2014_20k["categories"] = train2014["categories"]
390
+
391
+ with open(
392
+ "datasets_local/COCO/annotations/instances_train2014_sel20k.json", "w"
393
+ ) as outfile:
394
+ json.dump(train2014_20k, outfile)
395
+
396
+ print("Done.")
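The `bbox_iou` helper above is what the single object discovery evaluation relies on to decide whether a predicted box matches a ground-truth box. Below is a minimal usage sketch, not part of this commit; the import path is a placeholder because the file name of this module sits outside the portion of the diff shown here, and the box values are made up.

```python
# Minimal usage sketch (illustration only, not part of this commit).
import torch
# Placeholder import path; point it at wherever the bbox_iou above is defined.
from datasets.uod_datasets import bbox_iou

pred_box = torch.tensor([10.0, 20.0, 110.0, 220.0])         # one box as (x1, y1, x2, y2)
gt_boxes = torch.tensor([[12.0, 18.0, 100.0, 210.0],
                         [300.0, 50.0, 400.0, 150.0]])       # n x 4 ground-truth boxes
ious = bbox_iou(pred_box, gt_boxes)                          # IoU of pred_box against each gt box
print(ious)                      # e.g. roughly tensor([0.83, 0.00]) for these values
print(bool(ious.max() >= 0.5))   # the usual "correct localization" criterion
```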
datasets/utils.py ADDED
@@ -0,0 +1,45 @@
1
+ import numpy as np
2
+ import torch
3
+ from PIL import Image
4
+ from torchvision import transforms as T
5
+
6
+ NORMALIZE = T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
7
+
8
+
9
+ class GaussianBlur:
10
+ """
11
+ Code borrowed from SelfMask: https://github.com/NoelShin/selfmask
12
+ """
13
+
14
+ # Implements Gaussian blur as described in the SimCLR paper
15
+ def __init__(self, kernel_size: float, min: float = 0.1, max: float = 2.0) -> None:
16
+ self.min = min
17
+ self.max = max
18
+ # kernel size is set to be 10% of the image height/width
19
+ self.kernel_size = kernel_size
20
+
21
+ def __call__(self, sample: Image.Image, random_gaussian_blur_p: float):
22
+ sample = np.array(sample)
23
+
24
+ # blur the image with a 50% chance
25
+ prob = np.random.random_sample()
26
+
27
+ if prob < 0.5:
28
+ import cv2
29
+
30
+ sigma = (self.max - self.min) * np.random.random_sample() + self.min
31
+ sample = cv2.GaussianBlur(
32
+ sample, (self.kernel_size, self.kernel_size), sigma
33
+ )
34
+ return sample
35
+
36
+
37
+ def unnormalize(image, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
38
+ """
39
+ Code borrowed from STEGO: https://github.com/mhamilton723/STEGO
40
+ """
41
+ image2 = torch.clone(image)
42
+ for t, m, s in zip(image2, mean, std):
43
+ t.mul_(s).add_(m)
44
+
45
+ return image2
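A small sketch of how `unnormalize` above pairs with `NORMALIZE`, for example to turn a normalized input tensor back into something displayable; random data, illustration only, not part of this commit.

```python
# Minimal usage sketch (illustration only, not part of this commit).
import torch
from datasets.utils import NORMALIZE, unnormalize

x = torch.rand(3, 224, 224)    # fake image with values in [0, 1]
x_norm = NORMALIZE(x)          # ImageNet-style normalization used throughout the repo
x_back = unnormalize(x_norm)   # undo it channel by channel
print(torch.allclose(x, x_back, atol=1e-5))  # True up to floating point error
```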
demo.py ADDED
@@ -0,0 +1,118 @@
1
+ # Code for Peekaboo
2
+ # Author: Hasib Zunair
3
+ # Modified from https://github.com/valeoai/FOUND, see license below.
4
+
5
+ # Copyright 2022 - Valeo Comfort and Driving Assistance - Oriane Siméoni @ valeo.ai
6
+ #
7
+ # Licensed under the Apache License, Version 2.0 (the "License");
8
+ # you may not use this file except in compliance with the License.
9
+ # You may obtain a copy of the License at
10
+ #
11
+ # http://www.apache.org/licenses/LICENSE-2.0
12
+ #
13
+ # Unless required by applicable law or agreed to in writing, software
14
+ # distributed under the License is distributed on an "AS IS" BASIS,
15
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16
+ # See the License for the specific language governing permissions and
17
+ # limitations under the License.
18
+
19
+ """Visualize model predictions"""
20
+
21
+ import os
22
+ import torch
23
+ import argparse
24
+ import torch.nn as nn
25
+ import torch.nn.functional as F
26
+ import matplotlib.pyplot as plt
27
+
28
+ from PIL import Image
29
+ from model import PeekabooModel
30
+ from misc import load_config
31
+ from torchvision import transforms as T
32
+
33
+ NORMALIZE = T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
34
+
35
+ if __name__ == "__main__":
36
+ parser = argparse.ArgumentParser(
37
+ description="Evaluation of Peekaboo",
38
+ formatter_class=argparse.ArgumentDefaultsHelpFormatter,
39
+ )
40
+
41
+ parser.add_argument(
42
+ "--img-path",
43
+ type=str,
44
+ default="data/examples/VOC_000030.jpg",
45
+ help="Image path.",
46
+ )
47
+ parser.add_argument(
48
+ "--model-weights",
49
+ type=str,
50
+ default="data/weights/peekaboo_decoder_weights_niter500.pt",
51
+ )
52
+ parser.add_argument(
53
+ "--config",
54
+ type=str,
55
+ default="configs/peekaboo_DUTS-TR.yaml",
56
+ )
57
+ parser.add_argument(
58
+ "--output-dir",
59
+ type=str,
60
+ default="outputs",
61
+ )
62
+ args = parser.parse_args()
63
+
64
+ # Saving dir
65
+ if not os.path.exists(args.output_dir):
66
+ os.makedirs(args.output_dir)
67
+
68
+ # Configuration
69
+ config, _ = load_config(args.config)
70
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
71
+
72
+ # Load the model
73
+ model = PeekabooModel(
74
+ vit_model=config.model["pre_training"],
75
+ vit_arch=config.model["arch"],
76
+ vit_patch_size=config.model["patch_size"],
77
+ enc_type_feats=config.peekaboo["feats"],
78
+ )
79
+ # Load weights
80
+ model.decoder_load_weights(args.model_weights)
81
+ model.eval()
82
+ print(f"Model {args.model_weights} loaded correctly.")
83
+
84
+ # Load the image
85
+ with open(args.img_path, "rb") as f:
86
+ img = Image.open(f)
87
+ img = img.convert("RGB")
88
+
89
+ t = T.Compose([T.ToTensor(), NORMALIZE])
90
+ img_t = t(img)[None, :, :, :]
91
+ inputs = img_t.to(device)
92
+
93
+ # Forward step
94
+ with torch.no_grad():
95
+ preds = model(inputs, for_eval=True)
96
+
97
+ sigmoid = nn.Sigmoid()
98
+ h, w = img_t.shape[-2:]
99
+ preds_up = F.interpolate(
100
+ preds, scale_factor=model.vit_patch_size, mode="bilinear", align_corners=False
101
+ )[..., :h, :w]
102
+ preds_up = (sigmoid(preds_up.detach()) > 0.5).squeeze(0).float()
103
+
104
+ plt.figure()
105
+ plt.imshow(img)
106
+ plt.imshow(
107
+ preds_up.cpu().squeeze().numpy(), "gray", interpolation="none", alpha=0.5
108
+ )
109
+ plt.axis("off")
110
+ img_name = args.img_path
111
+ img_name = img_name.split("/")[-1].split(".")[0]
112
+ plt.savefig(
113
+ os.path.join(args.output_dir, f"{img_name}-peekaboo.png"),
114
+ bbox_inches="tight",
115
+ pad_inches=0,
116
+ )
117
+ plt.close()
118
+ print(f"Saved model prediction.")
dino ADDED
@@ -0,0 +1 @@
1
+ Subproject commit 7c446df5b9f45747937fb0d72314eb9f7b66930a
environment.yml ADDED
@@ -0,0 +1,575 @@
1
+ # Environment used for this work
2
+ name: peekaboo
3
+ channels:
4
+ - defaults
5
+ - conda-forge
6
+ dependencies:
7
+ - _libgcc_mutex=0.1=main
8
+ - _openmp_mutex=5.1=1_gnu
9
+ - abseil-cpp=20211102.0=hd4dd3e8_0
10
+ - aiobotocore=2.5.0=py38h06a4308_0
11
+ - aiofiles=22.1.0=py38h06a4308_0
12
+ - aiohttp=3.8.5=py38h5eee18b_0
13
+ - aioitertools=0.7.1=pyhd3eb1b0_0
14
+ - aiosignal=1.2.0=pyhd3eb1b0_0
15
+ - aiosqlite=0.18.0=py38h06a4308_0
16
+ - alabaster=0.7.12=pyhd3eb1b0_0
17
+ - anaconda=2023.09=py38_mkl_1
18
+ - aom=3.6.0=h6a678d5_0
19
+ - appdirs=1.4.4=pyhd3eb1b0_0
20
+ - argon2-cffi=21.3.0=pyhd3eb1b0_0
21
+ - argon2-cffi-bindings=21.2.0=py38h7f8727e_0
22
+ - arrow=1.2.3=py38h06a4308_1
23
+ - arrow-cpp=11.0.0=h374c478_2
24
+ - astroid=2.14.2=py38h06a4308_0
25
+ - astropy=5.1=py38h7deecbd_0
26
+ - asttokens=2.0.5=pyhd3eb1b0_0
27
+ - async-timeout=4.0.2=py38h06a4308_0
28
+ - atomicwrites=1.4.0=py_0
29
+ - attrs=22.1.0=py38h06a4308_0
30
+ - automat=20.2.0=py_0
31
+ - autopep8=1.6.0=pyhd3eb1b0_1
32
+ - aws-c-common=0.6.8=h5eee18b_1
33
+ - aws-c-event-stream=0.1.6=h6a678d5_6
34
+ - aws-checksums=0.1.11=h5eee18b_2
35
+ - aws-sdk-cpp=1.8.185=h721c034_1
36
+ - babel=2.11.0=py38h06a4308_0
37
+ - backcall=0.2.0=pyhd3eb1b0_0
38
+ - bcrypt=3.2.0=py38h5eee18b_1
39
+ - beautifulsoup4=4.12.2=py38h06a4308_0
40
+ - binaryornot=0.4.4=pyhd3eb1b0_1
41
+ - blas=1.0=mkl
42
+ - bleach=4.1.0=pyhd3eb1b0_0
43
+ - blosc=1.21.3=h6a678d5_0
44
+ - bokeh=2.4.3=py38h06a4308_0
45
+ - boost-cpp=1.73.0=h7f8727e_12
46
+ - botocore=1.29.76=py38h06a4308_0
47
+ - bottleneck=1.3.5=py38h7deecbd_0
48
+ - brotli=1.0.9=h5eee18b_7
49
+ - brotli-bin=1.0.9=h5eee18b_7
50
+ - brotlipy=0.7.0=py38h27cfd23_1003
51
+ - brunsli=0.1=h2531618_0
52
+ - bzip2=1.0.8=h7b6447c_0
53
+ - c-ares=1.19.1=h5eee18b_0
54
+ - c-blosc2=2.8.0=h6a678d5_0
55
+ - ca-certificates=2023.08.22=h06a4308_0
56
+ - certifi=2023.7.22=py38h06a4308_0
57
+ - cffi=1.15.1=py38h5eee18b_3
58
+ - cfitsio=3.470=h5893167_7
59
+ - chardet=4.0.0=py38h06a4308_1003
60
+ - charls=2.2.0=h2531618_0
61
+ - charset-normalizer=2.0.4=pyhd3eb1b0_0
62
+ - click=8.0.4=py38h06a4308_0
63
+ - cloudpickle=2.2.1=py38h06a4308_0
64
+ - colorama=0.4.6=py38h06a4308_0
65
+ - colorcet=3.0.1=py38h06a4308_0
66
+ - comm=0.1.2=py38h06a4308_0
67
+ - constantly=15.1.0=pyh2b92418_0
68
+ - contourpy=1.0.5=py38hdb19cb5_0
69
+ - cookiecutter=1.7.3=pyhd3eb1b0_0
70
+ - cryptography=41.0.3=py38hdda0065_0
71
+ - cssselect=1.1.0=pyhd3eb1b0_0
72
+ - curl=8.2.1=hdbd6064_0
73
+ - cyrus-sasl=2.1.28=h52b45da_1
74
+ - cytoolz=0.12.0=py38h5eee18b_0
75
+ - daal4py=2023.1.1=py38h79cecc1_0
76
+ - dal=2023.1.1=hdb19cb5_48679
77
+ - dask=2023.4.1=py38h06a4308_1
78
+ - dask-core=2023.4.1=py38h06a4308_0
79
+ - datasets=2.12.0=py38h06a4308_0
80
+ - datashader=0.15.2=py38h06a4308_0
81
+ - datashape=0.5.4=py38h06a4308_1
82
+ - dav1d=1.2.1=h5eee18b_0
83
+ - dbus=1.13.18=hb2f20db_0
84
+ - debugpy=1.6.7=py38h6a678d5_0
85
+ - decorator=5.1.1=pyhd3eb1b0_0
86
+ - defusedxml=0.7.1=pyhd3eb1b0_0
87
+ - diff-match-patch=20200713=pyhd3eb1b0_0
88
+ - dill=0.3.6=py38h06a4308_0
89
+ - distributed=2023.4.1=py38h06a4308_1
90
+ - docstring-to-markdown=0.11=py38h06a4308_0
91
+ - docutils=0.18.1=py38h06a4308_3
92
+ - entrypoints=0.4=py38h06a4308_0
93
+ - et_xmlfile=1.1.0=py38h06a4308_0
94
+ - exceptiongroup=1.0.4=py38h06a4308_0
95
+ - executing=0.8.3=pyhd3eb1b0_0
96
+ - expat=2.5.0=h6a678d5_0
97
+ - filelock=3.9.0=py38h06a4308_0
98
+ - flake8=6.0.0=py38h06a4308_0
99
+ - flask=2.2.2=py38h06a4308_0
100
+ - font-ttf-dejavu-sans-mono=2.37=hd3eb1b0_0
101
+ - font-ttf-inconsolata=2.001=hcb22688_0
102
+ - font-ttf-source-code-pro=2.030=hd3eb1b0_0
103
+ - font-ttf-ubuntu=0.83=h8b1ccd4_0
104
+ - fontconfig=2.14.1=h4c34cd2_2
105
+ - fonts-anaconda=1=h8fa9717_0
106
+ - fonttools=4.25.0=pyhd3eb1b0_0
107
+ - freetype=2.12.1=h4a9f257_0
108
+ - frozenlist=1.3.3=py38h5eee18b_0
109
+ - fsspec=2023.4.0=py38h06a4308_0
110
+ - gensim=4.3.0=py38h6a678d5_0
111
+ - gflags=2.2.2=he6710b0_0
112
+ - giflib=5.2.1=h5eee18b_3
113
+ - glib=2.69.1=he621ea3_2
114
+ - glog=0.5.0=h2531618_0
115
+ - gmp=6.2.1=h295c915_3
116
+ - gmpy2=2.1.2=py38heeb90bb_0
117
+ - greenlet=2.0.1=py38h6a678d5_0
118
+ - grpc-cpp=1.48.2=he1ff14a_1
119
+ - gst-plugins-base=1.14.1=h6a678d5_1
120
+ - gstreamer=1.14.1=h5eee18b_1
121
+ - h5py=3.9.0=py38he06866b_0
122
+ - hdf5=1.12.1=h2b7332f_3
123
+ - heapdict=1.0.1=pyhd3eb1b0_0
124
+ - holoviews=1.17.1=py38h06a4308_0
125
+ - huggingface_hub=0.15.1=py38h06a4308_0
126
+ - hvplot=0.8.4=py38h06a4308_0
127
+ - hyperlink=21.0.0=pyhd3eb1b0_0
128
+ - icu=58.2=he6710b0_3
129
+ - imagecodecs=2023.1.23=py38hc4b7b5f_0
130
+ - imageio=2.31.1=py38h06a4308_0
131
+ - imagesize=1.4.1=py38h06a4308_0
132
+ - imbalanced-learn=0.10.1=py38h06a4308_1
133
+ - importlib-metadata=6.0.0=py38h06a4308_0
134
+ - importlib_metadata=6.0.0=hd3eb1b0_0
135
+ - importlib_resources=5.2.0=pyhd3eb1b0_1
136
+ - incremental=21.3.0=pyhd3eb1b0_0
137
+ - inflection=0.5.1=py38h06a4308_0
138
+ - iniconfig=1.1.1=pyhd3eb1b0_0
139
+ - intake=0.6.8=py38h06a4308_0
140
+ - intel-openmp=2023.1.0=hdb19cb5_46305
141
+ - intervaltree=3.1.0=pyhd3eb1b0_0
142
+ - ipykernel=6.25.0=py38h2f386ee_0
143
+ - ipython=8.12.2=py38h06a4308_0
144
+ - ipython_genutils=0.2.0=pyhd3eb1b0_1
145
+ - ipywidgets=8.0.4=py38h06a4308_0
146
+ - isort=5.9.3=pyhd3eb1b0_0
147
+ - itemadapter=0.3.0=pyhd3eb1b0_0
148
+ - itemloaders=1.0.4=pyhd3eb1b0_1
149
+ - itsdangerous=2.0.1=pyhd3eb1b0_0
150
+ - jaraco.classes=3.2.1=pyhd3eb1b0_0
151
+ - jedi=0.18.1=py38h06a4308_1
152
+ - jeepney=0.7.1=pyhd3eb1b0_0
153
+ - jellyfish=1.0.1=py38hb02cf49_0
154
+ - jinja2=3.1.2=py38h06a4308_0
155
+ - jinja2-time=0.2.0=pyhd3eb1b0_3
156
+ - jmespath=0.10.0=pyhd3eb1b0_0
157
+ - joblib=1.2.0=py38h06a4308_0
158
+ - jpeg=9e=h5eee18b_1
159
+ - jq=1.6=h27cfd23_1000
160
+ - json5=0.9.6=pyhd3eb1b0_0
161
+ - jsonschema=4.17.3=py38h06a4308_0
162
+ - jupyter=1.0.0=py38h06a4308_8
163
+ - jupyter_client=7.4.9=py38h06a4308_0
164
+ - jupyter_console=6.6.3=py38h06a4308_0
165
+ - jupyter_core=5.3.0=py38h06a4308_0
166
+ - jupyter_events=0.6.3=py38h06a4308_0
167
+ - jupyter_server=1.23.4=py38h06a4308_0
168
+ - jupyter_server_fileid=0.9.0=py38h06a4308_0
169
+ - jupyter_server_ydoc=0.8.0=py38h06a4308_1
170
+ - jupyter_ydoc=0.2.4=py38h06a4308_0
171
+ - jupyterlab=3.6.3=py38h06a4308_0
172
+ - jupyterlab_pygments=0.1.2=py_0
173
+ - jupyterlab_server=2.22.0=py38h06a4308_0
174
+ - jupyterlab_widgets=3.0.5=py38h06a4308_0
175
+ - jxrlib=1.1=h7b6447c_2
176
+ - kaleido-core=0.2.1=h7c8854e_0
177
+ - keyring=23.13.1=py38h06a4308_0
178
+ - kiwisolver=1.4.4=py38h6a678d5_0
179
+ - krb5=1.20.1=h143b758_1
180
+ - lazy-object-proxy=1.6.0=py38h27cfd23_0
181
+ - lcms2=2.12=h3be6417_0
182
+ - ld_impl_linux-64=2.38=h1181459_1
183
+ - lerc=3.0=h295c915_0
184
+ - libaec=1.0.4=he6710b0_1
185
+ - libavif=0.11.1=h5eee18b_0
186
+ - libboost=1.73.0=h28710b8_12
187
+ - libbrotlicommon=1.0.9=h5eee18b_7
188
+ - libbrotlidec=1.0.9=h5eee18b_7
189
+ - libbrotlienc=1.0.9=h5eee18b_7
190
+ - libclang=14.0.6=default_hc6dbbc7_1
191
+ - libclang13=14.0.6=default_he11475f_1
192
+ - libcups=2.4.2=h2d74bed_1
193
+ - libcurl=8.2.1=h251f7ec_0
194
+ - libdeflate=1.17=h5eee18b_0
195
+ - libedit=3.1.20221030=h5eee18b_0
196
+ - libev=4.33=h7f8727e_1
197
+ - libevent=2.1.12=hdbd6064_1
198
+ - libffi=3.4.4=h6a678d5_0
199
+ - libgcc-ng=11.2.0=h1234567_1
200
+ - libgfortran-ng=11.2.0=h00389a5_1
201
+ - libgfortran5=11.2.0=h1234567_1
202
+ - libgomp=11.2.0=h1234567_1
203
+ - libllvm14=14.0.6=hdb19cb5_3
204
+ - libnghttp2=1.52.0=h2d74bed_1
205
+ - libpng=1.6.39=h5eee18b_0
206
+ - libpq=12.15=hdbd6064_1
207
+ - libprotobuf=3.20.3=he621ea3_0
208
+ - libsodium=1.0.18=h7b6447c_0
209
+ - libspatialindex=1.9.3=h2531618_0
210
+ - libssh2=1.10.0=hdbd6064_2
211
+ - libstdcxx-ng=11.2.0=h1234567_1
212
+ - libthrift=0.15.0=h1795dd8_2
213
+ - libtiff=4.5.1=h6a678d5_0
214
+ - libuuid=1.41.5=h5eee18b_0
215
+ - libwebp=1.3.2=h11a3e52_0
216
+ - libwebp-base=1.3.2=h5eee18b_0
217
+ - libxcb=1.15=h7f8727e_0
218
+ - libxkbcommon=1.0.1=h5eee18b_1
219
+ - libxml2=2.10.4=hcbfbd50_0
220
+ - libxslt=1.1.37=h2085143_0
221
+ - libzopfli=1.0.3=he6710b0_0
222
+ - llvmlite=0.40.0=py38he621ea3_0
223
+ - locket=1.0.0=py38h06a4308_0
224
+ - lxml=4.9.3=py38hdbbb534_0
225
+ - lz4-c=1.9.4=h6a678d5_0
226
+ - lzo=2.10=h7b6447c_2
227
+ - markdown=3.4.1=py38h06a4308_0
228
+ - markupsafe=2.1.1=py38h7f8727e_0
229
+ - mathjax=2.7.5=h06a4308_0
230
+ - matplotlib=3.7.2=py38h06a4308_0
231
+ - matplotlib-base=3.7.2=py38h1128e8f_0
232
+ - matplotlib-inline=0.1.6=py38h06a4308_0
233
+ - mccabe=0.7.0=pyhd3eb1b0_0
234
+ - mistune=0.8.4=py38h7b6447c_1000
235
+ - mkl=2023.1.0=h213fc3f_46343
236
+ - mkl-service=2.4.0=py38h5eee18b_1
237
+ - mkl_fft=1.3.8=py38h5eee18b_0
238
+ - mkl_random=1.2.4=py38hdb19cb5_0
239
+ - more-itertools=8.12.0=pyhd3eb1b0_0
240
+ - mpc=1.1.0=h10f8cd9_1
241
+ - mpfr=4.0.2=hb69a4c5_1
242
+ - mpi=1.0=mpich
243
+ - mpich=4.1.1=hbae89fd_0
244
+ - mpmath=1.3.0=py38h06a4308_0
245
+ - msgpack-python=1.0.3=py38hd09550d_0
246
+ - multidict=6.0.2=py38h5eee18b_0
247
+ - multipledispatch=0.6.0=py38_0
248
+ - multiprocess=0.70.14=py38h06a4308_0
249
+ - munkres=1.1.4=py_0
250
+ - mypy_extensions=1.0.0=py38h06a4308_0
251
+ - mysql=5.7.24=h721c034_2
252
+ - nbclassic=0.5.5=py38h06a4308_0
253
+ - nbclient=0.5.13=py38h06a4308_0
254
+ - nbconvert=6.5.4=py38h06a4308_0
255
+ - nbformat=5.9.2=py38h06a4308_0
256
+ - ncurses=6.4=h6a678d5_0
257
+ - nest-asyncio=1.5.6=py38h06a4308_0
258
+ - networkx=3.1=py38h06a4308_0
259
+ - nltk=3.8.1=py38h06a4308_0
260
+ - notebook=6.5.4=py38h06a4308_1
261
+ - notebook-shim=0.2.2=py38h06a4308_0
262
+ - nspr=4.35=h6a678d5_0
263
+ - nss=3.89.1=h6a678d5_0
264
+ - numba=0.57.1=py38h1128e8f_0
265
+ - numexpr=2.8.4=py38hc78ab66_1
266
+ - numpy=1.24.3=py38hf6e8229_1
267
+ - numpy-base=1.24.3=py38h060ed82_1
268
+ - numpydoc=1.5.0=py38h06a4308_0
269
+ - oniguruma=6.9.7.1=h27cfd23_0
270
+ - openjpeg=2.4.0=h3ad879b_0
271
+ - openpyxl=3.0.10=py38h5eee18b_0
272
+ - openssl=3.0.10=h7f8727e_2
273
+ - orc=1.7.4=hb3bc3d3_1
274
+ - packaging=23.1=py38h06a4308_0
275
+ - pandas=2.0.3=py38h1128e8f_0
276
+ - pandocfilters=1.5.0=pyhd3eb1b0_0
277
+ - panel=0.14.3=py38h06a4308_0
278
+ - param=1.13.0=py38h06a4308_0
279
+ - parsel=1.6.0=py38h06a4308_0
280
+ - parso=0.8.3=pyhd3eb1b0_0
281
+ - partd=1.4.0=py38h06a4308_0
282
+ - pathspec=0.10.3=py38h06a4308_0
283
+ - patsy=0.5.3=py38h06a4308_0
284
+ - pcre=8.45=h295c915_0
285
+ - pep8=1.7.1=py38h06a4308_1
286
+ - pexpect=4.8.0=pyhd3eb1b0_3
287
+ - pickleshare=0.7.5=pyhd3eb1b0_1003
288
+ - pip=23.2.1=py38h06a4308_0
289
+ - pkgutil-resolve-name=1.3.10=py38h06a4308_0
290
+ - platformdirs=3.10.0=py38h06a4308_0
291
+ - plotly=5.9.0=py38h06a4308_0
292
+ - pluggy=1.0.0=py38h06a4308_1
293
+ - ply=3.11=py38_0
294
+ - pooch=1.4.0=pyhd3eb1b0_0
295
+ - poyo=0.5.0=pyhd3eb1b0_0
296
+ - prometheus_client=0.14.1=py38h06a4308_0
297
+ - prompt-toolkit=3.0.36=py38h06a4308_0
298
+ - prompt_toolkit=3.0.36=hd3eb1b0_0
299
+ - protego=0.1.16=py_0
300
+ - psutil=5.9.0=py38h5eee18b_0
301
+ - ptyprocess=0.7.0=pyhd3eb1b0_2
302
+ - pure_eval=0.2.2=pyhd3eb1b0_0
303
+ - py-cpuinfo=8.0.0=pyhd3eb1b0_1
304
+ - pyarrow=11.0.0=py38h468efa6_1
305
+ - pyasn1=0.4.8=pyhd3eb1b0_0
306
+ - pyasn1-modules=0.2.8=py_0
307
+ - pycodestyle=2.10.0=py38h06a4308_0
308
+ - pycparser=2.21=pyhd3eb1b0_0
309
+ - pyct=0.5.0=py38h06a4308_0
310
+ - pycurl=7.45.2=py38hdbd6064_1
311
+ - pydispatcher=2.0.5=py38h06a4308_2
312
+ - pydocstyle=6.3.0=py38h06a4308_0
313
+ - pyerfa=2.0.0=py38h27cfd23_0
314
+ - pyflakes=3.0.1=py38h06a4308_0
315
+ - pygments=2.15.1=py38h06a4308_1
316
+ - pylint=2.16.2=py38h06a4308_0
317
+ - pylint-venv=2.3.0=py38h06a4308_0
318
+ - pyls-spyder=0.4.0=pyhd3eb1b0_0
319
+ - pyodbc=4.0.34=py38h6a678d5_0
320
+ - pyopenssl=23.2.0=py38h06a4308_0
321
+ - pyqt=5.15.7=py38h6a678d5_1
322
+ - pyqt5-sip=12.11.0=py38h6a678d5_1
323
+ - pyqtwebengine=5.15.7=py38h6a678d5_1
324
+ - pyrsistent=0.18.0=py38heee7806_0
325
+ - pysocks=1.7.1=py38h06a4308_0
326
+ - pytables=3.8.0=py38hb8ae3fc_3
327
+ - pytest=7.4.0=py38h06a4308_0
328
+ - python=3.8.18=h955ad1f_0
329
+ - python-dateutil=2.8.2=pyhd3eb1b0_0
330
+ - python-fastjsonschema=2.16.2=py38h06a4308_0
331
+ - python-json-logger=2.0.7=py38h06a4308_0
332
+ - python-kaleido=0.2.1=py38h06a4308_0
333
+ - python-lmdb=1.4.1=py38h6a678d5_0
334
+ - python-lsp-black=1.2.1=py38h06a4308_0
335
+ - python-lsp-jsonrpc=1.0.0=pyhd3eb1b0_0
336
+ - python-lsp-server=1.7.2=py38h06a4308_0
337
+ - python-slugify=5.0.2=pyhd3eb1b0_0
338
+ - python-snappy=0.6.1=py38h6a678d5_0
339
+ - python-tzdata=2023.3=pyhd3eb1b0_0
340
+ - python-xxhash=2.0.2=py38h5eee18b_1
341
+ - pytoolconfig=1.2.5=py38h06a4308_1
342
+ - pytz=2023.3.post1=py38h06a4308_0
343
+ - pyviz_comms=2.3.0=py38h06a4308_0
344
+ - pywavelets=1.4.1=py38h5eee18b_0
345
+ - pyxdg=0.27=pyhd3eb1b0_0
346
+ - pyyaml=6.0=py38h5eee18b_1
347
+ - pyzmq=23.2.0=py38h6a678d5_0
348
+ - qdarkstyle=3.0.2=pyhd3eb1b0_0
349
+ - qstylizer=0.2.2=py38h06a4308_0
350
+ - qt-main=5.15.2=h7358343_9
351
+ - qt-webengine=5.15.9=h9ab4d14_7
352
+ - qtawesome=1.2.2=py38h06a4308_0
353
+ - qtconsole=5.4.2=py38h06a4308_0
354
+ - qtpy=2.2.0=py38h06a4308_0
355
+ - qtwebkit=5.212=h3fafdc1_5
356
+ - queuelib=1.5.0=py38h06a4308_0
357
+ - re2=2022.04.01=h295c915_0
358
+ - readline=8.2=h5eee18b_0
359
+ - regex=2022.7.9=py38h5eee18b_0
360
+ - requests=2.31.0=py38h06a4308_0
361
+ - requests-file=1.5.1=pyhd3eb1b0_0
362
+ - responses=0.13.3=pyhd3eb1b0_0
363
+ - rfc3339-validator=0.1.4=py38h06a4308_0
364
+ - rfc3986-validator=0.1.1=py38h06a4308_0
365
+ - rope=1.7.0=py38h06a4308_0
366
+ - rtree=1.0.1=py38h06a4308_0
367
+ - s3fs=2023.4.0=py38h06a4308_0
368
+ - safetensors=0.3.2=py38hb02cf49_0
369
+ - scikit-image=0.19.3=py38h6a678d5_1
370
+ - scikit-learn=1.3.0=py38h1128e8f_0
371
+ - scikit-learn-intelex=2023.1.1=py38h06a4308_0
372
+ - scipy=1.10.1=py38hf6e8229_1
373
+ - scrapy=2.8.0=py38h06a4308_0
374
+ - seaborn=0.12.2=py38h06a4308_0
375
+ - secretstorage=3.3.1=py38h06a4308_1
376
+ - send2trash=1.8.0=pyhd3eb1b0_1
377
+ - service_identity=18.1.0=pyhd3eb1b0_1
378
+ - setuptools=68.0.0=py38h06a4308_0
379
+ - sip=6.6.2=py38h6a678d5_0
380
+ - six=1.16.0=pyhd3eb1b0_1
381
+ - smart_open=5.2.1=py38h06a4308_0
382
+ - snappy=1.1.9=h295c915_0
383
+ - sniffio=1.2.0=py38h06a4308_1
384
+ - snowballstemmer=2.2.0=pyhd3eb1b0_0
385
+ - sortedcontainers=2.4.0=pyhd3eb1b0_0
386
+ - soupsieve=2.4=py38h06a4308_0
387
+ - sphinx=5.0.2=py38h06a4308_0
388
+ - sphinxcontrib-applehelp=1.0.2=pyhd3eb1b0_0
389
+ - sphinxcontrib-devhelp=1.0.2=pyhd3eb1b0_0
390
+ - sphinxcontrib-htmlhelp=2.0.0=pyhd3eb1b0_0
391
+ - sphinxcontrib-jsmath=1.0.1=pyhd3eb1b0_0
392
+ - sphinxcontrib-qthelp=1.0.3=pyhd3eb1b0_0
393
+ - sphinxcontrib-serializinghtml=1.1.5=pyhd3eb1b0_0
394
+ - spyder=5.4.3=py38h06a4308_1
395
+ - spyder-kernels=2.4.4=py38h06a4308_0
396
+ - sqlalchemy=1.4.39=py38h5eee18b_0
397
+ - sqlite=3.41.2=h5eee18b_0
398
+ - stack_data=0.2.0=pyhd3eb1b0_0
399
+ - statsmodels=0.14.0=py38ha9d4c09_0
400
+ - sympy=1.11.1=py38h06a4308_0
401
+ - tabulate=0.8.10=py38h06a4308_0
402
+ - tbb=2021.8.0=hdb19cb5_0
403
+ - tbb4py=2021.8.0=py38hdb19cb5_0
404
+ - tblib=1.7.0=pyhd3eb1b0_0
405
+ - tenacity=8.2.2=py38h06a4308_0
406
+ - terminado=0.17.1=py38h06a4308_0
407
+ - text-unidecode=1.3=pyhd3eb1b0_0
408
+ - textdistance=4.2.1=pyhd3eb1b0_0
409
+ - threadpoolctl=2.2.0=pyh0d69192_0
410
+ - three-merge=0.1.1=pyhd3eb1b0_0
411
+ - tifffile=2023.4.12=py38h06a4308_0
412
+ - tinycss2=1.2.1=py38h06a4308_0
413
+ - tk=8.6.12=h1ccaba5_0
414
+ - tldextract=3.2.0=pyhd3eb1b0_0
415
+ - toml=0.10.2=pyhd3eb1b0_0
416
+ - tomli=2.0.1=py38h06a4308_0
417
+ - tomlkit=0.11.1=py38h06a4308_0
418
+ - toolz=0.12.0=py38h06a4308_0
419
+ - tornado=6.3.2=py38h5eee18b_0
420
+ - tqdm=4.65.0=py38hb070fc8_0
421
+ - traitlets=5.7.1=py38h06a4308_0
422
+ - twisted=22.10.0=py38h5eee18b_0
423
+ - typing_extensions=4.7.1=py38h06a4308_0
424
+ - ujson=5.4.0=py38h6a678d5_0
425
+ - unidecode=1.2.0=pyhd3eb1b0_0
426
+ - unixodbc=2.3.11=h5eee18b_0
427
+ - urllib3=1.26.16=py38h06a4308_0
428
+ - utf8proc=2.6.1=h27cfd23_0
429
+ - w3lib=1.21.0=pyhd3eb1b0_0
430
+ - watchdog=2.1.6=py38h06a4308_0
431
+ - wcwidth=0.2.5=pyhd3eb1b0_0
432
+ - webencodings=0.5.1=py38_1
433
+ - websocket-client=0.58.0=py38h06a4308_4
434
+ - werkzeug=2.2.3=py38h06a4308_0
435
+ - whatthepatch=1.0.2=py38h06a4308_0
436
+ - wheel=0.38.4=py38h06a4308_0
437
+ - widgetsnbextension=4.0.5=py38h06a4308_0
438
+ - wrapt=1.14.1=py38h5eee18b_0
439
+ - wurlitzer=3.0.2=py38h06a4308_0
440
+ - xarray=2022.11.0=py38h06a4308_0
441
+ - xxhash=0.8.0=h7f8727e_3
442
+ - xz=5.4.2=h5eee18b_0
443
+ - y-py=0.5.9=py38h52d8a92_0
444
+ - yaml=0.2.5=h7b6447c_0
445
+ - yapf=0.31.0=pyhd3eb1b0_0
446
+ - yarl=1.8.1=py38h5eee18b_0
447
+ - ypy-websocket=0.8.2=py38h06a4308_0
448
+ - zeromq=4.3.4=h2531618_0
449
+ - zfp=1.0.0=h6a678d5_0
450
+ - zict=2.2.0=py38h06a4308_0
451
+ - zipp=3.11.0=py38h06a4308_0
452
+ - zlib=1.2.13=h5eee18b_0
453
+ - zlib-ng=2.0.7=h5eee18b_0
454
+ - zope=1.0=py38_1
455
+ - zope.interface=5.4.0=py38h7f8727e_0
456
+ - zstd=1.5.5=hc292b87_0
457
+ - pip:
458
+ - absl-py==2.0.0
459
+ - addict==2.4.0
460
+ - altair==5.1.2
461
+ - annotated-types==0.6.0
462
+ - antlr4-python3-runtime==4.9.3
463
+ - anyio==3.7.1
464
+ - autodistill==0.1.16
465
+ - autodistill-detic==0.1.4
466
+ - autodistill-fastsam==0.1.0
467
+ - autodistill-grounded-sam==0.1.1
468
+ - autodistill-grounding-dino==0.1.2
469
+ - autodistill-llava==0.1.0
470
+ - autodistill-metaclip==0.1.1
471
+ - autodistill-owl-vit==0.1.1
472
+ - autodistill-owlv2==0.1.0
473
+ - autodistill-sam-clip==0.1.3
474
+ - autodistill-seggpt==0.1.6
475
+ - autodistill-yolov8==0.1.2
476
+ - black==22.3.0
477
+ - cachetools==5.3.1
478
+ - clip==1.0
479
+ - cmake==3.27.5
480
+ - combinadics==0.0.3
481
+ - cycler==0.10.0
482
+ - cython==3.0.4
483
+ - dataclasses==0.6
484
+ - detectron2-layers==0.0.5
485
+ - einops==0.7.0
486
+ - einops-exts==0.0.4
487
+ - fairscale==0.4.13
488
+ - fastapi==0.104.1
489
+ - fasttext==0.9.2
490
+ - ffmpy==0.3.1
491
+ - ftfy==6.1.1
492
+ - future==0.18.3
493
+ - fvcore==0.1.5.post20221221
494
+ - google-auth==2.23.0
495
+ - google-auth-oauthlib==1.0.0
496
+ - gradio==3.35.2
497
+ - gradio-client==0.7.0
498
+ - grpcio==1.58.0
499
+ - h11==0.14.0
500
+ - httpcore==0.18.0
501
+ - httpx==0.25.0
502
+ - huggingface-hub==0.17.3
503
+ - hydra-core==1.3.2
504
+ - idna==2.10
505
+ - iopath==0.1.9
506
+ - linkify-it-py==2.0.2
507
+ - lit==16.0.6
508
+ - llava==0.0.1.dev0
509
+ - lvis==0.5.3
510
+ - markdown-it-py==2.2.0
511
+ - mdit-py-plugins==0.3.3
512
+ - mdurl==0.1.2
513
+ - mss==9.0.1
514
+ - natsort==8.4.0
515
+ - nvidia-cublas-cu11==11.10.3.66
516
+ - nvidia-cuda-cupti-cu11==11.7.101
517
+ - nvidia-cuda-nvrtc-cu11==11.7.99
518
+ - nvidia-cuda-runtime-cu11==11.7.99
519
+ - nvidia-cudnn-cu11==8.5.0.96
520
+ - nvidia-cufft-cu11==10.9.0.58
521
+ - nvidia-curand-cu11==10.2.10.91
522
+ - nvidia-cusolver-cu11==11.4.0.1
523
+ - nvidia-cusparse-cu11==11.7.4.91
524
+ - nvidia-nccl-cu11==2.14.3
525
+ - nvidia-nvtx-cu11==11.7.91
526
+ - oauthlib==3.2.2
527
+ - omegaconf==2.3.0
528
+ - onnx==1.14.1
529
+ - onnx-simplifier==0.4.33
530
+ - open-clip-torch==2.23.0
531
+ - open-flamingo==2.0.1
532
+ - opencv-python==4.8.0.76
533
+ - opencv-python-headless==4.8.0.74
534
+ - orjson==3.9.10
535
+ - pillow==8.4.0
536
+ - portalocker==2.8.2
537
+ - protobuf==4.24.3
538
+ - pybind11==2.11.1
539
+ - pycocotools==2.0.7
540
+ - pydantic==2.4.2
541
+ - pydantic-core==2.10.1
542
+ - pydot==1.4.2
543
+ - pydub==0.25.1
544
+ - pyparsing==2.4.7
545
+ - python-dotenv==1.0.0
546
+ - python-magic==0.4.27
547
+ - python-multipart==0.0.6
548
+ - requests-oauthlib==1.3.1
549
+ - requests-toolbelt==1.0.0
550
+ - rf-groundingdino==0.1.2
551
+ - rf-segment-anything==1.0
552
+ - rich==13.5.3
553
+ - roboflow==1.1.9
554
+ - rsa==4.9
555
+ - semantic-version==2.10.0
556
+ - sentencepiece==0.1.98
557
+ - sentry-sdk==1.34.0
558
+ - starlette==0.27.0
559
+ - supervision==0.9.0
560
+ - tensorboard==2.14.0
561
+ - tensorboard-data-server==0.7.1
562
+ - termcolor==2.3.0
563
+ - thop==0.1.1-2209072238
564
+ - timm==0.9.8
565
+ - tokenizers==0.14.1
566
+ - torch==2.0.1
567
+ - torchvision==0.15.2
568
+ - transformers==4.35.0.dev0
569
+ - triton==2.0.0
570
+ - typing-extensions==4.8.0
571
+ - uc-micro-py==1.0.2
572
+ - ultralytics==8.0.81
573
+ - uvicorn==0.23.2
574
+ - websockets==11.0.3
575
+ - yacs==0.1.8
environment_initial.yml ADDED
@@ -0,0 +1,139 @@
1
+ # Environment used to reproduce results
2
+ name: peekaboo
3
+ channels:
4
+ - defaults
5
+ - conda-forge
6
+ dependencies:
7
+ - _libgcc_mutex=0.1=main
8
+ - _openmp_mutex=5.1=1_gnu
9
+ - asttokens=2.4.0=pyhd8ed1ab_0
10
+ - backcall=0.2.0=pyh9f0ad1d_0
11
+ - backports=1.0=pyhd8ed1ab_3
12
+ - backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0
13
+ - ca-certificates=2023.7.22=hbcca054_0
14
+ - comm=0.1.4=pyhd8ed1ab_0
15
+ - debugpy=1.6.7=py38h6a678d5_0
16
+ - entrypoints=0.4=pyhd8ed1ab_0
17
+ - executing=1.2.0=pyhd8ed1ab_0
18
+ - ipykernel=6.25.2=pyh2140261_0
19
+ - ipython=8.12.0=pyh41d4057_0
20
+ - jedi=0.19.1=pyhd8ed1ab_0
21
+ - jupyter_client=7.3.4=pyhd8ed1ab_0
22
+ - jupyter_core=5.4.0=py38h578d9bd_0
23
+ - ld_impl_linux-64=2.38=h1181459_1
24
+ - libffi=3.4.4=h6a678d5_0
25
+ - libgcc-ng=11.2.0=h1234567_1
26
+ - libgomp=11.2.0=h1234567_1
27
+ - libsodium=1.0.18=h36c2ea0_1
28
+ - libstdcxx-ng=11.2.0=h1234567_1
29
+ - matplotlib-inline=0.1.6=pyhd8ed1ab_0
30
+ - ncurses=6.4=h6a678d5_0
31
+ - nest-asyncio=1.5.8=pyhd8ed1ab_0
32
+ - openssl=3.0.11=h7f8727e_2
33
+ - packaging=23.2=pyhd8ed1ab_0
34
+ - parso=0.8.3=pyhd8ed1ab_0
35
+ - pexpect=4.8.0=pyh1a96a4e_2
36
+ - pickleshare=0.7.5=py_1003
37
+ - pip=23.1.2=py38h06a4308_0
38
+ - platformdirs=3.11.0=pyhd8ed1ab_0
39
+ - prompt-toolkit=3.0.39=pyha770c72_0
40
+ - prompt_toolkit=3.0.39=hd8ed1ab_0
41
+ - psutil=5.9.0=py38h5eee18b_0
42
+ - ptyprocess=0.7.0=pyhd3deb0d_0
43
+ - pure_eval=0.2.2=pyhd8ed1ab_0
44
+ - python=3.8.16=h955ad1f_4
45
+ - python-dateutil=2.8.2=pyhd8ed1ab_0
46
+ - python_abi=3.8=2_cp38
47
+ - pyzmq=25.1.0=py38h6a678d5_0
48
+ - readline=8.2=h5eee18b_0
49
+ - setuptools=67.8.0=py38h06a4308_0
50
+ - six=1.16.0=pyh6c4a22f_0
51
+ - sqlite=3.41.2=h5eee18b_0
52
+ - stack_data=0.6.2=pyhd8ed1ab_0
53
+ - tk=8.6.12=h1ccaba5_0
54
+ - tornado=6.1=py38h0a891b7_3
55
+ - traitlets=5.11.2=pyhd8ed1ab_0
56
+ - typing_extensions=4.8.0=pyha770c72_0
57
+ - wcwidth=0.2.8=pyhd8ed1ab_0
58
+ - wheel=0.38.4=py38h06a4308_0
59
+ - xz=5.4.2=h5eee18b_0
60
+ - zeromq=4.3.4=h2531618_0
61
+ - zlib=1.2.13=h5eee18b_0
62
+ - pip:
63
+ - absl-py==1.4.0
64
+ - addict==2.4.0
65
+ - cachetools==5.3.1
66
+ - certifi==2023.5.7
67
+ - charset-normalizer==3.1.0
68
+ - cmake==3.26.4
69
+ - decorator==4.4.2
70
+ - filelock==3.12.2
71
+ - fonttools==4.41.0
72
+ - google-auth==2.22.0
73
+ - google-auth-oauthlib==1.0.0
74
+ - grpcio==1.56.0
75
+ - idna==3.4
76
+ - imageio==2.31.1
77
+ - imageio-ffmpeg==0.4.8
78
+ - importlib-metadata==6.8.0
79
+ - jinja2==3.1.2
80
+ - kiwisolver==1.4.4
81
+ - labelimg==1.8.6
82
+ - lazy-loader==0.3
83
+ - lit==16.0.6
84
+ - lxml==4.9.2
85
+ - markdown==3.4.3
86
+ - markdown-it-py==3.0.0
87
+ - markupsafe==2.1.3
88
+ - matplotlib==3.7.2
89
+ - mdurl==0.1.2
90
+ - moviepy==1.0.3
91
+ - mpmath==1.3.0
92
+ - networkx==3.1
93
+ - numpy==1.24.4
94
+ - nvidia-cublas-cu11==11.10.3.66
95
+ - nvidia-cuda-cupti-cu11==11.7.101
96
+ - nvidia-cuda-nvrtc-cu11==11.7.99
97
+ - nvidia-cuda-runtime-cu11==11.7.99
98
+ - nvidia-cudnn-cu11==8.5.0.96
99
+ - nvidia-cufft-cu11==10.9.0.58
100
+ - nvidia-curand-cu11==10.2.10.91
101
+ - nvidia-cusolver-cu11==11.4.0.1
102
+ - nvidia-cusparse-cu11==11.7.4.91
103
+ - nvidia-nccl-cu11==2.14.3
104
+ - nvidia-nvtx-cu11==11.7.91
105
+ - oauthlib==3.2.2
106
+ - onnx==1.14.0
107
+ - onnx-simplifier==0.4.33
108
+ - opencv-python==4.5.5.64
109
+ - opencv-python-headless==4.5.5.64
110
+ - pillow==9.5.0
111
+ - proglog==0.1.10
112
+ - protobuf==4.23.4
113
+ - pyasn1==0.5.0
114
+ - pyasn1-modules==0.3.0
115
+ - pycocotools==2.0.6
116
+ - pygments==2.15.1
117
+ - pyqt5==5.15.9
118
+ - pyqt5-qt5==5.15.2
119
+ - pyqt5-sip==12.12.1
120
+ - pywavelets==1.4.1
121
+ - pyyaml==6.0
122
+ - requests==2.31.0
123
+ - requests-oauthlib==1.3.1
124
+ - rich==13.4.2
125
+ - rsa==4.9
126
+ - scikit-image==0.21.0
127
+ - scipy==1.10.1
128
+ - sympy==1.12
129
+ - tensorboard==2.13.0
130
+ - tensorboard-data-server==0.7.1
131
+ - thop==0.1.1-2209072238
132
+ - tifffile==2023.7.10
133
+ - torch==2.0.1
134
+ - torchvision==0.15.2
135
+ - tqdm==4.65.0
136
+ - triton==2.0.0
137
+ - typing-extensions==4.7.1
138
+ - urllib3==1.26.16
139
+ - werkzeug==2.3.6
evaluate.py ADDED
@@ -0,0 +1,110 @@
1
+ # Copyright 2022 - Valeo Comfort and Driving Assistance - Oriane Siméoni @ valeo.ai
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ import argparse
16
+ from model import PeekabooModel
17
+ from misc import load_config
18
+ from datasets.datasets import build_dataset
19
+ from evaluation.saliency import evaluate_saliency
20
+ from evaluation.uod import evaluation_unsupervised_object_discovery
21
+
22
+ if __name__ == "__main__":
23
+ parser = argparse.ArgumentParser(
24
+ description="Evaluation of Peekaboo",
25
+ formatter_class=argparse.ArgumentDefaultsHelpFormatter,
26
+ )
27
+ parser.add_argument(
28
+ "--eval-type", type=str, choices=["saliency", "uod"], help="Evaluation type."
29
+ )
30
+ parser.add_argument(
31
+ "--dataset-eval",
32
+ type=str,
33
+ choices=["ECSSD", "DUT-OMRON", "DUTS-TEST", "VOC07", "VOC12", "COCO20k"],
34
+ help="Name of evaluation dataset.",
35
+ )
36
+ parser.add_argument(
37
+ "--dataset-set-eval", type=str, default=None, help="Set of the dataset."
38
+ )
39
+ parser.add_argument(
40
+ "--apply-bilateral", action="store_true", help="use bilateral solver."
41
+ )
42
+ parser.add_argument(
43
+ "--evaluation-mode",
44
+ type=str,
45
+ default="multi",
46
+ choices=["single", "multi"],
47
+ help="Type of evaluation.",
48
+ )
49
+ parser.add_argument(
50
+ "--model-weights",
51
+ type=str,
52
+ default="data/weights/decoder_weights.pt",
53
+ )
54
+ parser.add_argument(
55
+ "--dataset-dir",
56
+ type=str,
57
+ )
58
+ parser.add_argument(
59
+ "--config",
60
+ type=str,
61
+ default="configs/peekaboo_DUTS-TR.yaml",
62
+ )
63
+ args = parser.parse_args()
64
+ print(args.__dict__)
65
+
66
+ # Configuration
67
+ config, _ = load_config(args.config)
68
+
69
+ # Load the model
70
+ model = PeekabooModel(
71
+ vit_model=config.model["pre_training"],
72
+ vit_arch=config.model["arch"],
73
+ vit_patch_size=config.model["patch_size"],
74
+ enc_type_feats=config.peekaboo["feats"],
75
+ )
76
+ # Load weights
77
+ model.decoder_load_weights(args.model_weights)
78
+ model.eval()
79
+ print(f"Model {args.model_weights} loaded correctly.")
80
+
81
+ # Build the validation set
82
+ val_dataset = build_dataset(
83
+ root_dir=args.dataset_dir,
84
+ dataset_name=args.dataset_eval,
85
+ dataset_set=args.dataset_set_eval,
86
+ for_eval=True,
87
+ evaluation_type=args.eval_type,
88
+ )
89
+ print(f"\nBuilding dataset {val_dataset.name} (#{len(val_dataset)} images)")
90
+
91
+ # Validation
92
+ print(f"\nStarted evaluation on {val_dataset.name}")
93
+ if args.eval_type == "saliency":
94
+ evaluate_saliency(
95
+ val_dataset,
96
+ model=model,
97
+ evaluation_mode=args.evaluation_mode,
98
+ apply_bilateral=args.apply_bilateral,
99
+ )
100
+ elif args.eval_type == "uod":
101
+ if args.apply_bilateral:
102
+ raise ValueError("Not implemented.")
103
+
104
+ evaluation_unsupervised_object_discovery(
105
+ val_dataset,
106
+ model=model,
107
+ evaluation_mode=args.evaluation_mode,
108
+ )
109
+ else:
110
+ raise ValueError("Other evaluation method to come.")
evaluate_saliency.sh ADDED
@@ -0,0 +1,12 @@
1
+ MODEL=$1
2
+ DATASET_DIR=$2
3
+ MODE=$3
4
+
5
+ # Unsupervised saliency detection evaluation
6
+ for DATASET in ECSSD DUTS-TEST DUT-OMRON
7
+ do
8
+ python evaluate.py --eval-type saliency --dataset-eval $DATASET \
9
+ --model-weights $MODEL --evaluation-mode $MODE --apply-bilateral --dataset-dir $DATASET_DIR
10
+ done
11
+
12
+
evaluate_uod.sh ADDED
@@ -0,0 +1,11 @@
1
+ MODEL=$1
2
+ DATASET_DIR=$2
3
+
4
+ # Single object discovery evaluation
5
+ for DATASET in VOC07 VOC12 COCO20k
6
+ do
7
+ python evaluate.py --eval-type uod --dataset-eval $DATASET \
8
+ --model-weights $MODEL --evaluation-mode single --dataset-dir $DATASET_DIR
9
+ done
10
+
11
+
evaluation/__init__.py ADDED
File without changes
evaluation/metrics/__init__.py ADDED
File without changes
evaluation/metrics/average_meter.py ADDED
@@ -0,0 +1,22 @@
1
+ """
2
+ Code borrowed from SelfMask: https://github.com/NoelShin/selfmask
3
+ """
4
+
5
+
6
+ class AverageMeter(object):
7
+ """Computes and stores the average and current value"""
8
+
9
+ def __init__(self):
10
+ self.reset()
11
+
12
+ def reset(self):
13
+ self.val = 0
14
+ self.avg = 0
15
+ self.sum = 0
16
+ self.count = 0
17
+
18
+ def update(self, val, n: int):
19
+ self.val = val
20
+ self.sum += val * n
21
+ self.count += n
22
+ self.avg = self.sum / self.count
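AverageMeter is the running-average container that the saliency evaluation fills with per-image metric values. A tiny sketch with made-up numbers, not part of this commit:

```python
# Minimal usage sketch (illustration only, not part of this commit).
from evaluation.metrics.average_meter import AverageMeter

meter = AverageMeter()
for iou, n_images in [(0.8, 1), (0.6, 1), (0.7, 2)]:  # made-up per-batch IoU values
    meter.update(val=iou, n=n_images)
print(meter.avg)  # (0.8*1 + 0.6*1 + 0.7*2) / 4 = 0.7
```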
evaluation/metrics/f_measure.py ADDED
@@ -0,0 +1,112 @@
1
+ """
2
+ Code borrowed from SelfMask: https://github.com/NoelShin/selfmask
3
+ """
4
+
5
+ import torch
6
+
7
+
8
+ class FMeasure:
9
+ def __init__(
10
+ self,
11
+ default_thres: float = 0.5,
12
+ beta_square: float = 0.3,
13
+ n_bins: int = 255,
14
+ eps: float = 1e-7,
15
+ ):
16
+ """
17
+ :param default_thres: a hyperparameter for F-measure that is used to binarize a predicted mask. Default: 0.5
18
+ :param beta_square: a hyperparameter for F-measure. Default: 0.3
19
+ :param n_bins: the number of thresholds that will be tested for F-max. Default: 255
20
+ :param eps: a small value for numerical stability
21
+ """
22
+
23
+ self.beta_square = beta_square
24
+ self.default_thres = default_thres
25
+ self.eps = eps
26
+ self.n_bins = n_bins
27
+
28
+ def _compute_precision_recall(
29
+ self, binary_pred_mask: torch.Tensor, gt_mask: torch.Tensor
30
+ ) -> torch.Tensor:
31
+ """
32
+ :param binary_pred_mask: (B x H x W) or (H x W)
33
+ :param gt_mask: (B x H x W) or (H x W), should be the same with binary_pred_mask
34
+ """
35
+ tp = torch.logical_and(binary_pred_mask, gt_mask).sum(dim=(-1, -2))
36
+ tp_fp = binary_pred_mask.sum(dim=(-1, -2))
37
+ tp_fn = gt_mask.sum(dim=(-1, -2))
38
+
39
+ prec = tp / (tp_fp + self.eps)
40
+ recall = tp / (tp_fn + self.eps)
41
+ return prec, recall
42
+
43
+ def _compute_f_measure(
44
+ self,
45
+ pred_mask: torch.Tensor,
46
+ gt_mask: torch.Tensor,
47
+ thresholds: torch.Tensor = None,
48
+ ) -> torch.Tensor:
49
+ if thresholds is None:
50
+ binary_pred_mask = pred_mask > self.default_thres
51
+ else:
52
+ binary_pred_mask = pred_mask > thresholds
53
+
54
+ prec, recall = self._compute_precision_recall(binary_pred_mask, gt_mask)
55
+ f_measure = ((1 + (self.beta_square**2)) * prec * recall) / (
56
+ (self.beta_square**2) * prec + recall + self.eps
57
+ )
58
+ return f_measure.cpu()
59
+
60
+ def _compute_f_max(
61
+ self, pred_mask: torch.Tensor, gt_mask: torch.Tensor
62
+ ) -> torch.Tensor:
63
+ """Compute self.n_bins F-measures, each of which has a different threshold, then return the maximum
64
+ F-measure among them.
65
+
66
+ :param pred_mask: (H x W)
67
+ :param gt_mask: (H x W)
68
+ """
69
+
70
+ # pred_masks, gt_masks: H x W -> self.n_bins x H x W
71
+ pred_masks = pred_mask.unsqueeze(dim=0).repeat(self.n_bins, 1, 1)
72
+ gt_masks = gt_mask.unsqueeze(dim=0).repeat(self.n_bins, 1, 1)
73
+
74
+ # thresholds: self.n_bins x 1 x 1
75
+ thresholds = (
76
+ torch.arange(0, 1, 1 / self.n_bins)
77
+ .view(self.n_bins, 1, 1)
78
+ .to(pred_masks.device)
79
+ )
80
+
81
+ # f_measures: self.n_bins
82
+ f_measures = self._compute_f_measure(pred_masks, gt_masks, thresholds)
83
+ return torch.max(f_measures).cpu(), f_measures
84
+
85
+ def _compute_f_mean(
86
+ self,
87
+ pred_mask: torch.Tensor,
88
+ gt_mask: torch.Tensor,
89
+ ) -> torch.Tensor:
90
+ adaptive_thres = 2 * pred_mask.mean(dim=(-1, -2), keepdim=True)
91
+ binary_pred_mask = pred_mask > adaptive_thres
92
+
93
+ prec, recall = self._compute_precision_recall(binary_pred_mask, gt_mask)
94
+ f_mean = ((1 + (self.beta_square**2)) * prec * recall) / (
95
+ (self.beta_square**2) * prec + recall + self.eps
96
+ )
97
+ return f_mean.cpu()
98
+
99
+ def __call__(self, pred_mask: torch.Tensor, gt_mask: torch.Tensor) -> dict:
100
+ """
101
+ :param pred_mask: (H x W) a normalized prediction mask with values in [0, 1]
102
+ :param gt_mask: (H x W) a binary ground truth mask with values in {0, 1}
103
+ :return: a dictionary with keys being "f_measure" and "f_max" and values being the respective values.
104
+ """
105
+ outputs: dict = dict()
106
+ for k in ("f_measure", "f_mean"):
107
+ outputs.update({k: getattr(self, f"_compute_{k}")(pred_mask, gt_mask)})
108
+
109
+ f_max_, all_f = self._compute_f_max(pred_mask, gt_mask)
110
+ outputs["f_max"] = f_max_
111
+ outputs["all_f"] = all_f # List of all f values for all thresholds
112
+ return outputs
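FMeasure returns the F-score at the fixed 0.5 threshold, the adaptive-threshold F-mean, F-max over the 255 binarization thresholds, and the per-threshold values. A minimal sketch with random tensors, illustration only and not part of this commit:

```python
# Minimal usage sketch (illustration only, not part of this commit).
import torch
from evaluation.metrics.f_measure import FMeasure

pred = torch.rand(64, 64)               # soft prediction mask in [0, 1]
gt = (torch.rand(64, 64) > 0.5).long()  # binary ground-truth mask
scores = FMeasure()(pred, gt)
print(scores["f_measure"], scores["f_mean"], scores["f_max"])
print(scores["all_f"].shape)            # one F value per threshold: torch.Size([255])
```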
evaluation/metrics/iou.py ADDED
@@ -0,0 +1,37 @@
1
+ """
2
+ Code adapted from SelfMask: https://github.com/NoelShin/selfmask
3
+ """
4
+
5
+ from typing import Optional, Union
6
+
7
+ import numpy as np
8
+ import torch
9
+
10
+
11
+ def compute_iou(
12
+ pred_mask: Union[np.ndarray, torch.Tensor],
13
+ gt_mask: Union[np.ndarray, torch.Tensor],
14
+ threshold: Optional[float] = 0.5,
15
+ eps: float = 1e-7,
16
+ ) -> Union[np.ndarray, torch.Tensor]:
17
+ """
18
+ :param pred_mask: (B x H x W) or (H x W)
19
+ :param gt_mask: (B x H x W) or (H x W), same shape with pred_mask
20
+ :param threshold: a binarization threshold
21
+ :param eps: a small value for computational stability
22
+ :return: (B) or (1)
23
+ """
24
+ assert pred_mask.shape == gt_mask.shape, f"{pred_mask.shape} != {gt_mask.shape}"
25
+ # assert 0. <= pred_mask.to(torch.float32).min() and pred_mask.max().to(torch.float32) <= 1., f"{pred_mask.min(), pred_mask.max()}"
26
+
27
+ if threshold is not None:
28
+ pred_mask = pred_mask > threshold
29
+ if isinstance(pred_mask, np.ndarray):
30
+ intersection = np.logical_and(pred_mask, gt_mask).sum(axis=(-1, -2))
31
+ union = np.logical_or(pred_mask, gt_mask).sum(axis=(-1, -2))
32
+ ious = intersection / (union + eps)
33
+ else:
34
+ intersection = torch.logical_and(pred_mask, gt_mask).sum(dim=(-1, -2))
35
+ union = torch.logical_or(pred_mask, gt_mask).sum(dim=(-1, -2))
36
+ ious = (intersection / (union + eps)).cpu()
37
+ return ious
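compute_iou binarizes the prediction at 0.5 by default and then takes intersection over union; a toy example with made-up values, not part of this commit:

```python
# Minimal usage sketch (illustration only, not part of this commit).
import torch
from evaluation.metrics.iou import compute_iou

pred = torch.tensor([[0.9, 0.2],
                     [0.8, 0.1]])   # soft prediction
gt = torch.tensor([[1, 0],
                   [1, 1]])         # binary ground truth
print(compute_iou(pred, gt))        # intersection 2, union 3 -> ~0.667
```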
evaluation/metrics/mae.py ADDED
@@ -0,0 +1,15 @@
1
+ """
2
+ Code borrowed from SelfMask: https://github.com/NoelShin/selfmask
3
+ """
4
+
5
+ import torch
6
+
7
+
8
+ def compute_mae(pred_mask: torch.Tensor, gt_mask: torch.Tensor) -> torch.Tensor:
9
+ """
10
+ :param pred_mask: (H x W) or (B x H x W) a normalized prediction mask with values in [0, 1]
11
+ :param gt_mask: (H x W) or (B x H x W) a binary ground truth mask with values in {0, 1}
12
+ """
13
+ return torch.mean(
14
+ torch.abs(pred_mask - gt_mask.to(torch.float32)), dim=(-1, -2)
15
+ ).cpu()
evaluation/metrics/pixel_acc.py ADDED
@@ -0,0 +1,21 @@
1
+ """
2
+ Code borrowed from SelfMask: https://github.com/NoelShin/selfmask
3
+ """
4
+
5
+ from typing import Optional
6
+
7
+ import torch
8
+
9
+
10
+ def compute_pixel_accuracy(
11
+ pred_mask: torch.Tensor, gt_mask: torch.Tensor, threshold: Optional[float] = 0.5
12
+ ) -> torch.Tensor:
13
+ """
14
+ :param pred_mask: (H x W) or (B x H x W) a normalized prediction mask with values in [0, 1]
15
+ :param gt_mask: (H x W) or (B x H x W) a binary ground truth mask with values in {0, 1}
16
+ """
17
+ if threshold is not None:
18
+ binary_pred_mask = pred_mask > threshold
19
+ else:
20
+ binary_pred_mask = pred_mask
21
+ return (binary_pred_mask == gt_mask).to(torch.float32).mean(dim=(-1, -2)).cpu()
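The two helpers above, compute_mae and compute_pixel_accuracy, operate on the same mask pair; a toy example with made-up values, not part of this commit:

```python
# Minimal usage sketch (illustration only, not part of this commit).
import torch
from evaluation.metrics.mae import compute_mae
from evaluation.metrics.pixel_acc import compute_pixel_accuracy

pred = torch.tensor([[0.9, 0.1],
                     [0.4, 0.8]])
gt = torch.tensor([[1, 0],
                   [1, 1]])
print(compute_mae(pred, gt))              # (0.1 + 0.1 + 0.6 + 0.2) / 4 = 0.25
print(compute_pixel_accuracy(pred, gt))   # 3 of 4 pixels correct after thresholding at 0.5
```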
evaluation/metrics/s_measure.py ADDED
@@ -0,0 +1,126 @@
1
+ # code borrowed from https://github.com/Hanqer/Evaluate-SOD/blob/master/evaluator.py
2
+ import numpy as np
3
+ import torch
4
+
5
+
6
+ class SMeasure:
7
+ def __init__(self, alpha: float = 0.5):
8
+ self.alpha: float = alpha
9
+ self.cuda: bool = True
10
+
11
+ def _centroid(self, gt):
12
+ rows, cols = gt.size()[-2:]
13
+ gt = gt.view(rows, cols)
14
+ if gt.sum() == 0:
15
+ if self.cuda:
16
+ X = torch.eye(1).cuda() * round(cols / 2)
17
+ Y = torch.eye(1).cuda() * round(rows / 2)
18
+ else:
19
+ X = torch.eye(1) * round(cols / 2)
20
+ Y = torch.eye(1) * round(rows / 2)
21
+ else:
22
+ total = gt.sum()
23
+ if self.cuda:
24
+ i = torch.from_numpy(np.arange(0, cols)).cuda().float()
25
+ j = torch.from_numpy(np.arange(0, rows)).cuda().float()
26
+ else:
27
+ i = torch.from_numpy(np.arange(0, cols)).float()
28
+ j = torch.from_numpy(np.arange(0, rows)).float()
29
+ X = torch.round((gt.sum(dim=0) * i).sum() / total)
30
+ Y = torch.round((gt.sum(dim=1) * j).sum() / total)
31
+ return X.long(), Y.long()
32
+
33
+ def _ssim(self, pred, gt):
34
+ gt = gt.float()
35
+ h, w = pred.size()[-2:]
36
+ N = h * w
37
+ x = pred.mean()
38
+ y = gt.mean()
39
+ sigma_x2 = ((pred - x) * (pred - x)).sum() / (N - 1 + 1e-20)
40
+ sigma_y2 = ((gt - y) * (gt - y)).sum() / (N - 1 + 1e-20)
41
+ sigma_xy = ((pred - x) * (gt - y)).sum() / (N - 1 + 1e-20)
42
+
43
+ alpha = 4 * x * y * sigma_xy
44
+ beta = (x * x + y * y) * (sigma_x2 + sigma_y2)
45
+
46
+ if alpha != 0:
47
+ Q = alpha / (beta + 1e-20)
48
+ elif alpha == 0 and beta == 0:
49
+ Q = 1.0
50
+ else:
51
+ Q = 0
52
+ return Q
53
+
54
+ def _object(self, pred, gt):
55
+ temp = pred[gt == 1]
56
+ x = temp.mean()
57
+ sigma_x = temp.std()
58
+ score = 2.0 * x / (x * x + 1.0 + sigma_x + 1e-20)
59
+
60
+ return score
61
+
62
+ def _s_object(self, pred, gt):
63
+ fg = torch.where(gt == 0, torch.zeros_like(pred), pred)
64
+ bg = torch.where(gt == 1, torch.zeros_like(pred), 1 - pred)
65
+ o_fg = self._object(fg, gt)
66
+ o_bg = self._object(bg, 1 - gt)
67
+ u = gt.mean()
68
+ Q = u * o_fg + (1 - u) * o_bg
69
+ return Q
70
+
71
+ def _divide_gt(self, gt, X, Y):
72
+ h, w = gt.size()[-2:]
73
+ area = h * w
74
+ gt = gt.view(h, w)
75
+ LT = gt[:Y, :X]
76
+ RT = gt[:Y, X:w]
77
+ LB = gt[Y:h, :X]
78
+ RB = gt[Y:h, X:w]
79
+ X = X.float()
80
+ Y = Y.float()
81
+ w1 = X * Y / area
82
+ w2 = (w - X) * Y / area
83
+ w3 = X * (h - Y) / area
84
+ w4 = 1 - w1 - w2 - w3
85
+ return LT, RT, LB, RB, w1, w2, w3, w4
86
+
87
+ def _divide_prediction(self, pred, X, Y):
88
+ h, w = pred.size()[-2:]
89
+ pred = pred.view(h, w)
90
+ LT = pred[:Y, :X]
91
+ RT = pred[:Y, X:w]
92
+ LB = pred[Y:h, :X]
93
+ RB = pred[Y:h, X:w]
94
+ return LT, RT, LB, RB
95
+
96
+ def _s_region(self, pred, gt):
97
+ X, Y = self._centroid(gt)
98
+ gt1, gt2, gt3, gt4, w1, w2, w3, w4 = self._divide_gt(gt, X, Y)
99
+ p1, p2, p3, p4 = self._divide_prediction(pred, X, Y)
100
+ Q1 = self._ssim(p1, gt1)
101
+ Q2 = self._ssim(p2, gt2)
102
+ Q3 = self._ssim(p3, gt3)
103
+ Q4 = self._ssim(p4, gt4)
104
+ Q = w1 * Q1 + w2 * Q2 + w3 * Q3 + w4 * Q4
105
+ # print(Q)
106
+ return Q
107
+
108
+ def __call__(self, pred_mask: torch.Tensor, gt_mask: torch.Tensor):
109
+ assert pred_mask.shape == gt_mask.shape
110
+ y = gt_mask.mean()
111
+ if y == 0:
112
+ x = pred_mask.mean()
113
+ Q = 1.0 - x
114
+ elif y == 1:
115
+ x = pred_mask.mean()
116
+ Q = x
117
+ else:
118
+ gt_mask[gt_mask >= 0.5] = 1
119
+ gt_mask[gt_mask < 0.5] = 0
120
+ # print(self._S_object(pred, gt), self._S_region(pred, gt))
121
+ Q = self.alpha * self._s_object(pred_mask, gt_mask) + (
122
+ 1 - self.alpha
123
+ ) * self._s_region(pred_mask, gt_mask)
124
+ if Q.item() < 0:
125
+ Q = torch.FloatTensor([0.0])
126
+ return Q.item()
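SMeasure combines an object-aware and a region-aware structural similarity, weighted by alpha. Note that the class hard-codes `self.cuda = True`, so in the sketch below (illustration only, not part of this commit) both tensors are placed on the GPU; with CPU-only tensors the centroid computation would hit a device mismatch.

```python
# Minimal usage sketch (illustration only, not part of this commit; assumes a CUDA device).
import torch
from evaluation.metrics.s_measure import SMeasure

pred = torch.rand(64, 64).cuda()                 # soft prediction mask
gt = (torch.rand(64, 64) > 0.5).float().cuda()   # binary ground-truth mask
print(SMeasure(alpha=0.5)(pred, gt))             # scalar score, higher is better
```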
evaluation/saliency.py ADDED
@@ -0,0 +1,272 @@
1
+ # Copyright 2022 - Valeo Comfort and Driving Assistance - valeo.ai
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ import torch
16
+ import numpy as np
17
+ import torch.nn as nn
18
+ import torch.nn.functional as F
19
+
20
+ from tqdm import tqdm
21
+ from scipy import ndimage
22
+
23
+ from evaluation.metrics.average_meter import AverageMeter
24
+ from evaluation.metrics.f_measure import FMeasure
25
+ from evaluation.metrics.iou import compute_iou
26
+ from evaluation.metrics.mae import compute_mae
27
+ from evaluation.metrics.pixel_acc import compute_pixel_accuracy
28
+ from evaluation.metrics.s_measure import SMeasure
29
+
30
+ from misc import batch_apply_bilateral_solver
31
+
32
+
33
+ @torch.no_grad()
34
+ def write_metric_tf(writer, metrics, n_iter=-1, name=""):
35
+ writer.add_scalar(
36
+ f"Validation/{name}iou_pred",
37
+ metrics["ious"].avg,
38
+ n_iter,
39
+ )
40
+ writer.add_scalar(
41
+ f"Validation/{name}acc_pred",
42
+ metrics["pixel_accs"].avg,
43
+ n_iter,
44
+ )
45
+ writer.add_scalar(
46
+ f"Validation/{name}f_max",
47
+ metrics["f_maxs"].avg,
48
+ n_iter,
49
+ )
50
+
51
+
52
+ @torch.no_grad()
53
+ def eval_batch(batch_gt_masks, batch_pred_masks, metrics_res={}, reset=False):
54
+ """
55
+ Evaluation code adapted from SelfMask: https://github.com/NoelShin/selfmask
56
+ """
57
+
58
+ f_values = {}
59
+ # Keep track of f_values for each threshold
60
+ for i in range(255): # should equal n_bins in metrics/f_measure.py
61
+ f_values[i] = AverageMeter()
62
+
63
+ if metrics_res == {}:
64
+ metrics_res["f_scores"] = AverageMeter()
65
+ metrics_res["f_maxs"] = AverageMeter()
66
+ metrics_res["f_maxs_fixed"] = AverageMeter()
67
+ metrics_res["f_means"] = AverageMeter()
68
+ metrics_res["maes"] = AverageMeter()
69
+ metrics_res["ious"] = AverageMeter()
70
+ metrics_res["pixel_accs"] = AverageMeter()
71
+ metrics_res["s_measures"] = AverageMeter()
72
+
73
+ if reset:
74
+ metrics_res["f_scores"].reset()
75
+ metrics_res["f_maxs"].reset()
76
+ metrics_res["f_maxs_fixed"].reset()
77
+ metrics_res["f_means"].reset()
78
+ metrics_res["maes"].reset()
79
+ metrics_res["ious"].reset()
80
+ metrics_res["pixel_accs"].reset()
81
+ metrics_res["s_measures"].reset()
82
+
83
+ # iterate over batch dimension
84
+ for _, (pred_mask, gt_mask) in enumerate(zip(batch_pred_masks, batch_gt_masks)):
85
+ assert pred_mask.shape == gt_mask.shape, f"{pred_mask.shape} != {gt_mask.shape}"
86
+ assert len(pred_mask.shape) == len(gt_mask.shape) == 2
87
+ # Compute
88
+ # Binarize at 0.5 for IoU and pixel accuracy
89
+ binary_pred = (pred_mask > 0.5).float().squeeze()
90
+ iou = compute_iou(binary_pred, gt_mask)
91
+ f_measures = FMeasure()(pred_mask, gt_mask) # soft mask for F measure
92
+ mae = compute_mae(binary_pred, gt_mask)
93
+ pixel_acc = compute_pixel_accuracy(binary_pred, gt_mask)
94
+
95
+ # Update
96
+ metrics_res["ious"].update(val=iou.numpy(), n=1)
97
+ metrics_res["f_scores"].update(val=f_measures["f_measure"].numpy(), n=1)
98
+ metrics_res["f_maxs"].update(val=f_measures["f_max"].numpy(), n=1)
99
+ metrics_res["f_means"].update(val=f_measures["f_mean"].numpy(), n=1)
100
+ metrics_res["s_measures"].update(
101
+ val=SMeasure()(pred_mask=pred_mask, gt_mask=gt_mask.to(torch.float32)), n=1
102
+ )
103
+ metrics_res["maes"].update(val=mae.numpy(), n=1)
104
+ metrics_res["pixel_accs"].update(val=pixel_acc.numpy(), n=1)
105
+
106
+ # Keep track of f_values for each threshold
107
+ all_f = f_measures["all_f"].numpy()
108
+ for k, v in f_values.items():
109
+ v.update(val=all_f[k], n=1)
110
+ # Then compute the max for the f_max_fixed
111
+ metrics_res["f_maxs_fixed"].update(
112
+ val=np.max([v.avg for v in f_values.values()]), n=1
113
+ )
114
+
115
+ results = {}
116
+ # F-measure, F-max, F-mean, MAE, S-measure, IoU, pixel acc.
117
+ results["f_measure"] = metrics_res["f_scores"].avg
118
+ results["f_max"] = metrics_res["f_maxs"].avg
119
+ results["f_maxs_fixed"] = metrics_res["f_maxs_fixed"].avg
120
+ results["f_mean"] = metrics_res["f_means"].avg
121
+ results["s_measure"] = metrics_res["s_measures"].avg
122
+ results["mae"] = metrics_res["maes"].avg
123
+ results["iou"] = metrics_res["ious"].avg  # average IoU, consistent with the other reported metrics
124
+ results["pixel_acc"] = metrics_res["pixel_accs"].avg
125
+
126
+ return results, metrics_res
127
+
128
+
129
+ def evaluate_saliency(
130
+ dataset,
131
+ model,
132
+ writer=None,
133
+ batch_size=1,
134
+ n_iter=-1,
135
+ apply_bilateral=False,
136
+ im_fullsize=True,
137
+ method="pred", # can also be "bkg",
138
+ apply_weights: bool = True,
139
+ evaluation_mode: str = "single", # choices are ["single", "multi"]
140
+ ):
141
+
142
+ if im_fullsize:
143
+ # Change transformation
144
+ dataset.fullimg_mode()
145
+ batch_size = 1
146
+
147
+ valloader = torch.utils.data.DataLoader(
148
+ dataset, batch_size=batch_size, shuffle=False, num_workers=2
149
+ )
150
+
151
+ sigmoid = nn.Sigmoid()
152
+
153
+ metrics_res = {}
154
+ metrics_res_bs = {}
155
+ valbar = tqdm(enumerate(valloader, 0), leave=None)
156
+ for i, data in valbar:
157
+ inputs, _, _, _, _, gt_labels, _ = data
158
+ inputs = inputs.to("cuda")
159
+ gt_labels = gt_labels.to("cuda").float()
160
+
161
+ # Forward step
162
+ with torch.no_grad():
163
+ preds = model(inputs, for_eval=True)
164
+
165
+ h, w = gt_labels.shape[-2:]
166
+ preds_up = F.interpolate(
167
+ preds,
168
+ scale_factor=model.vit_patch_size,
169
+ mode="bilinear",
170
+ align_corners=False,
171
+ )[..., :h, :w]
172
+ soft_preds = sigmoid(preds_up.detach()).squeeze(0)
173
+ preds_up = (sigmoid(preds_up.detach()) > 0.5).squeeze(0).float()
174
+
175
+ reset = True if i == 0 else False
176
+ if evaluation_mode == "single":
177
+ labeled, nr_objects = ndimage.label(preds_up.squeeze().cpu().numpy())
178
+ if nr_objects == 0:
179
+ preds_up_one_cc = preds_up.squeeze()
180
+ print("nr_objects == 0")
181
+ else:
182
+ nb_pixel = [np.sum(labeled == i) for i in range(nr_objects + 1)]
183
+ pixel_order = np.argsort(nb_pixel)
184
+
185
+ cc = [torch.Tensor(labeled == i) for i in pixel_order]
186
+ cc = torch.stack(cc).cuda()
187
+
188
+ # Find CC set as background, here not necessarily the biggest
189
+ cc_background = (
190
+ (
191
+ (
192
+ (~(preds_up[None, :, :, :].bool())).float()
193
+ + cc[:, None, :, :].cuda()
194
+ )
195
+ > 1
196
+ )
197
+ .sum(-1)
198
+ .sum(-1)
199
+ .argmax()
200
+ )
201
+ pixel_order = np.delete(pixel_order, int(cc_background.cpu().numpy()))
202
+
203
+ preds_up_one_cc = torch.Tensor(labeled == pixel_order[-1]).cuda()
204
+
205
+ _, metrics_res = eval_batch(
206
+ gt_labels,
207
+ preds_up_one_cc.unsqueeze(0),
208
+ metrics_res=metrics_res,
209
+ reset=reset,
210
+ )
211
+
212
+ elif evaluation_mode == "multi":
213
+ # Eval without bilateral solver
214
+ _, metrics_res = eval_batch(
215
+ gt_labels,
216
+ soft_preds.unsqueeze(0) if len(soft_preds.shape) == 2 else soft_preds,
217
+ metrics_res=metrics_res,
218
+ reset=reset,
219
+ ) # soft preds needed for F beta measure
220
+
221
+ # Apply bilateral solver
222
+ preds_bs = None
223
+ if apply_bilateral:
224
+ get_all_cc = True if evaluation_mode == "multi" else False
225
+ preds_bs, _ = batch_apply_bilateral_solver(
226
+ data, preds_up.detach(), get_all_cc=get_all_cc
227
+ )
228
+
229
+ _, metrics_res_bs = eval_batch(
230
+ gt_labels,
231
+ preds_bs[None, :, :].float(),
232
+ metrics_res=metrics_res_bs,
233
+ reset=reset,
234
+ )
235
+
236
+ bar_str = (
237
+ f"{dataset.name} | {evaluation_mode} mode | "
238
+ f"F-max {metrics_res['f_maxs'].avg:.3f} "
239
+ f"IoU {metrics_res['ious'].avg:.3f}, "
240
+ f"PA {metrics_res['pixel_accs'].avg:.3f}"
241
+ )
242
+
243
+ if apply_bilateral:
244
+ bar_str += (
245
+ f" | with bilateral solver: "
246
+ f"F-max {metrics_res_bs['f_maxs'].avg:.3f}, "
247
+ f"IoU {metrics_res_bs['ious'].avg:.3f}, "
248
+ f"PA. {metrics_res_bs['pixel_accs'].avg:.3f}"
249
+ )
250
+
251
+ valbar.set_description(bar_str)
252
+
253
+ # Writing in tensorboard
254
+ if writer is not None:
255
+ write_metric_tf(
256
+ writer,
257
+ metrics_res,
258
+ n_iter=n_iter,
259
+ name=f"{dataset.name}_{evaluation_mode}_",
260
+ )
261
+
262
+ if apply_bilateral:
263
+ write_metric_tf(
264
+ writer,
265
+ metrics_res_bs,
266
+ n_iter=n_iter,
267
+ name=f"{dataset.name}_{evaluation_mode}-BS_",
268
+ )
269
+
270
+ # Go back to original transformation
271
+ if im_fullsize:
272
+ dataset.training_mode()
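For context, here is a small, hedged sketch of how `eval_batch` accumulates metrics across batches. The tensors are random and only illustrate the expected shapes; they are not results from the paper.

```python
# Hedged toy example for eval_batch above; masks are random and only illustrate shapes.
import torch
from evaluation.saliency import eval_batch

torch.manual_seed(0)
gt_masks = (torch.rand(2, 64, 64) > 0.5).float()   # batch of binary ground-truth masks
pred_masks = torch.rand(2, 64, 64)                 # batch of soft predictions in [0, 1]

# First batch: pass a fresh dict so the AverageMeters are created.
results, metrics_res = eval_batch(gt_masks, pred_masks, metrics_res={}, reset=True)
print({k: float(v) for k, v in results.items()})

# Later batches reuse the same metrics_res so the averages accumulate.
results, metrics_res = eval_batch(gt_masks, pred_masks, metrics_res=metrics_res)
```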
evaluation/uod.py ADDED
@@ -0,0 +1,117 @@
1
+ # Copyright 2021 - Valeo Comfort and Driving Assistance - Oriane Siméoni @ valeo.ai
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """
16
+ Code adapted from previous method LOST: https://github.com/valeoai/LOST
17
+ """
18
+
19
+ import os
20
+ import time
21
+ import torch
22
+ import torch.nn as nn
23
+ import numpy as np
24
+
25
+ from tqdm import tqdm
26
+ from misc import bbox_iou, get_bbox_from_segmentation_labels
27
+
28
+
29
+ def evaluation_unsupervised_object_discovery(
30
+ dataset,
31
+ model,
32
+ evaluation_mode: str = "single", # choices are ["single", "multi"]
33
+ output_dir: str = "outputs",
34
+ no_hards: bool = False,
35
+ ):
36
+
37
+ assert evaluation_mode == "single"
38
+
39
+ sigmoid = nn.Sigmoid()
40
+
41
+ # ----------------------------------------------------
42
+ # Loop over images
43
+ preds_dict = {}
44
+ cnt = 0
45
+ corloc = np.zeros(len(dataset.dataloader))
46
+
47
+ start_time = time.time()
48
+ pbar = tqdm(dataset.dataloader)
49
+ for im_id, inp in enumerate(pbar):
50
+
51
+ # ------------ IMAGE PROCESSING -------------------------------------------
52
+ img = inp[0]
53
+
54
+ init_image_size = img.shape
55
+
56
+ # Get the name of the image
57
+ im_name = dataset.get_image_name(inp[1])
58
+ # Pass in case of no gt boxes in the image
59
+ if im_name is None:
60
+ continue
61
+
62
+ # Padding the image with zeros to fit multiple of patch-size
63
+ size_im = (
64
+ img.shape[0],
65
+ int(np.ceil(img.shape[1] / model.vit_patch_size) * model.vit_patch_size),
66
+ int(np.ceil(img.shape[2] / model.vit_patch_size) * model.vit_patch_size),
67
+ )
68
+ padded = torch.zeros(size_im)
69
+ padded[:, : img.shape[1], : img.shape[2]] = img
70
+ img = padded
71
+
72
+ # # Move to gpu
73
+ img = img.cuda(non_blocking=True)
74
+
75
+ # Size for transformers
76
+ # w_featmap = img.shape[-2] // model.vit_patch_size
77
+ # h_featmap = img.shape[-1] // model.vit_patch_size
78
+
79
+ # ------------ GROUND-TRUTH -------------------------------------------
80
+ gt_bbxs, gt_cls = dataset.extract_gt(inp[1], im_name)
81
+
82
+ if gt_bbxs is not None:
83
+ # Discard images with no gt annotations
84
+ # Happens only in the case of VOC07 and VOC12
85
+ if gt_bbxs.shape[0] == 0 and no_hards:
86
+ continue
87
+
88
+ outputs = model(img[None, :, :, :])
89
+ preds = (sigmoid(outputs[0].detach()) > 0.5).float().squeeze().cpu().numpy()
90
+
91
+ # get bbox
92
+ pred = get_bbox_from_segmentation_labels(
93
+ segmenter_predictions=preds,
94
+ scales=[model.vit_patch_size, model.vit_patch_size],
95
+ initial_image_size=init_image_size[1:],
96
+ )
97
+
98
+ # ------------ Visualizations -------------------------------------------
99
+ # Save the prediction
100
+ preds_dict[im_name] = pred
101
+
102
+ # Compare prediction to GT boxes
103
+ ious = bbox_iou(torch.from_numpy(pred), torch.from_numpy(gt_bbxs))
104
+
105
+ if torch.any(ious >= 0.5):
106
+ corloc[im_id] = 1
107
+
108
+ cnt += 1
109
+ if cnt % 50 == 0:
110
+ pbar.set_description(f"Peekaboo {int(np.sum(corloc))}/{cnt}")
111
+
112
+ # Evaluate
113
+ print(f"corloc: {100*np.sum(corloc)/cnt:.2f} ({int(np.sum(corloc))}/{cnt})")
114
+ result_file = os.path.join(output_dir, "uod_results.txt")
115
+ with open(result_file, "w") as f:
116
+ f.write("corloc,%.1f,,\n" % (100 * np.sum(corloc) / cnt))
117
+ print("File saved at %s" % result_file)
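The CorLoc bookkeeping above reduces to one decision per image: the predicted box counts as correctly localized if it reaches IoU ≥ 0.5 with any ground-truth box. A tiny illustration of that decision with made-up boxes:

```python
# Toy illustration of the CorLoc decision used above; box coordinates are made up.
import torch
from misc import bbox_iou

pred_box = torch.tensor([48.0, 30.0, 210.0, 180.0])    # xmin, ymin, xmax, ymax
gt_boxes = torch.tensor([[50.0, 32.0, 200.0, 175.0],
                         [300.0, 10.0, 340.0, 60.0]])  # one row per ground-truth box

ious = bbox_iou(pred_box, gt_boxes)                     # IoU against every gt box
correctly_localized = bool(torch.any(ious >= 0.5))      # contributes 1 to CorLoc if True
print(ious, correctly_localized)
```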
format_codebase.sh ADDED
@@ -0,0 +1,16 @@
1
+ #!/bin/sh
2
+
3
+ # Script to format codebase
4
+
5
+ # pip install autopep8
6
+ # pip install --force-reinstall --upgrade typed-ast black
7
+
8
+ # Run autopep8 to fix specific PEP 8 issues
9
+ autopep8 --in-place --recursive --select=E1,E2,E3,W1,W2 ./**.py
10
+
11
+ # Run black to enforce consistent formatting
12
+ black ./
13
+
14
+ # To run this file
15
+ # chmod +x format_codebase.sh
16
+ # ./format_codebase.sh
media/description.html ADDED
@@ -0,0 +1,22 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <title>Title</title>
6
+ </head>
7
+ <body>
8
+ Try this demo for <a href="https://github.com/hasibzunair/peekaboo">PEEKABOO</a>,
9
+ introduced in our <strong>BMVC'2024</strong> paper <a href="https://arxiv.org/abs/2407.17628">PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization</a>.
10
+ <br>
11
+ Peekaboo aims to explicitly model contextual relationship among pixels through image masking for unsupervised object localization.
12
+ In a self-supervised procedure (i.e. pretext task) without any additional training (i.e. downstream task), context-based representation learning is done at both
13
+ the pixel-level by making predictions on masked images and at shape-level by matching the predictions of the masked input to the unmasked one.
14
+ <br>
15
+ You can use this demo to segment the most salient object(s) in your images. To use it, simply
16
+ upload an image of your choice and hit submit. You will get one or more segmentation maps of the most salient objects present
17
+ in your images.
18
+ <br>
19
+ <a href="https://hasibzunair.github.io/peekaboo/"><strong>Project Page</strong></a>
20
+ <br>
21
+ </body>
22
+ </html>
misc.py ADDED
@@ -0,0 +1,337 @@
1
+ # Code for Peekaboo
2
+ # Author: Hasib Zunair
3
+ # Modified from https://github.com/valeoai/FOUND, see license below.
4
+
5
+ # Copyright 2022 - Valeo Comfort and Driving Assistance - Oriane Siméoni @ valeo.ai
6
+ #
7
+ # Licensed under the Apache License, Version 2.0 (the "License");
8
+ # you may not use this file except in compliance with the License.
9
+ # You may obtain a copy of the License at
10
+ #
11
+ # http://www.apache.org/licenses/LICENSE-2.0
12
+ #
13
+ # Unless required by applicable law or agreed to in writing, software
14
+ # distributed under the License is distributed on an "AS IS" BASIS,
15
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16
+ # See the License for the specific language governing permissions and
17
+ # limitations under the License.
18
+
19
+ """Helper functions"""
20
+
21
+ import re
22
+ import os
23
+ import cv2
24
+ import sys
25
+ import os.path as osp
26
+ import errno
27
+ import yaml
28
+ import math
29
+ import random
30
+ import scipy.ndimage
31
+ import numpy as np
32
+
33
+ import torch
34
+ import torch.nn.functional as F
35
+
36
+ from typing import List
37
+ from torchvision import transforms as T
38
+
39
+ from bilateral_solver import bilateral_solver_output
40
+
41
+
42
+ loader = yaml.SafeLoader
43
+ loader.add_implicit_resolver(
44
+ "tag:yaml.org,2002:float",
45
+ re.compile(
46
+ """^(?:
47
+ [-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)?
48
+ |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
49
+ |\\.[0-9_]+(?:[eE][-+][0-9]+)?
50
+ |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\\.[0-9_]*
51
+ |[-+]?\\.(?:inf|Inf|INF)
52
+ |\\.(?:nan|NaN|NAN))$""",
53
+ re.X,
54
+ ),
55
+ list("-+0123456789."),
56
+ )
57
+
58
+
59
+ def mkdir_if_missing(directory):
60
+ if not osp.exists(directory):
61
+ try:
62
+ os.makedirs(directory)
63
+ except OSError as e:
64
+ if e.errno != errno.EEXIST:
65
+ raise
66
+
67
+
68
+ class Logger(object):
69
+ """
70
+ Write console output to external text file.
71
+ Code imported from https://github.com/Cysu/open-reid/blob/master/reid/utils/logging.py.
72
+ """
73
+
74
+ def __init__(self, fpath=None):
75
+ self.console = sys.stdout
76
+ self.file = None
77
+ if fpath is not None:
78
+ mkdir_if_missing(os.path.dirname(fpath))
79
+ self.file = open(fpath, "w")
80
+
81
+ def __del__(self):
82
+ self.close()
83
+
84
+ def __enter__(self):
85
+ pass
86
+
87
+ def __exit__(self, *args):
88
+ self.close()
89
+
90
+ def write(self, msg):
91
+ self.console.write(msg)
92
+ if self.file is not None:
93
+ self.file.write(msg)
94
+
95
+ def flush(self):
96
+ self.console.flush()
97
+ if self.file is not None:
98
+ self.file.flush()
99
+ os.fsync(self.file.fileno())
100
+
101
+ def close(self):
102
+ self.console.close()
103
+ if self.file is not None:
104
+ self.file.close()
105
+
106
+
107
+ class Struct:
108
+ def __init__(self, **entries):
109
+ self.__dict__.update(entries)
110
+
111
+
112
+ def load_config(config_file):
113
+ with open(config_file, errors="ignore") as f:
114
+ # conf = yaml.safe_load(f) # load config
115
+ conf = yaml.load(f, Loader=loader)
116
+ print("hyperparameters: " + ", ".join(f"{k}={v}" for k, v in conf.items()))
117
+
118
+ # TODO yaml_save(save_dir / 'config.yaml', conf)
119
+ return Struct(**conf), conf # conf returned to print it
120
+
121
+
122
+ def set_seed(seed: int) -> None:
123
+ """
124
+ Set all seeds to make results reproducible
125
+ """
126
+ # env
127
+ os.environ["PYTHONHASHSEED"] = str(seed)
128
+
129
+ # python
130
+ random.seed(seed)
131
+
132
+ # numpy
133
+ np.random.seed(seed)
134
+
135
+ # torch
136
+ torch.manual_seed(seed)
137
+ torch.cuda.manual_seed(seed)
138
+ torch.cuda.manual_seed_all(seed)
139
+ if torch.cuda.is_available():
140
+ torch.backends.cudnn.deterministic = True
141
+ torch.backends.cudnn.benchmark = False  # disabled for reproducibility
142
+
143
+
144
+ def IoU(mask1, mask2):
145
+ """
146
+ Code adapted from TokenCut: https://github.com/YangtaoWANG95/TokenCut
147
+ """
148
+ mask1, mask2 = (mask1 > 0.5).to(torch.bool), (mask2 > 0.5).to(torch.bool)
149
+ intersection = torch.sum(mask1 * (mask1 == mask2), dim=[-1, -2]).squeeze()
150
+ union = torch.sum(mask1 + mask2, dim=[-1, -2]).squeeze()
151
+ return (intersection.to(torch.float) / union).mean().item()
152
+
153
+
154
+ def batch_apply_bilateral_solver(data, masks, get_all_cc=True, shape=None):
155
+
156
+ cnt_bs = 0
157
+ masks_bs = []
158
+
159
+ # inputs, init_imgs, gt_labels, img_path = data
160
+ inputs, _, _, init_imgs, _, gt_labels, img_path = data
161
+
162
+ for id in range(inputs.shape[0]):
163
+ _, bs_mask, use_bs = apply_bilateral_solver(
164
+ mask=masks[id].squeeze().cpu().numpy(),
165
+ img=init_imgs[id],
166
+ img_path=img_path[id],
167
+ im_fullsize=False,
168
+ # Careful: shape is passed reversed, i.e. (width, height)
169
+ shape=(gt_labels.shape[-1], gt_labels.shape[-2]),
170
+ get_all_cc=get_all_cc,
171
+ )
172
+ cnt_bs += use_bs
173
+
174
+ # use the bilateral solver output if IoU > 0.5
175
+ if use_bs:
176
+ if shape is None:
177
+ shape = masks.shape[-2:]
178
+ # Interpolate to downsample the mask back
179
+ bs_ds = F.interpolate(
180
+ torch.Tensor(bs_mask).unsqueeze(0).unsqueeze(0),
181
+ shape, # TODO check here
182
+ mode="bilinear",
183
+ align_corners=False,
184
+ )
185
+ masks_bs.append(bs_ds.bool().cuda().squeeze()[None, :, :])
186
+ else:
187
+ # Use initial mask
188
+ masks_bs.append(masks[id].cuda().squeeze()[None, :, :])
189
+
190
+ return torch.cat(masks_bs).squeeze(), cnt_bs
191
+
192
+
193
+ def apply_bilateral_solver(
194
+ mask,
195
+ img,
196
+ img_path,
197
+ shape,
198
+ im_fullsize=False,
199
+ get_all_cc=False,
200
+ bs_iou_threshold: float = 0.5,
201
+ reshape: bool = True,
202
+ ):
203
+ # Get initial image in the case of using full image
204
+ img_init = None
205
+ if not im_fullsize:
206
+ # Use the image given by dataloader
207
+ shape = (img.shape[-1], img.shape[-2])
208
+ t = T.ToPILImage()
209
+ img_init = t(img)
210
+
211
+ if reshape:
212
+ # Resize predictions to image size
213
+ resized_mask = cv2.resize(mask, shape)
214
+ sel_obj_mask = resized_mask
215
+ else:
216
+ resized_mask = mask
217
+ sel_obj_mask = mask
218
+
219
+ # Apply bilateral solver
220
+ _, binary_solver = bilateral_solver_output(
221
+ img_path,
222
+ resized_mask,
223
+ img=img_init,
224
+ sigma_spatial=16,
225
+ sigma_luma=16,
226
+ sigma_chroma=8,
227
+ get_all_cc=get_all_cc,
228
+ )
229
+
230
+ mask1 = torch.from_numpy(resized_mask).cuda()
231
+ mask2 = torch.from_numpy(binary_solver).cuda().float()
232
+
233
+ use_bs = 0
234
+ # If enough overlap, use BS output
235
+ if IoU(mask1, mask2) > bs_iou_threshold:
236
+ sel_obj_mask = binary_solver.astype(float)
237
+ use_bs = 1
238
+
239
+ return resized_mask, sel_obj_mask, use_bs
240
+
241
+
242
+ def get_bbox_from_segmentation_labels(
243
+ segmenter_predictions: torch.Tensor,
244
+ initial_image_size: torch.Size,
245
+ scales: List[int],
246
+ ) -> np.array:
247
+ """
248
+ Find the largest connected component in foreground, extract its bounding box
249
+ """
250
+ objects, num_objects = scipy.ndimage.label(segmenter_predictions)
251
+
252
+ # find biggest connected component
253
+ all_foreground_labels = objects.flatten()[objects.flatten() != 0]
254
+ most_frequent_label = np.bincount(all_foreground_labels).argmax()
255
+ mask = np.where(objects == most_frequent_label)
256
+ # Add +1 because excluded max
257
+ ymin, ymax = min(mask[0]), max(mask[0]) + 1
258
+ xmin, xmax = min(mask[1]), max(mask[1]) + 1
259
+
260
+ if initial_image_size == segmenter_predictions.shape:
261
+ # Masks are already upsampled
262
+ pred = [xmin, ymin, xmax, ymax]
263
+ else:
264
+ # Rescale to image size
265
+ r_xmin, r_xmax = scales[1] * xmin, scales[1] * xmax
266
+ r_ymin, r_ymax = scales[0] * ymin, scales[0] * ymax
267
+ pred = [r_xmin, r_ymin, r_xmax, r_ymax]
268
+
269
+ # Check not out of image size (used when padding)
270
+ if initial_image_size:
271
+ pred[2] = min(pred[2], initial_image_size[1])
272
+ pred[3] = min(pred[3], initial_image_size[0])
273
+
274
+ return np.asarray(pred)
275
+
276
+
277
+ def bbox_iou(
278
+ box1: np.array,
279
+ box2: np.array,
280
+ x1y1x2y2: bool = True,
281
+ GIoU: bool = False,
282
+ DIoU: bool = False,
283
+ CIoU: bool = False,
284
+ eps: float = 1e-7,
285
+ ):
286
+ # https://github.com/ultralytics/yolov5/blob/develop/utils/general.py
287
+ # Returns the IoU of box1 to box2. box1 is 4, box2 is nx4
288
+ box2 = box2.T
289
+
290
+ # Get the coordinates of bounding boxes
291
+ if x1y1x2y2: # x1, y1, x2, y2 = box1
292
+ b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
293
+ b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
294
+ else: # transform from xywh to xyxy
295
+ b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
296
+ b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
297
+ b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
298
+ b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2
299
+
300
+ # Intersection area
301
+ inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * (
302
+ torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)
303
+ ).clamp(0)
304
+
305
+ # Union Area
306
+ w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
307
+ w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
308
+ union = w1 * h1 + w2 * h2 - inter + eps
309
+
310
+ iou = inter / union
311
+ if GIoU or DIoU or CIoU:
312
+ cw = torch.max(b1_x2, b2_x2) - torch.min(
313
+ b1_x1, b2_x1
314
+ ) # convex (smallest enclosing box) width
315
+ ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1) # convex height
316
+ if CIoU or DIoU: # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
317
+ c2 = cw**2 + ch**2 + eps # convex diagonal squared
318
+ rho2 = (
319
+ (b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2
320
+ + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2
321
+ ) / 4 # center distance squared
322
+ if DIoU:
323
+ return iou - rho2 / c2 # DIoU
324
+ elif (
325
+ CIoU
326
+ ): # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
327
+ v = (4 / math.pi**2) * torch.pow(
328
+ torch.atan(w2 / h2) - torch.atan(w1 / h1), 2
329
+ )
330
+ with torch.no_grad():
331
+ alpha = v / (v - iou + (1 + eps))
332
+ return iou - (rho2 / c2 + v * alpha) # CIoU
333
+ else: # GIoU https://arxiv.org/pdf/1902.09630.pdf
334
+ c_area = cw * ch + eps # convex area
335
+ return iou - (c_area - union) / c_area # GIoU
336
+ else:
337
+ return iou # IoU
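To make the box helpers above concrete, here is a hedged toy check: a rectangular binary mask is converted to a box with `get_bbox_from_segmentation_labels`, then compared against itself with `bbox_iou`, which should come out at essentially 1.0. The mask, scales, and image size are illustrative, not values from the paper.

```python
# Toy check of get_bbox_from_segmentation_labels and bbox_iou; values are illustrative.
import numpy as np
import torch
from misc import get_bbox_from_segmentation_labels, bbox_iou

mask = np.zeros((28, 28), dtype=np.uint8)
mask[5:15, 8:20] = 1                          # one rectangular "object" at patch resolution

box = get_bbox_from_segmentation_labels(
    segmenter_predictions=mask,
    initial_image_size=(224, 224),            # original image size (H, W)
    scales=[8, 8],                            # e.g. the ViT patch size along each axis
)
print(box)                                    # [xmin, ymin, xmax, ymax] in image coordinates

iou = bbox_iou(torch.from_numpy(box).float(), torch.from_numpy(box[None, :]).float())
print(float(iou))                             # ~1.0 for a box compared with itself
```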
model.py ADDED
@@ -0,0 +1,180 @@
1
+ # Code for Peekaboo
2
+ # Author: Hasib Zunair
3
+ # Modified from https://github.com/valeoai/FOUND, see license below.
4
+
5
+ # Copyright 2022 - Valeo Comfort and Driving Assistance - Oriane Siméoni @ valeo.ai
6
+ #
7
+ # Licensed under the Apache License, Version 2.0 (the "License");
8
+ # you may not use this file except in compliance with the License.
9
+ # You may obtain a copy of the License at
10
+ #
11
+ # http://www.apache.org/licenses/LICENSE-2.0
12
+ #
13
+ # Unless required by applicable law or agreed to in writing, software
14
+ # distributed under the License is distributed on an "AS IS" BASIS,
15
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16
+ # See the License for the specific language governing permissions and
17
+ # limitations under the License.
18
+
19
+ """Model code for Peekaboo"""
20
+
21
+ import os
22
+ import torch
23
+ import torch.nn as nn
24
+ import dino.vision_transformer as vits
25
+
26
+
27
+ class PeekabooModel(nn.Module):
28
+ def __init__(
29
+ self,
30
+ vit_model="dino",
31
+ vit_arch="vit_small",
32
+ vit_patch_size=8,
33
+ enc_type_feats="k",
34
+ ):
35
+
36
+ super(PeekabooModel, self).__init__()
37
+
38
+ ########## Encoder ##########
39
+ self.vit_encoder, self.initial_dim, self.hook_features = get_vit_encoder(
40
+ vit_arch, vit_model, vit_patch_size, enc_type_feats
41
+ )
42
+ self.vit_patch_size = vit_patch_size
43
+ self.enc_type_feats = enc_type_feats
44
+
45
+ ########## Decoder ##########
46
+ self.previous_dim = self.initial_dim
47
+ self.decoder = nn.Conv2d(self.previous_dim, 1, (1, 1))
48
+
49
+ def _make_input_divisible(self, x: torch.Tensor) -> torch.Tensor:
50
+ # From selfmask
51
+ """Pad some pixels to make the input size divisible by the patch size."""
52
+ B, _, H_0, W_0 = x.shape
53
+ pad_w = (self.vit_patch_size - W_0 % self.vit_patch_size) % self.vit_patch_size
54
+ pad_h = (self.vit_patch_size - H_0 % self.vit_patch_size) % self.vit_patch_size
55
+
56
+ x = nn.functional.pad(x, (0, pad_w, 0, pad_h), value=0)
57
+ return x
58
+
59
+ def forward(self, batch, decoder=None, for_eval=False):
60
+
61
+ # Make the image divisible by the patch size
62
+ if for_eval:
63
+ batch = self._make_input_divisible(batch)
64
+ _w, _h = batch.shape[-2:]
65
+ _h, _w = _h // self.vit_patch_size, _w // self.vit_patch_size
66
+ else:
67
+ # Cropping used during training, could be changed to improve
68
+ w, h = (
69
+ batch.shape[-2] - batch.shape[-2] % self.vit_patch_size,
70
+ batch.shape[-1] - batch.shape[-1] % self.vit_patch_size,
71
+ )
72
+ batch = batch[:, :, :w, :h]
73
+
74
+ w_featmap = batch.shape[-2] // self.vit_patch_size
75
+ h_featmap = batch.shape[-1] // self.vit_patch_size
76
+
77
+ # Forward pass
78
+ with torch.no_grad():
79
+ # Encoder forward pass
80
+ att = self.vit_encoder.get_last_selfattention(batch)
81
+
82
+ # Get decoder features
83
+ feats = self.extract_feats(dims=att.shape, type_feats=self.enc_type_feats)
84
+ feats = feats[:, 1:, :, :].reshape(att.shape[0], w_featmap, h_featmap, -1)
85
+ feats = feats.permute(0, 3, 1, 2)
86
+
87
+ # Apply decoder
88
+ if decoder is None:
89
+ decoder = self.decoder
90
+
91
+ logits = decoder(feats)
92
+ return logits
93
+
94
+ @torch.no_grad()
95
+ def decoder_load_weights(self, weights_path):
96
+ print(f"Loading model from weights {weights_path}.")
97
+ # Load states
98
+ if torch.cuda.is_available():
99
+ state_dict = torch.load(weights_path)
100
+ else:
101
+ state_dict = torch.load(weights_path, map_location=torch.device("cpu"))
102
+
103
+ # Decoder
104
+ self.decoder.load_state_dict(state_dict["decoder"])
105
+ self.decoder.eval()
106
+ self.decoder.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
107
+
108
+ @torch.no_grad()
109
+ def decoder_save_weights(self, save_dir, n_iter):
110
+ state_dict = {}
111
+ state_dict["decoder"] = self.decoder.state_dict()
112
+ fname = os.path.join(save_dir, f"decoder_weights_niter{n_iter}.pt")
113
+ torch.save(state_dict, fname)
114
+ print(f"\n----" f"\nModel saved at {fname}")
115
+
116
+ @torch.no_grad()
117
+ def extract_feats(self, dims, type_feats="k"):
118
+
119
+ nb_im, nh, nb_tokens, _ = dims
120
+ qkv = (
121
+ self.hook_features["qkv"]
122
+ .reshape(nb_im, nb_tokens, 3, nh, -1 // nh) # 3 corresponding to |qkv|
123
+ .permute(2, 0, 3, 1, 4)
124
+ )
125
+
126
+ q, k, v = qkv[0], qkv[1], qkv[2]
127
+
128
+ if type_feats == "q":
129
+ return q.transpose(1, 2).float()
130
+ elif type_feats == "k":
131
+ return k.transpose(1, 2).float()
132
+ elif type_feats == "v":
133
+ return v.transpose(1, 2).float()
134
+ else:
135
+ raise ValueError("Unknown features")
136
+
137
+
138
+ def get_vit_encoder(vit_arch, vit_model, vit_patch_size, enc_type_feats):
139
+ if vit_arch == "vit_small" and vit_patch_size == 16:
140
+ url = "dino_deitsmall16_pretrain/dino_deitsmall16_pretrain.pth"
141
+ initial_dim = 384
142
+ elif vit_arch == "vit_small" and vit_patch_size == 8:
143
+ url = "dino_deitsmall8_300ep_pretrain/dino_deitsmall8_300ep_pretrain.pth"
144
+ initial_dim = 384
145
+ elif vit_arch == "vit_base" and vit_patch_size == 16:
146
+ if vit_model == "clip":
147
+ url = "5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt"
148
+ elif vit_model == "dino":
149
+ url = "dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth"
150
+ initial_dim = 768
151
+ elif vit_arch == "vit_base" and vit_patch_size == 8:
152
+ url = "dino_vitbase8_pretrain/dino_vitbase8_pretrain.pth"
153
+ initial_dim = 768
154
+
155
+ if vit_model == "dino":
156
+ vit_encoder = vits.__dict__[vit_arch](patch_size=vit_patch_size, num_classes=0)
157
+ # TODO: change this if the last layer should be left unfrozen
158
+ for p in vit_encoder.parameters():
159
+ p.requires_grad = False
160
+ vit_encoder.eval().to(
161
+ torch.device("cuda" if torch.cuda.is_available() else "cpu")
162
+ ) # mode eval
163
+ state_dict = torch.hub.load_state_dict_from_url(
164
+ url="https://dl.fbaipublicfiles.com/dino/" + url
165
+ )
166
+ vit_encoder.load_state_dict(state_dict, strict=True)
167
+
168
+ hook_features = {}
169
+ if enc_type_feats in ["k", "q", "v", "qkv", "mlp"]:
170
+ # Define the hook
171
+ def hook_fn_forward_qkv(module, input, output):
172
+ hook_features["qkv"] = output
173
+
174
+ vit_encoder._modules["blocks"][-1]._modules["attn"]._modules[
175
+ "qkv"
176
+ ].register_forward_hook(hook_fn_forward_qkv)
177
+ else:
178
+ raise ValueError("Not implemented.")
179
+
180
+ return vit_encoder, initial_dim, hook_features
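Putting the pieces of `model.py` together, below is a hedged single-image inference sketch that mirrors the forward / upsample / threshold steps used in `evaluation/saliency.py`. The checkpoint path and image path are placeholders, not files shipped in this commit.

```python
# Hedged single-image inference sketch for PeekabooModel; paths are placeholders.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms as T
from model import PeekabooModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = PeekabooModel(vit_model="dino", vit_arch="vit_small", vit_patch_size=8)
model.decoder_load_weights("data/weights/decoder_weights_niter500.pt")  # example path

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = Image.open("my_image.jpg").convert("RGB")   # placeholder input image
inputs = transform(img).unsqueeze(0).to(device)

with torch.no_grad():
    preds = model(inputs, for_eval=True)          # logits at patch resolution
    preds_up = F.interpolate(
        preds, scale_factor=model.vit_patch_size, mode="bilinear", align_corners=False
    )[..., : inputs.shape[-2], : inputs.shape[-1]]
    mask = (torch.sigmoid(preds_up) > 0.5).squeeze().cpu().numpy()

print(mask.shape, mask.mean())                    # binary saliency mask at image resolution
```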
notebooks/exp.ipynb ADDED
@@ -0,0 +1,434 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "print(\"hello\")"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "code",
14
+ "execution_count": null,
15
+ "metadata": {},
16
+ "outputs": [],
17
+ "source": [
18
+ "import os\n",
19
+ "import torch\n",
20
+ "import argparse\n",
21
+ "import torch.nn as nn\n",
22
+ "import torch.nn.functional as F\n",
23
+ "import matplotlib.pyplot as plt\n",
24
+ "\n",
25
+ "os.chdir(\"..\")\n",
26
+ "\n",
27
+ "from PIL import Image\n",
28
+ "from model import FoundModel\n",
29
+ "from misc import load_config\n",
30
+ "from torchvision import transforms as T\n",
31
+ "\n",
32
+ "NORMALIZE = T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))"
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "code",
37
+ "execution_count": null,
38
+ "metadata": {},
39
+ "outputs": [],
40
+ "source": [
41
+ "PATH_TO_IMG = \"./notebooks/0409.jpg\"\n",
42
+ "GT = \"./notebooks/0409.png\"\n",
43
+ "SCRIBBLE = \"./notebooks/11965.png\""
44
+ ]
45
+ },
46
+ {
47
+ "cell_type": "code",
48
+ "execution_count": null,
49
+ "metadata": {},
50
+ "outputs": [],
51
+ "source": [
52
+ "img = Image.open(PATH_TO_IMG)\n",
53
+ "img = img.convert(\"RGB\")\n",
54
+ "img"
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "code",
59
+ "execution_count": null,
60
+ "metadata": {},
61
+ "outputs": [],
62
+ "source": [
63
+ "scr = Image.open(GT)\n",
64
+ "scr = scr.convert(\"P\")\n",
65
+ "scr"
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "code",
70
+ "execution_count": null,
71
+ "metadata": {},
72
+ "outputs": [],
73
+ "source": [
74
+ "try:\n",
75
+ " from torchvision.transforms import InterpolationMode\n",
76
+ "\n",
77
+ " BICUBIC = InterpolationMode.BICUBIC\n",
78
+ "except ImportError:\n",
79
+ " BICUBIC = Image.BICUBIC\n",
80
+ " \n",
81
+ "def _preprocess(img, img_size):\n",
82
+ " transform = T.Compose(\n",
83
+ " [\n",
84
+ " T.Resize(img_size, BICUBIC),\n",
85
+ " T.CenterCrop(img_size),\n",
86
+ " T.ToTensor(),\n",
87
+ " NORMALIZE\n",
88
+ " ]\n",
89
+ " )\n",
90
+ " return transform(img)"
91
+ ]
92
+ },
93
+ {
94
+ "cell_type": "code",
95
+ "execution_count": null,
96
+ "metadata": {},
97
+ "outputs": [],
98
+ "source": [
99
+ "img_t = _preprocess(img, 224)#[None,:,:,:]\n",
100
+ "inputs = img_t.to(\"cuda\")\n",
101
+ "inputs.shape"
102
+ ]
103
+ },
104
+ {
105
+ "cell_type": "code",
106
+ "execution_count": null,
107
+ "metadata": {},
108
+ "outputs": [],
109
+ "source": [
110
+ "scribble = scribble.to(\"cuda\")\n",
111
+ "scribble.shape"
112
+ ]
113
+ },
114
+ {
115
+ "cell_type": "code",
116
+ "execution_count": null,
117
+ "metadata": {},
118
+ "outputs": [],
119
+ "source": [
120
+ "m_i = inputs * scribble\n",
121
+ "m_i = m_i[None,:,:,:]\n",
122
+ "inputs = m_i.to(\"cuda\")\n",
123
+ "inputs.shape"
124
+ ]
125
+ },
126
+ {
127
+ "cell_type": "code",
128
+ "execution_count": null,
129
+ "metadata": {},
130
+ "outputs": [],
131
+ "source": [
132
+ "from datasets.utils import unnormalize\n",
133
+ "img_init = unnormalize(m_i)\n",
134
+ "img_init.shape"
135
+ ]
136
+ },
137
+ {
138
+ "cell_type": "code",
139
+ "execution_count": null,
140
+ "metadata": {},
141
+ "outputs": [],
142
+ "source": [
143
+ "import cv2\n",
144
+ "import numpy as np \n",
145
+ "\n",
146
+ "ten =(img_init.permute(1,2,0).detach().cpu().numpy())\n",
147
+ "ten=(ten*255).astype(np.uint8)\n",
148
+ "#ten=cv2.cvtColor(ten,cv2.COLOR_RGB2BGR)\n",
149
+ "ten.shape"
150
+ ]
151
+ },
152
+ {
153
+ "cell_type": "code",
154
+ "execution_count": null,
155
+ "metadata": {},
156
+ "outputs": [],
157
+ "source": [
158
+ "plt.imshow(ten)\n",
159
+ "plt.axis('off')\n",
160
+ "plt.savefig('masked_image.png', bbox_inches='tight', pad_inches=0)"
161
+ ]
162
+ },
163
+ {
164
+ "cell_type": "code",
165
+ "execution_count": null,
166
+ "metadata": {},
167
+ "outputs": [],
168
+ "source": []
169
+ },
170
+ {
171
+ "cell_type": "code",
172
+ "execution_count": null,
173
+ "metadata": {},
174
+ "outputs": [],
175
+ "source": [
176
+ "gt = Image.open(GT)\n",
177
+ "gt = gt.convert(\"P\")\n",
178
+ "gt"
179
+ ]
180
+ },
181
+ {
182
+ "cell_type": "code",
183
+ "execution_count": null,
184
+ "metadata": {},
185
+ "outputs": [],
186
+ "source": []
187
+ },
188
+ {
189
+ "cell_type": "code",
190
+ "execution_count": null,
191
+ "metadata": {},
192
+ "outputs": [],
193
+ "source": [
194
+ "try:\n",
195
+ " from torchvision.transforms import InterpolationMode\n",
196
+ "\n",
197
+ " BICUBIC = InterpolationMode.BICUBIC\n",
198
+ "except ImportError:\n",
199
+ " BICUBIC = Image.BICUBIC\n",
200
+ " \n",
201
+ "def _preprocess_scribble(img, img_size):\n",
202
+ " transform = T.Compose(\n",
203
+ " [\n",
204
+ " T.Resize(img_size, BICUBIC),\n",
205
+ " T.CenterCrop(img_size),\n",
206
+ " T.ToTensor(),\n",
207
+ " ]\n",
208
+ " )\n",
209
+ " return transform(img)"
210
+ ]
211
+ },
212
+ {
213
+ "cell_type": "code",
214
+ "execution_count": null,
215
+ "metadata": {},
216
+ "outputs": [],
217
+ "source": [
218
+ "scribble = _preprocess_scribble(scr, 224)\n",
219
+ "#scribble = (scribble > 0).float() # threshold to [0,1]\n",
220
+ "#scribble = torch.max(scribble) - scribble # inverted scribble"
221
+ ]
222
+ },
223
+ {
224
+ "cell_type": "code",
225
+ "execution_count": null,
226
+ "metadata": {},
227
+ "outputs": [],
228
+ "source": [
229
+ "scribble.shape"
230
+ ]
231
+ },
232
+ {
233
+ "cell_type": "code",
234
+ "execution_count": null,
235
+ "metadata": {},
236
+ "outputs": [],
237
+ "source": [
238
+ "import cv2\n",
239
+ "import numpy as np \n",
240
+ "\n",
241
+ "tens =(scribble.permute(1,2,0).detach().cpu().numpy())\n",
242
+ "tens=(tens*255).astype(np.uint8)\n",
243
+ "#ten=cv2.cvtColor(ten,cv2.COLOR_RGB2BGR)\n",
244
+ "tens.shape"
245
+ ]
246
+ },
247
+ {
248
+ "cell_type": "code",
249
+ "execution_count": null,
250
+ "metadata": {},
251
+ "outputs": [],
252
+ "source": [
253
+ "plt.imshow(tens, cmap='gray')\n",
254
+ "plt.axis('off')\n",
255
+ "plt.savefig('gt.png', bbox_inches='tight', pad_inches=0)"
256
+ ]
257
+ },
258
+ {
259
+ "cell_type": "code",
260
+ "execution_count": null,
261
+ "metadata": {},
262
+ "outputs": [],
263
+ "source": []
264
+ },
265
+ {
266
+ "cell_type": "code",
267
+ "execution_count": null,
268
+ "metadata": {},
269
+ "outputs": [],
270
+ "source": [
271
+ "masked_img_t = img * scribble"
272
+ ]
273
+ },
274
+ {
275
+ "cell_type": "code",
276
+ "execution_count": null,
277
+ "metadata": {},
278
+ "outputs": [],
279
+ "source": []
280
+ },
281
+ {
282
+ "cell_type": "code",
283
+ "execution_count": null,
284
+ "metadata": {},
285
+ "outputs": [],
286
+ "source": [
287
+ "model = FoundModel(vit_model=\"dino\",\n",
288
+ " vit_arch=\"vit_small\",\n",
289
+ " vit_patch_size=8,\n",
290
+ " enc_type_feats=\"k\",\n",
291
+ " bkg_type_feats=\"k\",\n",
292
+ " bkg_th=0.3)\n",
293
+ "\n",
294
+ "# Load weights\n",
295
+ "model.decoder_load_weights(\"./outputs/msl_a1.5_b1_g1_reg4-MSL-DUTS-TR-vit_small8/decoder_weights_niter500.pt\")\n",
296
+ "model.eval()"
297
+ ]
298
+ },
299
+ {
300
+ "cell_type": "code",
301
+ "execution_count": null,
302
+ "metadata": {},
303
+ "outputs": [],
304
+ "source": [
305
+ "# Forward step\n",
306
+ "with torch.no_grad():\n",
307
+ " preds, _, shape_f, att = model.forward_step(inputs, for_eval=True)\n",
308
+ "\n",
309
+ "# Apply FOUND\n",
310
+ "sigmoid = nn.Sigmoid()\n",
311
+ "h, w = img_t.shape[-2:]\n",
312
+ "preds_up = F.interpolate(\n",
313
+ " preds, scale_factor=model.vit_patch_size, mode=\"bilinear\", align_corners=False\n",
314
+ ")[..., :h, :w]\n",
315
+ "preds_up = (\n",
316
+ " (sigmoid(preds_up.detach()) > 0.5).squeeze(0).float()\n",
317
+ ")"
318
+ ]
319
+ },
320
+ {
321
+ "cell_type": "code",
322
+ "execution_count": null,
323
+ "metadata": {},
324
+ "outputs": [],
325
+ "source": [
326
+ "plt.imshow(preds_up.cpu().squeeze().numpy(), cmap='gray')\n",
327
+ "plt.axis('off')\n",
328
+ "plt.savefig('masked_pred.png', bbox_inches='tight', pad_inches=0)"
329
+ ]
330
+ },
331
+ {
332
+ "cell_type": "code",
333
+ "execution_count": null,
334
+ "metadata": {},
335
+ "outputs": [],
336
+ "source": [
337
+ "preds_up.shape"
338
+ ]
339
+ },
340
+ {
341
+ "cell_type": "code",
342
+ "execution_count": null,
343
+ "metadata": {},
344
+ "outputs": [],
345
+ "source": []
346
+ },
347
+ {
348
+ "cell_type": "code",
349
+ "execution_count": null,
350
+ "metadata": {},
351
+ "outputs": [],
352
+ "source": [
353
+ "def read_image(path):\n",
354
+ " image = cv2.imread(path, -1)\n",
355
+ " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n",
356
+ " image = make_border(image)\n",
357
+ " return image\n",
358
+ "\n",
359
+ "\n",
360
+ "def make_border(im):\n",
361
+ " row, col = im.shape[:2]\n",
362
+ " bottom = im[row-2:row, 0:col]\n",
363
+ " mean = cv2.mean(bottom)[0]\n",
364
+ " bordersize = 5\n",
365
+ " border = cv2.copyMakeBorder(\n",
366
+ " im,\n",
367
+ " top=bordersize,\n",
368
+ " bottom=bordersize,\n",
369
+ " left=bordersize,\n",
370
+ " right=bordersize,\n",
371
+ " borderType=cv2.BORDER_CONSTANT,\n",
372
+ " value=[0, 0, 0]\n",
373
+ " )\n",
374
+ " return border"
375
+ ]
376
+ },
377
+ {
378
+ "cell_type": "code",
379
+ "execution_count": null,
380
+ "metadata": {},
381
+ "outputs": [],
382
+ "source": [
383
+ "img = read_image(\"./notebooks/scribble.png\")"
384
+ ]
385
+ },
386
+ {
387
+ "cell_type": "code",
388
+ "execution_count": null,
389
+ "metadata": {},
390
+ "outputs": [],
391
+ "source": [
392
+ "plt.imshow(img)\n",
393
+ "plt.axis('off')\n",
394
+ "plt.savefig('scribble.png', bbox_inches='tight', pad_inches=0)"
395
+ ]
396
+ },
397
+ {
398
+ "cell_type": "code",
399
+ "execution_count": null,
400
+ "metadata": {},
401
+ "outputs": [],
402
+ "source": []
403
+ },
404
+ {
405
+ "cell_type": "code",
406
+ "execution_count": null,
407
+ "metadata": {},
408
+ "outputs": [],
409
+ "source": []
410
+ }
411
+ ],
412
+ "metadata": {
413
+ "kernelspec": {
414
+ "display_name": "tarmak",
415
+ "language": "python",
416
+ "name": "python3"
417
+ },
418
+ "language_info": {
419
+ "codemirror_mode": {
420
+ "name": "ipython",
421
+ "version": 3
422
+ },
423
+ "file_extension": ".py",
424
+ "mimetype": "text/x-python",
425
+ "name": "python",
426
+ "nbconvert_exporter": "python",
427
+ "pygments_lexer": "ipython3",
428
+ "version": "3.8.18"
429
+ },
430
+ "orig_nbformat": 4
431
+ },
432
+ "nbformat": 4,
433
+ "nbformat_minor": 2
434
+ }
notebooks/graphs.ipynb ADDED
@@ -0,0 +1,249 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "%load_ext autoreload\n",
10
+ "%autoreload 2"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": null,
16
+ "metadata": {},
17
+ "outputs": [],
18
+ "source": [
19
+ "import pandas as pd\n",
20
+ "import numpy as np\n",
21
+ "import sys\n",
22
+ "import matplotlib.pyplot as plt\n",
23
+ "import seaborn as sns\n",
24
+ "import os \n",
25
+ "sns.set()\n",
26
+ "\n",
27
+ "%matplotlib inline\n",
28
+ "import warnings\n",
29
+ "warnings.filterwarnings('ignore')\n",
30
+ "\n",
31
+ "# https://abdalimran.github.io/2019-06-01/Drawing-multiple-ROC-Curves-in-a-single-plot"
32
+ ]
33
+ },
34
+ {
35
+ "cell_type": "code",
36
+ "execution_count": null,
37
+ "metadata": {},
38
+ "outputs": [],
39
+ "source": [
40
+ "#labels = ['Baseline', 'MaskSup']\n",
41
+ "labels = ['VOC07', 'VOC12', 'COCO20K']\n",
42
+ "\n",
43
+ "# VOC\n",
44
+ "auc = [71.7, 75.6, 62] # base\n",
45
+ "acc_nst = [72.7, 75.9, 64.0]\n",
46
+ "\n",
47
+ "# COCO\n",
48
+ "# auc = [54.2,36.0,48.4] # base\n",
49
+ "# acc_nst = [74.8,59.4,68.8]\n",
50
+ "\n",
51
+ "x = np.arange(len(labels)) # the label locations\n",
52
+ "dummy = np.arange(10)\n",
53
+ "\n",
54
+ "width = 0.35 #0.4 # the width of the bars\n",
55
+ "\n",
56
+ "\n",
57
+ "\n",
58
+ "fig, ax = plt.subplots()\n",
59
+ "\n",
60
+ "rects1 = ax.bar(x - width/2, auc, width, label='low masking', color='#E96479') # #FFAE6D\n",
61
+ "rects2 = ax.bar(x + width/2, acc_nst, width, label='high masking', color='#7DB9B6') # #9ED2C6\n",
62
+ "#rects211 = ax.bar(x + width/2 * 3.08, acc, width, label='CF1')\n",
63
+ "\n",
64
+ "#ax.set_ylabel('CorLoc (%)', fontsize=20)\n",
65
+ "#ax.set_title('Results')\n",
66
+ "ax.set_xticks(x)\n",
67
+ "ax.set_xticklabels(labels, rotation=0, fontsize=20)\n",
68
+ "\n",
69
+ "#for i in range(18):\n",
70
+ "# ax.get_xticklabels()[i].set_color(\"white\")\n",
71
+ "\n",
72
+ "#ax.set_ylim([30,80]) # coc\n",
73
+ "ax.set_ylim([60,80]) # voc\n",
74
+ "\n",
75
+ "#ax.legend(loc=\"upper left\", prop={'size': 14})\n",
76
+ "ax.grid(True)\n",
77
+ "#ax.patch.set_facecolor('white')\n",
78
+ "\n",
79
+ "def autolabel(rects):\n",
80
+ " \"\"\"Attach a text label above each bar in *rects*, displaying its height.\"\"\"\n",
81
+ " for rect in rects:\n",
82
+ " height = rect.get_height()\n",
83
+ " ax.annotate('{:.1f}'.format(height),\n",
84
+ " xy=(rect.get_x() + rect.get_width() / 2, height),\n",
85
+ " xytext=(0, 3), # 3 points vertical offset\n",
86
+ " textcoords=\"offset points\",\n",
87
+ " ha='center', va='bottom', rotation=0, fontsize=15)\n",
88
+ " #ax.set_ylim(ymin=1)\n",
89
+ " \n",
90
+ "\n",
91
+ "def autolabel_(rects):\n",
92
+ " \"\"\"Attach a text label above each bar in *rects*, displaying its height.\"\"\"\n",
93
+ " for rect in rects:\n",
94
+ " height = rect.get_height()\n",
95
+ " ax.annotate('{:.1f}'.format(height),\n",
96
+ " xy=(rect.get_x() + rect.get_width() / 2, height),\n",
97
+ " xytext=(0, 3), # 3 points vertical offset\n",
98
+ " textcoords=\"offset points\",\n",
99
+ " ha='center', va='bottom', rotation=0, fontsize=15)\n",
100
+ " #ax.set_ylim(ymin=1)\n",
101
+ "\n",
102
+ "\n",
103
+ "autolabel(rects1) # %\n",
104
+ "autolabel(rects2)\n",
105
+ "#autolabel_(rects211) # %\n",
106
+ "\n",
107
+ "fig.tight_layout()\n",
108
+ "fig.set_size_inches(12, 4, forward=True)\n",
109
+ "plt.title('Impact of masking (\\u2191)', loc='left', fontsize=25, color='gray', pad=12)\n",
110
+ "#plt.title('VOC2007 (\\u2191)', loc='left', fontsize=25, color='gray', pad=12)\n",
111
+ "plt.legend(loc='upper right', fontsize=18)\n",
112
+ "plt.savefig(\"../logs/masking_ablation.pdf\", bbox_inches='tight', pad_inches=0, dpi=300)\n",
113
+ "plt.show()"
114
+ ]
115
+ },
116
+ {
117
+ "cell_type": "code",
118
+ "execution_count": null,
119
+ "metadata": {},
120
+ "outputs": [],
121
+ "source": []
122
+ },
123
+ {
124
+ "cell_type": "code",
125
+ "execution_count": null,
126
+ "metadata": {},
127
+ "outputs": [],
128
+ "source": []
129
+ },
130
+ {
131
+ "cell_type": "code",
132
+ "execution_count": null,
133
+ "metadata": {},
134
+ "outputs": [],
135
+ "source": []
136
+ },
137
+ {
138
+ "cell_type": "code",
139
+ "execution_count": null,
140
+ "metadata": {},
141
+ "outputs": [],
142
+ "source": [
143
+ "#labels = ['Baseline', 'MaskSup']\n",
144
+ "labels = ['VOC07', 'VOC12', 'COCO20K']\n",
145
+ "\n",
146
+ "# VOC\n",
147
+ "auc_b = [71.6, 75.2, 61.8] # base\n",
148
+ "auc = [72.2, 75.5, 62.3] # base\n",
149
+ "acc_nst = [72.7, 75.9, 64.0]\n",
150
+ "\n",
151
+ "# COCO\n",
152
+ "# auc = [54.2,36.0,48.4] # base\n",
153
+ "# acc_nst = [74.8,59.4,68.8]\n",
154
+ "\n",
155
+ "x = np.arange(len(labels)) # the label locations\n",
156
+ "dummy = np.arange(10)\n",
157
+ "\n",
158
+ "width = 0.25 #0.4 # the width of the bars\n",
159
+ "\n",
160
+ "\n",
161
+ "\n",
162
+ "fig, ax = plt.subplots()\n",
163
+ "\n",
164
+ "rects1 = ax.bar(x - width/2, auc_b, width, label='Baseline', color='#E96479') # #FFAE6D\n",
165
+ "rects2 = ax.bar(x + width/2, auc, width, label='w/ MFP', color='#7DB9B6') # #9ED2C6\n",
166
+ "rects211 = ax.bar(x + width/2 * 3.08, acc_nst, width, label='w/ MFP + PCL', color='#FFAE6D')\n",
167
+ "\n",
168
+ "ax.set_ylabel('CorLoc (%)', fontsize=20)\n",
169
+ "#ax.set_title('Results')\n",
170
+ "ax.set_xticks(x)\n",
171
+ "ax.set_xticklabels(labels, rotation=0, fontsize=20)\n",
172
+ "\n",
173
+ "#for i in range(18):\n",
174
+ "# ax.get_xticklabels()[i].set_color(\"white\")\n",
175
+ "\n",
176
+ "#ax.set_ylim([30,80]) # coc\n",
177
+ "ax.set_ylim([60,80]) # voc\n",
178
+ "\n",
179
+ "#ax.legend(loc=\"upper left\", prop={'size': 14})\n",
180
+ "ax.grid(True)\n",
181
+ "#ax.patch.set_facecolor('white')\n",
182
+ "\n",
183
+ "def autolabel(rects):\n",
184
+ " \"\"\"Attach a text label above each bar in *rects*, displaying its height.\"\"\"\n",
185
+ " for rect in rects:\n",
186
+ " height = rect.get_height()\n",
187
+ " ax.annotate('{:.1f}'.format(height),\n",
188
+ " xy=(rect.get_x() + rect.get_width() / 2, height),\n",
189
+ " xytext=(0, 3), # 3 points vertical offset\n",
190
+ " textcoords=\"offset points\",\n",
191
+ " ha='center', va='bottom', rotation=0, fontsize=15)\n",
192
+ " #ax.set_ylim(ymin=1)\n",
193
+ " \n",
194
+ "\n",
195
+ "def autolabel_(rects):\n",
196
+ " \"\"\"Attach a text label above each bar in *rects*, displaying its height.\"\"\"\n",
197
+ " for rect in rects:\n",
198
+ " height = rect.get_height()\n",
199
+ " ax.annotate('{:.1f}'.format(height),\n",
200
+ " xy=(rect.get_x() + rect.get_width() / 2, height),\n",
201
+ " xytext=(0, 3), # 3 points vertical offset\n",
202
+ " textcoords=\"offset points\",\n",
203
+ " ha='center', va='bottom', rotation=0, fontsize=15)\n",
204
+ " #ax.set_ylim(ymin=1)\n",
205
+ "\n",
206
+ "\n",
207
+ "autolabel(rects1) # %\n",
208
+ "autolabel(rects2)\n",
209
+ "autolabel_(rects211) # %\n",
210
+ "\n",
211
+ "fig.tight_layout()\n",
212
+ "fig.set_size_inches(12, 4, forward=True)\n",
213
+ "plt.title('Effectiveness of MFP and PCL (\\u2191)', loc='left', fontsize=25, color='gray', pad=12)\n",
214
+ "#plt.title('VOC2007 (\\u2191)', loc='left', fontsize=25, color='gray', pad=12)\n",
215
+ "plt.legend(loc='upper right', fontsize=18)\n",
216
+ "plt.savefig(\"../logs/msl_ablation.pdf\", bbox_inches='tight', pad_inches=0, dpi=300)\n",
217
+ "plt.show()"
218
+ ]
219
+ },
220
+ {
221
+ "cell_type": "code",
222
+ "execution_count": null,
223
+ "metadata": {},
224
+ "outputs": [],
225
+ "source": []
226
+ }
227
+ ],
228
+ "metadata": {
229
+ "kernelspec": {
230
+ "display_name": "bdstreets",
231
+ "language": "python",
232
+ "name": "python3"
233
+ },
234
+ "language_info": {
235
+ "codemirror_mode": {
236
+ "name": "ipython",
237
+ "version": 3
238
+ },
239
+ "file_extension": ".py",
240
+ "mimetype": "text/x-python",
241
+ "name": "python",
242
+ "nbconvert_exporter": "python",
243
+ "pygments_lexer": "ipython3",
244
+ "version": "3.8.17"
245
+ }
246
+ },
247
+ "nbformat": 4,
248
+ "nbformat_minor": 2
249
+ }
notebooks/visualize.ipynb ADDED
@@ -0,0 +1,571 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "%load_ext autoreload\n",
10
+ "%autoreload 2"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": null,
16
+ "metadata": {},
17
+ "outputs": [],
18
+ "source": [
19
+ "import os,sys,inspect\n",
20
+ "sys.path.insert(0,\"..\")\n",
21
+ "\n",
22
+ "import matplotlib.pyplot as plt\n",
23
+ "from matplotlib import rc\n",
24
+ "import glob\n",
25
+ "\n",
26
+ "macos = False\n",
27
+ "if macos == True:\n",
28
+ " rc('font',**{'family':'sans-serif','sans-serif':['Computer Modern Roman']})\n",
29
+ " rc('text', usetex=True)\n",
30
+ "\n",
31
+ "# Font Size\n",
32
+ "import matplotlib\n",
33
+ "font = {'family' : 'DejaVu Sans',\n",
34
+ " 'weight' : 'bold',\n",
35
+ " 'size' : 30}\n",
36
+ "\n",
37
+ "import cv2\n",
38
+ "import numpy as np\n",
39
+ "import string\n",
40
+ "import random"
41
+ ]
42
+ },
43
+ {
44
+ "cell_type": "code",
45
+ "execution_count": null,
46
+ "metadata": {},
47
+ "outputs": [],
48
+ "source": [
49
+ "def visualize(idx, **images):\n",
50
+ " \"\"\"Plot images in one row.\"\"\" \n",
51
+ " n = len(images)\n",
52
+ " fig = plt.figure(figsize=(60, 40))\n",
53
+ " for i, (name, image) in enumerate(images.items()):\n",
54
+ " plt.subplot(1, n, i + 1)\n",
55
+ " plt.xticks([])\n",
56
+ " plt.yticks([])\n",
57
+ " #if idx==0:\n",
58
+ " plt.title(' '.join(name.split('_')).lower(), fontsize=40)\n",
59
+ " if i ==0:\n",
60
+ " w,h = (1,25)\n",
61
+ " fs = 1.0\n",
62
+ " color = (0,0,0)\n",
63
+ " #color = (255,255,255)\n",
64
+ " font = cv2.FONT_HERSHEY_SIMPLEX #FONT_HERSHEY_DUPLEX #press tab for different operations\n",
65
+ " cv2.putText(image, str(idx), (w,h), font, fs, color, 1, cv2.LINE_AA)\n",
66
+ " if i !=0:\n",
67
+ " #plt.imshow(image[:,:,0], cmap='magma')\n",
68
+ " plt.imshow(image, cmap='gray')\n",
69
+ " else:\n",
70
+ " plt.imshow(image, cmap='gray')\n",
71
+ " plt.axis(\"off\")\n",
72
+ " #plt.tight_layout()\n",
73
+ " plt.savefig(\"../outputs/visualizations/duts-te/compare-preds/{}.png\".format(idx), facecolor=\"white\", bbox_inches = 'tight')\n",
74
+ " plt.show()\n",
75
+ " \n",
76
+ " \n",
77
+ "def make_dataset(dir):\n",
78
+ " images = []\n",
79
+ " assert os.path.isdir(dir), '%s is not a valid directory' % dir\n",
80
+ "\n",
81
+ " f = dir.split('/')[-1].split('_')[-1]\n",
82
+ " #print (dir, f)\n",
83
+ " dirs= os.listdir(dir)\n",
84
+ " for img in dirs:\n",
85
+ "\n",
86
+ " path = os.path.join(dir, img)\n",
87
+ " #print(path)\n",
88
+ " images.append(path)\n",
89
+ " return images\n",
90
+ "\n",
91
+ "# def make_dataset(dir):\n",
92
+ "# images = []\n",
93
+ "# assert os.path.isdir(dir), '%s is not a valid directory' % dir\n",
94
+ "\n",
95
+ "# # f = dir.split('/')[-1].split('_')[-1]\n",
96
+ "# # #print (dir, f)\n",
97
+ "# # dirs= os.listdir(dir)\n",
98
+ "# # for img in dirs:\n",
99
+ "\n",
100
+ "# # path = os.path.join(dir, img)\n",
101
+ "# # #print(path)\n",
102
+ "# # images.append(path)\n",
103
+ "# images = natsorted(glob.glob(dir+ \"/\" + \"/*.png\"))\n",
104
+ "# return images\n",
105
+ "\n",
106
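+ "# Load an image, convert BGR to RGB, resize and center-crop to 224x224, then add a black border\n",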
+ "def read_image(path):\n",
107
+ " image = cv2.imread(path, -1)\n",
108
+ " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n",
109
+ " image = Image.fromarray(np.uint8(image)).convert('RGB')\n",
110
+ " image = resize_center_crop(image)\n",
111
+ " image = make_border(image)\n",
112
+ " return image\n",
113
+ "\n",
114
+ "\n",
115
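+ "# Add a constant 5-pixel black border around the image\n",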
+ "def make_border(im):\n",
116
+ " row, col = im.shape[:2]\n",
117
+ " bottom = im[row-2:row, 0:col]\n",
118
+ " mean = cv2.mean(bottom)[0]\n",
119
+ " bordersize = 5\n",
120
+ " border = cv2.copyMakeBorder(\n",
121
+ " im,\n",
122
+ " top=bordersize,\n",
123
+ " bottom=bordersize,\n",
124
+ " left=bordersize,\n",
125
+ " right=bordersize,\n",
126
+ " borderType=cv2.BORDER_CONSTANT,\n",
127
+ " value=[0, 0, 0]\n",
128
+ " )\n",
129
+ " return border\n",
130
+ "\n",
131
+ "from PIL import Image\n",
132
+ "from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize\n",
133
+ "\n",
134
+ "try:\n",
135
+ " from torchvision.transforms import InterpolationMode\n",
136
+ "\n",
137
+ " BICUBIC = InterpolationMode.BICUBIC\n",
138
+ "except ImportError:\n",
139
+ " BICUBIC = Image.BICUBIC\n",
140
+ "\n",
141
+ "\n",
142
+ "def _convert_image_to_rgb(image):\n",
143
+ " return image.convert(\"RGB\")\n",
144
+ "\n",
145
+ "\n",
146
+ "def resize_center_crop(img):\n",
147
+ " \"\"\" \n",
148
+ " Load and resize an image to a desired size.\n",
149
+ "\n",
150
+ " Arguments:\n",
151
+ " img (PIL image): Image to load and resize\n",
152
+ "\n",
153
+ " Returns:\n",
154
+ " img (np.array): Resized and cropped image\n",
155
+ "\n",
156
+ " Examples:\n",
157
+ " >>> img = resize_center_crop(img)\n",
158
+ " \"\"\"\n",
159
+ "\n",
160
+ " if type(img) == str:\n",
161
+ " img = Image.open(img)\n",
162
+ "\n",
163
+ " transform = Compose(\n",
164
+ " [\n",
165
+ " Resize(224, BICUBIC),\n",
166
+ " CenterCrop(224),\n",
167
+ " _convert_image_to_rgb,\n",
168
+ " # ToTensor(),\n",
169
+ " # Normalize(\n",
170
+ " # (0.5, 0.5, 0.5),\n",
171
+ " # (0.5, 0.5, 0.5),\n",
172
+ " # ),\n",
173
+ " ]\n",
174
+ " )\n",
175
+ " img = transform(img)\n",
176
+ " img = np.array(img)\n",
177
+ " return img\n",
178
+ "\n",
179
+ "def read_image_(path):\n",
180
+ " image = cv2.imread(path, -1)\n",
181
+ " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n",
182
+ " image = cv2.resize(image, (192, 256))\n",
183
+ " return image"
184
+ ]
185
+ },
186
+ {
187
+ "cell_type": "code",
188
+ "execution_count": null,
189
+ "metadata": {},
190
+ "outputs": [],
191
+ "source": [
192
+ "# # ECSSD\n",
193
+ "\n",
194
+ "# # Images and GT\n",
195
+ "\n",
196
+ "# GT = \"../outputs/visualizations/ecssd/gts\"\n",
197
+ "# IMG = \"../datasets_local/ECSSD/images/\"\n",
198
+ "# GTS = [os.path.join(GT, x) for x in os.listdir(GT)]\n",
199
+ "# IMGS = [os.path.join(IMG, x) for x in os.listdir(IMG)]\n",
200
+ "\n",
201
+ "# # Algo\n",
202
+ "# algo1 = \"../outputs/visualizations/ecssd/found-MSL-DUTS-TR-vit_small8_ECSSD/\"\n",
203
+ "# ours = \"../outputs/visualizations/ecssd/msl_a1.5_b1_g1_reg4-MSL-DUTS-TR-vit_small8_ECSSD/\"\n",
204
+ "\n",
205
+ "# algo1 = [os.path.join(algo1, x) for x in os.listdir(algo1)]\n",
206
+ "# ours = [os.path.join(ours, x) for x in os.listdir(ours)]\n",
207
+ "\n",
208
+ "# print(len(GTS), len(IMGS))\n",
209
+ "# print(ours[:3])\n",
210
+ "\n",
211
+ "# i = 0\n",
212
+ "# for num in range(len(IMGS)):\n",
213
+ "# visualize(i, \n",
214
+ "# image=read_image(IMGS[num]),\n",
215
+ "# found_method=read_image(algo1[num]),\n",
216
+ "# our_method=read_image(ours[num]),\n",
217
+ "# gt=read_image(GTS[num]))\n",
218
+ "# i+=1"
219
+ ]
220
+ },
221
+ {
222
+ "cell_type": "code",
223
+ "execution_count": null,
224
+ "metadata": {},
225
+ "outputs": [],
226
+ "source": [
227
+ "# # DUT_OMRON\n",
228
+ "\n",
229
+ "# # Images and GT\n",
230
+ "\n",
231
+ "# GT = \"../outputs/visualizations/dut-omron/gts\"\n",
232
+ "# IMG = \"../datasets_local/DUT-OMRON/DUT-OMRON-image/\"\n",
233
+ "# GTS = [os.path.join(GT, x) for x in os.listdir(GT)]\n",
234
+ "# IMGS = [os.path.join(IMG, x) for x in os.listdir(IMG)]\n",
235
+ "\n",
236
+ "# # Algo\n",
237
+ "# algo1 = \"../outputs/visualizations/dut-omron/found-MSL-DUTS-TR-vit_small8_DUT-OMRON/\"\n",
238
+ "# ours = \"../outputs/visualizations/dut-omron/msl_a1.5_b1_g1_reg4-MSL-DUTS-TR-vit_small8_DUT-OMRON/\"\n",
239
+ "\n",
240
+ "# algo1 = [os.path.join(algo1, x) for x in os.listdir(algo1)]\n",
241
+ "# ours = [os.path.join(ours, x) for x in os.listdir(ours)]\n",
242
+ "\n",
243
+ "# print(len(GTS), len(IMGS))\n",
244
+ "# print(ours[:3])\n",
245
+ "\n",
246
+ "# i = 0\n",
247
+ "# for num in range(len(IMGS)):\n",
248
+ "# visualize(i, \n",
249
+ "# image=read_image(IMGS[num]),\n",
250
+ "# found_method=read_image(algo1[num]),\n",
251
+ "# our_method=read_image(ours[num]),\n",
252
+ "# gt=read_image(GTS[num]))\n",
253
+ "# i+=1"
254
+ ]
255
+ },
256
+ {
257
+ "cell_type": "code",
258
+ "execution_count": null,
259
+ "metadata": {},
260
+ "outputs": [],
261
+ "source": [
262
+ "# # DUT-TE\n",
263
+ "\n",
264
+ "# # Images and GT\n",
265
+ "\n",
266
+ "# GT = \"../outputs/visualizations/duts-te/gts\"\n",
267
+ "# IMG = \"../datasets_local/DUTS-TE/DUTS-TE-Image/\"\n",
268
+ "# GTS = [os.path.join(GT, x) for x in os.listdir(GT)]\n",
269
+ "# IMGS = [os.path.join(IMG, x) for x in os.listdir(IMG)]\n",
270
+ "\n",
271
+ "# # Algo\n",
272
+ "# algo1 = \"../outputs/visualizations/duts-te/found-MSL-DUTS-TR-vit_small8_DUTS-TE/\"\n",
273
+ "# ours = \"../outputs/visualizations/duts-te/msl_a1.5_b1_g1_reg4-MSL-DUTS-TR-vit_small8_DUTS-TE/\"\n",
274
+ "\n",
275
+ "# algo1 = [os.path.join(algo1, x) for x in os.listdir(algo1)]\n",
276
+ "# ours = [os.path.join(ours, x) for x in os.listdir(ours)]\n",
277
+ "\n",
278
+ "# print(len(GTS), len(IMGS))\n",
279
+ "# print(ours[:3])\n",
280
+ "\n",
281
+ "# i = 0\n",
282
+ "# for num in range(len(IMGS)):\n",
283
+ "# visualize(i, \n",
284
+ "# image=read_image(IMGS[num]),\n",
285
+ "# found_method=read_image(algo1[num]),\n",
286
+ "# our_method=read_image(ours[num]),\n",
287
+ "# gt=read_image(GTS[num]))\n",
288
+ "# i+=1"
289
+ ]
290
+ },
291
+ {
292
+ "cell_type": "code",
293
+ "execution_count": null,
294
+ "metadata": {},
295
+ "outputs": [],
296
+ "source": [
297
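+ "# ECSSD: image, ground-truth and prediction paths\n",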
+ "# GT\n",
298
+ "ECSS_GT = \"../outputs/visualizations/ecssd/gts\"\n",
299
+ "ECSS_IMG = \"../datasets_local/ECSSD/images/\"\n",
300
+ "ECSS_GTS = [os.path.join(ECSS_GT, x) for x in os.listdir(ECSS_GT)]\n",
301
+ "ECSS_IMGS = [os.path.join(ECSS_IMG, x) for x in os.listdir(ECSS_IMG)]\n",
302
+ "# Pred\n",
303
+ "ECSS_algo1 = \"../outputs/visualizations/ecssd/found-MSL-DUTS-TR-vit_small8_ECSSD/\"\n",
304
+ "ECSS_ours = \"../outputs/visualizations/ecssd/msl_a1.5_b1_g1_reg4-MSL-DUTS-TR-vit_small8_ECSSD/\"\n",
305
+ "ECSS_algo1 = [os.path.join(ECSS_algo1, x) for x in os.listdir(ECSS_algo1)]\n",
306
+ "ECSS_ours = [os.path.join(ECSS_ours, x) for x in os.listdir(ECSS_ours)]\n"
307
+ ]
308
+ },
309
+ {
310
+ "cell_type": "code",
311
+ "execution_count": null,
312
+ "metadata": {},
313
+ "outputs": [],
314
+ "source": [
315
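+ "# DUT-OMRON: image, ground-truth and prediction paths\n",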
+ "# GT\n",
316
+ "DUT_OM_GT = \"../outputs/visualizations/dut-omron/gts\"\n",
317
+ "DUT_OM_IMG = \"../datasets_local/DUT-OMRON/DUT-OMRON-image/\"\n",
318
+ "DUT_OM_GTS = [os.path.join(DUT_OM_GT, x) for x in os.listdir(DUT_OM_GT)]\n",
319
+ "DUT_OM_IMGS = [os.path.join(DUT_OM_IMG, x) for x in os.listdir(DUT_OM_IMG)]\n",
320
+ "\n",
321
+ "# Pred\n",
322
+ "DUT_OM_algo1 = \"../outputs/visualizations/dut-omron/found-MSL-DUTS-TR-vit_small8_DUT-OMRON/\"\n",
323
+ "DUT_OM_ours = \"../outputs/visualizations/dut-omron/msl_a1.5_b1_g1_reg4-MSL-DUTS-TR-vit_small8_DUT-OMRON/\"\n",
324
+ "DUT_OM_algo1 = [os.path.join(DUT_OM_algo1, x) for x in os.listdir(DUT_OM_algo1)]\n",
325
+ "DUT_OM_ours = [os.path.join(DUT_OM_ours, x) for x in os.listdir(DUT_OM_ours)]"
326
+ ]
327
+ },
328
+ {
329
+ "cell_type": "code",
330
+ "execution_count": null,
331
+ "metadata": {},
332
+ "outputs": [],
333
+ "source": [
334
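+ "# DUTS-TE: image, ground-truth and prediction paths\n",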
+ "DUT_GT = \"../outputs/visualizations/duts-te/gts\"\n",
335
+ "DUT_IMG = \"../datasets_local/DUTS-TE/DUTS-TE-Image/\"\n",
336
+ "DUT_GTS = [os.path.join(DUT_GT, x) for x in os.listdir(DUT_GT)]\n",
337
+ "DUT_IMGS = [os.path.join(DUT_IMG, x) for x in os.listdir(DUT_IMG)]\n",
338
+ "\n",
339
+ "# Pred\n",
340
+ "DUT_algo1 = \"../outputs/visualizations/duts-te/found-MSL-DUTS-TR-vit_small8_DUTS-TE/\"\n",
341
+ "DUT_ours = \"../outputs/visualizations/duts-te/msl_a1.5_b1_g1_reg4-MSL-DUTS-TR-vit_small8_DUTS-TE/\"\n",
342
+ "DUT_algo1 = [os.path.join(DUT_algo1, x) for x in os.listdir(DUT_algo1)]\n",
343
+ "DUT_ours = [os.path.join(DUT_ours, x) for x in os.listdir(DUT_ours)]\n"
344
+ ]
345
+ },
346
+ {
347
+ "cell_type": "code",
348
+ "execution_count": null,
349
+ "metadata": {},
350
+ "outputs": [],
351
+ "source": [
352
+ "\n",
353
+ "# ECSSD -\n",
354
+ "# \t52, 132, 147 - over segmentation, fine details\n",
355
+ "# \t353, 658, 780 - reflection of shiny surface and water\n",
356
+ "# 432, 825, 835, 988 - noisy \n",
357
+ "# 59 (bee) - complex background\n",
358
+ "\n",
359
+ "# DUT-OMRON\n",
360
+ "# \t1, 14 - over segmentation\n",
361
+ "# \t119, 365, 439, 440, 1238 - noisy\n",
362
+ "# 1168, 1461 - segment other non-salient objects/parts\n",
363
+ "# 1388 - fails in complex background\n",
364
+ "# 1398 - small objects\n",
365
+ "# 1973 - dark scenes\n",
366
+ "\n",
367
+ "# DUTS-TE\n",
368
+ "# \t46, 698, 1712 - segment other non-salient objects/parts\n",
369
+ "# \t260 - small objects \n",
370
+ "# 776, 1255 - over segmentation\n",
371
+ "# \t683, 830, 1465 - noisy\n",
372
+ "# \t719, 1470 - reflection of water\n",
373
+ "\n",
374
+ "# 52, 132, 147, 353, 658, 780, - oversegment, reflection of shiny surface and water\n",
375
+ "# 1388, 1398, 1972, 1168, 1461, 440 - fails in complex background, small objects, dark scenes, segment non-salient objects, noisy\n",
376
+ "# 260, 719, 1470, 683, 830, 1465 - small objects, reflection of water, noisy predictions\n",
377
+ "\n",
378
+ "# idxs = [52, 59, 147, 353, 658, 780, 1388, 1398, 1973, 1168, 1461, 440, 260, 719, 1470, 683, 830, 1465]\n",
379
+ "\n",
380
+ "\n",
381
+ "\n",
382
+ "\n",
383
+ "# ECSSD -\n",
384
+ "# , 132, - over segmentation, fine details\n",
385
+ "# ,, - reflection of shiny surface and water\n",
386
+ "# 432, 825, 835, 988 - noisy \n",
387
+ "# 59 (bee) - complex background\n",
388
+ "\n",
389
+ "# DUT-OMRON\n",
390
+ "# \t1, 14 - over segmentation\n",
391
+ "# \t119, 365, 439,, 1238 - noisy\n",
392
+ "#, - segment other non-salient objects/parts\n",
393
+ "# - fails in complex background\n",
394
+ "# - small objects\n",
395
+ "# - dark scenes\n",
396
+ "\n",
397
+ "# DUTS-TE\n",
398
+ "# \t46, 698, 1712 - segment other non-salient objects/parts\n",
399
+ "# - small objects \n",
400
+ "# 776, 1255 - over segmentation\n",
401
+ "# ,, - noisy\n",
402
+ "# , - reflection of water\n",
403
+ "\n",
404
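+ "# Indices of the qualitative examples to plot, chosen from the failure cases noted above\n",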
+ "idxs = [132,432,825,835,988,59,1,14,119,365,439,1238,46,698,1712,776,1255,4000]"
405
+ ]
406
+ },
407
+ {
408
+ "cell_type": "code",
409
+ "execution_count": null,
410
+ "metadata": {},
411
+ "outputs": [],
412
+ "source": [
413
+ "rows = int(len(idxs) / 3)\n",
414
+ "rows, len(idxs)"
415
+ ]
416
+ },
417
+ {
418
+ "cell_type": "code",
419
+ "execution_count": null,
420
+ "metadata": {},
421
+ "outputs": [],
422
+ "source": [
423
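+ "# Build a rows x 12 grid: rows 0-1 use ECSSD, rows 2-3 DUT-OMRON, rows 4-5 DUTS-TE;\n",
+ "# each row shows three (Image, FOUND, Ours, Ground Truth) quadruplets.\n",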
+ "rows = int(len(idxs) / 3)\n",
424
+ "cols = 12\n",
425
+ "fig, axarr = plt.subplots(rows, cols, figsize=(30, 15), constrained_layout=True)\n",
426
+ "\n",
427
+ "\n",
428
+ "alphabet_string = string.ascii_lowercase\n",
429
+ "alphabet_list = list(alphabet_string)\n",
430
+ "\n",
431
+ "v = 0\n",
432
+ "for r in range(rows):\n",
433
+ " if r == 0 or r == 1:\n",
434
+ " print(v, r)\n",
435
+ " a=read_image(ECSS_IMGS[idxs[v+r]])\n",
436
+ " b=read_image(ECSS_algo1[idxs[v+r]])\n",
437
+ "\n",
438
+ " c=read_image(ECSS_ours[idxs[v+r]])\n",
439
+ " d=read_image(ECSS_GTS[idxs[v+r]])\n",
440
+ "\n",
441
+ " e=read_image(ECSS_IMGS[idxs[v+r+1]])\n",
442
+ " f=read_image(ECSS_algo1[idxs[v+r+1]])\n",
443
+ "\n",
444
+ " g=read_image(ECSS_ours[idxs[v+r+1]])\n",
445
+ " h=read_image(ECSS_GTS[idxs[v+r+1]])\n",
446
+ "\n",
447
+ " i=read_image(ECSS_IMGS[idxs[v+r+2]])\n",
448
+ " j=read_image(ECSS_algo1[idxs[v+r+2]])\n",
449
+ "\n",
450
+ " k=read_image(ECSS_ours[idxs[v+r+2]])\n",
451
+ " l=read_image(ECSS_GTS[idxs[v+r+2]])\n",
452
+ "\n",
453
+ " if r == 2 or r == 3:\n",
454
+ " print(v, r)\n",
455
+ " a=read_image(DUT_OM_IMGS[idxs[v+r]])\n",
456
+ " b=read_image(DUT_OM_algo1[idxs[v+r]])\n",
457
+ "\n",
458
+ " c=read_image(DUT_OM_ours[idxs[v+r]])\n",
459
+ " d=read_image(DUT_OM_GTS[idxs[v+r]])\n",
460
+ "\n",
461
+ " e=read_image(DUT_OM_IMGS[idxs[v+r+1]])\n",
462
+ " f=read_image(DUT_OM_algo1[idxs[v+r+1]])\n",
463
+ "\n",
464
+ " g=read_image(DUT_OM_ours[idxs[v+r+1]])\n",
465
+ " h=read_image(DUT_OM_GTS[idxs[v+r+1]])\n",
466
+ "\n",
467
+ " i=read_image(DUT_OM_IMGS[idxs[v+r+2]])\n",
468
+ " j=read_image(DUT_OM_algo1[idxs[v+r+2]])\n",
469
+ "\n",
470
+ " k=read_image(DUT_OM_ours[idxs[v+r+2]])\n",
471
+ " l=read_image(DUT_OM_algo1[idxs[v+r+2]])\n",
472
+ "\n",
473
+ " if r == 4 or r == 5:\n",
474
+ " print(v, r)\n",
475
+ " a=read_image(DUT_IMGS[idxs[v+r]])\n",
476
+ " b=read_image(DUT_algo1[idxs[v+r]])\n",
477
+ "\n",
478
+ " c=read_image(DUT_ours[idxs[v+r]])\n",
479
+ " d=read_image(DUT_GTS[idxs[v+r]])\n",
480
+ "\n",
481
+ " e=read_image(DUT_IMGS[idxs[v+r+1]])\n",
482
+ " f=read_image(DUT_algo1[idxs[v+r+1]])\n",
483
+ "\n",
484
+ " g=read_image(DUT_ours[idxs[v+r+1]])\n",
485
+ " h=read_image(DUT_GTS[idxs[v+r+1]])\n",
486
+ "\n",
487
+ " i=read_image(DUT_IMGS[idxs[v+r+2]])\n",
488
+ " j=read_image(DUT_algo1[idxs[v+r+2]])\n",
489
+ "\n",
490
+ " k=read_image(DUT_ours[idxs[v+r+2]])\n",
491
+ " l=read_image(DUT_GTS[idxs[v+r+2]])\n",
492
+ "\n",
493
+ " v+=2\n",
494
+ " \n",
495
+ " images = [a,b,c,d,e,f,g,h,i,j,k,l]\n",
496
+ " \n",
497
+ " captions = [\"Image\", \"FOUND\", \"Ours\", \"Ground Truth\", \n",
498
+ " \"Image\", \"FOUND\", \"Ours\", \"Ground Truth\",\n",
499
+ " \"Image\", \"FOUND\", \"Ours\", \"Ground Truth\"]\n",
500
+ " \n",
501
+ " for c in range(cols):\n",
502
+ " axarr[r, c].imshow(images[c], cmap='gray')\n",
503
+ " axarr[r, c].axis(\"off\")\n",
504
+ " axarr[r, c].set_aspect('equal') \n",
505
+ " if r==0:\n",
506
+ " axarr[r, c].set_title(captions[c], fontsize=25)\n",
507
+ "\n",
508
+ "plt.savefig(\"../logs/compare_predictions_ext.pdf\", facecolor=\"white\", bbox_inches = 'tight', dpi=300)"
509
+ ]
510
+ },
511
+ {
512
+ "cell_type": "code",
513
+ "execution_count": null,
514
+ "metadata": {},
515
+ "outputs": [],
516
+ "source": [
517
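+ "# Horizontally stack the last loaded image with its FOUND and Peekaboo predictions for the failures figure\n",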
+ "stacks = np.hstack([a,b,c_])\n",
518
+ "stacks.shape"
519
+ ]
520
+ },
521
+ {
522
+ "cell_type": "code",
523
+ "execution_count": null,
524
+ "metadata": {},
525
+ "outputs": [],
526
+ "source": [
527
+ "plt.imshow(stacks)\n",
528
+ "plt.axis(\"off\")\n",
529
+ "plt.savefig(\"../logs/failures.pdf\", facecolor=\"white\", bbox_inches = 'tight', dpi=300)"
530
+ ]
531
+ },
532
+ {
533
+ "cell_type": "code",
534
+ "execution_count": null,
535
+ "metadata": {},
536
+ "outputs": [],
537
+ "source": [
538
+ "a.shape, b.shape, c.shape"
539
+ ]
540
+ },
541
+ {
542
+ "cell_type": "code",
543
+ "execution_count": null,
544
+ "metadata": {},
545
+ "outputs": [],
546
+ "source": []
547
+ }
548
+ ],
549
+ "metadata": {
550
+ "kernelspec": {
551
+ "display_name": "uobjl",
552
+ "language": "python",
553
+ "name": "python3"
554
+ },
555
+ "language_info": {
556
+ "codemirror_mode": {
557
+ "name": "ipython",
558
+ "version": 3
559
+ },
560
+ "file_extension": ".py",
561
+ "mimetype": "text/x-python",
562
+ "name": "python",
563
+ "nbconvert_exporter": "python",
564
+ "pygments_lexer": "ipython3",
565
+ "version": "3.8.18"
566
+ },
567
+ "orig_nbformat": 4
568
+ },
569
+ "nbformat": 4,
570
+ "nbformat_minor": 2
571
+ }
outputs/VOC_000030-peekaboo.png ADDED