ohjho committed on
Commit
70f6db8
1 Parent(s): b99a21f

added st app for testing

Files changed (5)
  1. README.md +96 -1
  2. app.py +101 -0
  3. download.py +475 -0
  4. requirements.txt +6 -0
  5. run_gradio.py +126 -0
README.md CHANGED
@@ -8,5 +8,100 @@ sdk_version: 1.9.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
+ # Saliency Based Image Cropping
+
+ This repo was forked by the [Miro team](https://miro.io/#) to create the interface [here]()
+
+ # Contextual Encoder-Decoder Network <br/> for Visual Saliency Prediction
+
+ ![](https://img.shields.io/badge/python-v3.6.8-orange.svg?style=flat-square)
+ ![](https://img.shields.io/badge/tensorflow-v1.13.1-orange.svg?style=flat-square)
+ ![](https://img.shields.io/badge/matplotlib-v3.0.3-orange.svg?style=flat-square)
+ ![](https://img.shields.io/badge/requests-v2.21.0-orange.svg?style=flat-square)
+
+ <img src="./figures/results.jpg" width="800"/>
+
+ This repository contains the official *TensorFlow* implementation of the MSI-Net (multi-scale information network), as described in the Neural Networks paper [Contextual encoder-decoder network for visual saliency prediction](https://www.sciencedirect.com/science/article/pii/S0893608020301660) (2020) and on [arXiv](https://arxiv.org/abs/1902.06634).
+
+ **_Abstract:_** *Predicting salient regions in natural images requires the detection of objects that are present in a scene. To develop robust representations for this challenging task, high-level visual features at multiple spatial scales must be extracted and augmented with contextual information. However, existing models aimed at explaining human fixation maps do not incorporate such a mechanism explicitly. Here we propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task. The architecture forms an encoder-decoder structure and includes a module with multiple convolutional layers at different dilation rates to capture multi-scale features in parallel. Moreover, we combine the resulting representations with global scene information for accurately predicting visual saliency. Our model achieves competitive and consistent results across multiple evaluation metrics on two public saliency benchmarks and we demonstrate the effectiveness of the suggested approach on five datasets and selected examples. Compared to state of the art approaches, the network is based on a lightweight image classification backbone and hence presents a suitable choice for applications with limited computational resources, such as (virtual) robotic systems, to estimate human fixations across complex natural scenes.*
+
+ Our results are available on the original [MIT saliency benchmark](http://saliency.mit.edu/results.html) and the updated [MIT/Tübingen saliency benchmark](https://saliency.tuebingen.ai/results.html). The latter are derived from a probabilistic version of our model with metric-specific postprocessing for a fair model comparison.
+
+ ## Reference
+
+ If you use this code in your research, please cite the following paper:
+
+ ```
+ @article{kroner2020contextual,
+   title={Contextual encoder-decoder network for visual saliency prediction},
+   author={Kroner, Alexander and Senden, Mario and Driessens, Kurt and Goebel, Rainer},
+   url={http://www.sciencedirect.com/science/article/pii/S0893608020301660},
+   doi={https://doi.org/10.1016/j.neunet.2020.05.004},
+   journal={Neural Networks},
+   publisher={Elsevier},
+   year={2020},
+   volume={129},
+   pages={261--270},
+   issn={0893-6080}
+ }
+ ```
+
+ ## Architecture
+
+ <img src="./figures/architecture.jpg" width="700"/>
+
+ ## Requirements
+
+ | Package    | Version |
+ |:----------:|:-------:|
+ | python     | 3.6.8   |
+ | tensorflow | 1.13.1  |
+ | matplotlib | 3.0.3   |
+ | requests   | 2.21.0  |
+ | scipy      | 1.4.1   |
+
+ The code was tested and is compatible with both Windows and Linux. We strongly recommend using *TensorFlow* with GPU acceleration, especially when training the model. Nevertheless, a slower CPU version is officially supported.
+
+ ## Training
+
+ The results of our paper can be reproduced by first training the MSI-Net via the following command:
+
+ ```
+ python main.py train
+ ```
+
+ This will start the training procedure for the SALICON dataset with the hyperparameters defined in `config.py`. If you want to optimize the model for CPU usage, please change the corresponding `device` value in the configuration file. Optionally, the dataset and download path can be specified via command line arguments:
+
+ ```
+ python main.py train -d DATA -p PATH
+ ```
+
+ Here, the `DATA` argument must be `salicon`, `mit1003`, `cat2000`, `dutomron`, `pascals`, `osie`, or `fiwi`. The model must first be trained on the SALICON dataset before it can be fine-tuned on any of the other ones. By default, the selected saliency dataset will be downloaded to the folder `data/`, but you can point to a different directory via the `PATH` argument.
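
For example, a typical sequence (using the documented default `data/` directory) would pre-train on SALICON and then fine-tune on MIT1003:

```
python main.py train -d salicon -p data/
python main.py train -d mit1003 -p data/
```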
+
+ All results are then stored under the folder `results/`, which contains the training history and model checkpoints. This makes it possible to continue training or to perform inference on test instances, as described in the next section.
+
+ ## Testing
+
+ To test a pre-trained model on image data and produce saliency maps, execute the following command:
+
+ ```
+ python main.py test -d DATA -p PATH
+ ```
+
+ If no checkpoint is available from prior training, it will automatically download our pre-trained model to `weights/`. The `DATA` argument defines which network is used and must be `salicon`, `mit1003`, `cat2000`, `dutomron`, `pascals`, `osie`, or `fiwi`. It will then resize the input images to the dimensions specified in the configuration file. Note that this might lead to excessive image padding depending on the selected dataset.
+
+ The `PATH` argument points to the folder where the test data is stored but can also denote a single image file directly. As with network training, the `device` value can be changed to CPU in the configuration file. This ensures that the model optimized for CPU will be utilized and hence improves the inference speed. All results are finally stored in the folder `results/images/` with the original image dimensions.
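
For instance, to produce a saliency map for a single image with the MIT1003 network (the file path below is illustrative):

```
python main.py test -d mit1003 -p /path/to/image.jpg
```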
+
+ ## Demo
+
+ <img src="./demo/demo.gif" width="750"/>
+
+ A demonstration of saliency prediction in the browser is available [here](https://storage.googleapis.com/msi-net/demo/index.html). It computes saliency maps based on the input from a webcam via *TensorFlow.js*. Since the library uses the machine's hardware, model performance is dependent on your local configuration. The buttons allow you to select the quality, ranging from *very low* for a version trained on low image resolution with high inference speed, to *very high* for a version trained on high image resolution with slow inference speed.
+
+ ## Contact
+
+ For questions, bug reports, and suggestions about this work, please create an [issue](https://github.com/alexanderkroner/saliency/issues) in this repository.
app.py ADDED
@@ -0,0 +1,101 @@
+ import streamlit as st
+
+ import os, sys, io
+ import warnings
+ import urllib.request as urllib
+ import numpy as np
+ from PIL import Image, ImageColor
+
+ from run_gradio import load_model, test_model
+
+ ### Some Utils Functions ###
+ def get_image(st_asset = st.sidebar, as_np_arr = False, extension_list = ['jpg', 'jpeg', 'png']):
+     image_url, image_fh = None, None
+     if st_asset.checkbox('use image URL?'):
+         image_url = st_asset.text_input("Enter Image URL")
+     else:
+         image_fh = st_asset.file_uploader(label = "Upload your image", type = extension_list)
+
+     im = None
+     if image_url:
+         response = urllib.urlopen(image_url)
+         im = Image.open(io.BytesIO(bytearray(response.read())))
+     elif image_fh:
+         im = Image.open(image_fh)
+
+     if im and as_np_arr:
+         im = np.array(im)
+     return im
+
+ def show_miro_logo(use_column_width = False, width = 100, st_asset = st.sidebar):
+     logo_url = 'https://miro.medium.com/max/1400/0*qLL-32srlq6Y_iTm.png'
+     st_asset.image(logo_url, use_column_width = use_column_width, channels = 'BGR', output_format = 'PNG', width = width)
+
+ def im_draw_bbox(pil_im, x0, y0, x1, y1, color = 'black', width = 3, caption = None,
+                  bbv_label_only = False):
+     '''
+     draw a bounding box on the input image pil_im and return the annotated image
+     Args:
+         color: color name as read by Pillow.ImageColor
+         bbv_label_only: only draw a caption flag via bbox_visualizer
+     '''
+     import bbox_visualizer as bbv
+     if any(isinstance(i, float) for i in [x0, y0, x1, y1]):
+         warnings.warn('im_draw_bbox: at least one of x0, y0, x1, y1 is of the type float and is converted to int.')
+     x0 = int(x0)
+     y0 = int(y0)
+     x1 = int(x1)
+     y1 = int(y1)
+
+     if bbv_label_only:
+         if caption:
+             im_array = bbv.draw_flag_with_label(np.array(pil_im),
+                                                 label = caption,
+                                                 bbox = [x0, y0, x1, y1],
+                                                 line_color = ImageColor.getrgb(color),
+                                                 text_bg_color = ImageColor.getrgb(color)
+                                                 )
+         else:
+             raise ValueError('im_draw_bbox: bbv_label_only is True but caption is None')
+     else:
+         im_array = bbv.draw_rectangle(np.array(pil_im),
+                                       bbox = [x0, y0, x1, y1],
+                                       bbox_color = ImageColor.getrgb(color),
+                                       thickness = width
+                                       )
+         im_array = bbv.add_label(im_array, label = caption,
+                                  bbox = [x0, y0, x1, y1],
+                                  text_bg_color = ImageColor.getrgb(color)
+                                  ) if caption else im_array
+     return Image.fromarray(im_array)
+
+ ### Streamlit App ###
+
+ def Main(model_dict):
+     st.set_page_config(layout = 'wide')
+     show_miro_logo()
+     with st.sidebar.expander('Saliency Demo'):
+         st.info('''
+         [TensorFlow Implementation of MSI-Net](https://github.com/alexanderkroner/saliency)
+         which achieved
+         [SoTA performance](https://saliency.tuebingen.ai/results.html) on the
+         [MIT Saliency Benchmark dataset](http://saliency.mit.edu/datasets.html)
+         ''')
+
+     im = get_image(st_asset = st.sidebar.expander('Input Image', expanded = True), extension_list = ['jpg', 'jpeg'])
+     aspect_ratio = st.sidebar.selectbox('aspect ratio', help = 'to demo saliency cropping',
+                                         options = ['', '16x9', '4x3'])
+     if im:
+         aspect_ratio_tup = tuple([int(i) for i in aspect_ratio.split('x')]) if aspect_ratio else None
+         saliency_im = test_model(np.array(im), model_dict = model_dict,
+                                  aspect_ratio_tup = aspect_ratio_tup)
+
+         l_col, r_col = st.columns(2)
+         l_col.image(im, caption = 'Input Image')
+         r_col.image(saliency_im, caption = 'Saliency Map')
+     else:
+         st.warning(':point_left: please provide an image')
+
+
+ if __name__ == '__main__':
+     model_dict = load_model()
+     Main(model_dict = model_dict)
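
Assuming the pinned dependencies from `requirements.txt` are installed, the app above should run locally via the standard Streamlit CLI:

```
streamlit run app.py
```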
download.py ADDED
@@ -0,0 +1,475 @@
+ import io
+ import os
+ import zipfile
+
+ import gdown
+ import h5py
+ import numpy as np
+ import requests
+ from matplotlib.pyplot import imread, imsave
+ from scipy.io import loadmat
+ from scipy.ndimage import gaussian_filter
+
+
+ def download_salicon(data_path):
+     """Downloads the SALICON dataset. Three folders are then created that
+     contain the stimuli, binary fixation maps, and blurred saliency
+     distributions respectively.
+
+     Args:
+         data_path (str): Defines the path where the dataset will be
+             downloaded and extracted to.
+
+     .. seealso:: The code for downloading files from Google Drive is based
+         on the solution provided at [https://bit.ly/2JSVgMQ].
+     """
+
+     print(">> Downloading SALICON dataset...", end="", flush=True)
+
+     default_path = data_path + "salicon/"
+     fixations_path = default_path + "fixations/"
+     saliency_path = default_path + "saliency/"
+
+     os.makedirs(fixations_path, exist_ok=True)
+     os.makedirs(saliency_path, exist_ok=True)
+
+     ids = ["1g8j-hTT-51IG1UFwP0xTGhLdgIUCW5e5",
+            "1P-jeZXCsjoKO79OhFUgnj6FGcyvmLDPj",
+            "1PnO7szbdub1559LfjYHMy65EDC4VhJC8"]
+
+     urls = ["https://drive.google.com/uc?id=" +
+             i + "&export=download" for i in ids]
+
+     save_paths = [default_path, fixations_path, saliency_path]
+
+     for count, url in enumerate(urls):
+         gdown.download(url, data_path + "tmp.zip", quiet=True)
+
+         with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+             for file in zip_ref.namelist():
+                 if "test" not in file:
+                     zip_ref.extract(file, save_paths[count])
+
+     os.rename(default_path + "images", default_path + "stimuli")
+
+     os.remove(data_path + "tmp.zip")
+
+     print("done!", flush=True)
+
+
+ def download_mit1003(data_path):
+     """Downloads the MIT1003 dataset. Three folders are then created that
+     contain the stimuli, binary fixation maps, and blurred saliency
+     distributions respectively.
+
+     Args:
+         data_path (str): Defines the path where the dataset will be
+             downloaded and extracted to.
+     """
+
+     print(">> Downloading MIT1003 dataset...", end="", flush=True)
+
+     default_path = data_path + "mit1003/"
+     stimuli_path = default_path + "stimuli/"
+     fixations_path = default_path + "fixations/"
+     saliency_path = default_path + "saliency/"
+
+     os.makedirs(stimuli_path, exist_ok=True)
+     os.makedirs(fixations_path, exist_ok=True)
+     os.makedirs(saliency_path, exist_ok=True)
+
+     url = "https://people.csail.mit.edu/tjudd/WherePeopleLook/ALLSTIMULI.zip"
+
+     with open(data_path + "tmp.zip", "wb") as f:
+         f.write(requests.get(url).content)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             if file.endswith(".jpeg"):
+                 file_name = os.path.split(file)[1]
+                 file_path = stimuli_path + file_name
+
+                 with open(file_path, "wb") as stimulus:
+                     stimulus.write(zip_ref.read(file))
+
+     url = "https://people.csail.mit.edu/tjudd/WherePeopleLook/ALLFIXATIONMAPS.zip"
+
+     with open(data_path + "tmp.zip", "wb") as f:
+         f.write(requests.get(url).content)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             file_name = os.path.split(file)[1]
+
+             if file.endswith("Pts.jpg"):
+                 file_path = fixations_path + file_name
+
+                 # this file is mistakenly included in the dataset and can be ignored
+                 if file_name == "i05june05_static_street_boston_p1010764fixPts.jpg":
+                     continue
+
+                 with open(file_path, "wb") as fixations:
+                     fixations.write(zip_ref.read(file))
+
+             elif file.endswith("Map.jpg"):
+                 file_path = saliency_path + file_name
+
+                 with open(file_path, "wb") as saliency:
+                     saliency.write(zip_ref.read(file))
+
+     os.remove(data_path + "tmp.zip")
+
+     print("done!", flush=True)
+
+
+ def download_cat2000(data_path):
+     """Downloads the CAT2000 dataset. Three folders are then created that
+     contain the stimuli, binary fixation maps, and blurred saliency
+     distributions respectively.
+
+     Args:
+         data_path (str): Defines the path where the dataset will be
+             downloaded and extracted to.
+     """
+
+     print(">> Downloading CAT2000 dataset...", end="", flush=True)
+
+     default_path = data_path + "cat2000/"
+
+     os.makedirs(data_path, exist_ok=True)
+
+     url = "http://saliency.mit.edu/trainSet.zip"
+
+     with open(data_path + "tmp.zip", "wb") as f:
+         f.write(requests.get(url).content)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             if not ("Output" in file or "allFixData" in file):
+                 zip_ref.extract(file, data_path)
+
+     os.rename(data_path + "trainSet/", default_path)
+
+     os.rename(default_path + "Stimuli", default_path + "stimuli")
+     os.rename(default_path + "FIXATIONLOCS", default_path + "fixations")
+     os.rename(default_path + "FIXATIONMAPS", default_path + "saliency")
+
+     os.remove(data_path + "tmp.zip")
+
+     print("done!", flush=True)
+
+
+ def download_dutomron(data_path):
+     """Downloads the DUT-OMRON dataset. Three folders are then created that
+     contain the stimuli, binary fixation maps, and blurred saliency
+     distributions respectively.
+
+     Args:
+         data_path (str): Defines the path where the dataset will be
+             downloaded and extracted to.
+     """
+
+     print(">> Downloading DUTOMRON dataset...", end="", flush=True)
+
+     default_path = data_path + "dutomron/"
+     stimuli_path = default_path + "stimuli/"
+     fixations_path = default_path + "fixations/"
+     saliency_path = default_path + "saliency/"
+
+     os.makedirs(stimuli_path, exist_ok=True)
+     os.makedirs(fixations_path, exist_ok=True)
+     os.makedirs(saliency_path, exist_ok=True)
+
+     url = "http://saliencydetection.net/dut-omron/download/DUT-OMRON-image.zip"
+
+     with open(data_path + "tmp.zip", "wb") as f:
+         f.write(requests.get(url).content)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             if file.endswith(".jpg") and "._" not in file:
+                 file_name = os.path.basename(file)
+                 file_path = stimuli_path + file_name
+
+                 with open(file_path, "wb") as stimulus:
+                     stimulus.write(zip_ref.read(file))
+
+     url = "http://saliencydetection.net/dut-omron/download/DUT-OMRON-eye-fixations.zip"
+
+     with open(data_path + "tmp.zip", "wb") as f:
+         f.write(requests.get(url).content)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             if file.endswith(".mat") and "._" not in file:
+                 file_name = os.path.basename(file)
+                 file_name = os.path.splitext(file_name)[0] + ".png"
+
+                 loaded_zip = io.BytesIO(zip_ref.read(file))
+
+                 fixations = loadmat(loaded_zip)["s"]
+                 sorted_idx = fixations[:, 2].argsort()
+                 fixations = fixations[sorted_idx]
+
+                 size = fixations[0, :2]
+
+                 fixations_map = np.zeros((size[1], size[0]))
+
+                 fixations_map[fixations[1:, 1],
+                               fixations[1:, 0]] = 1
+
+                 saliency_map = gaussian_filter(fixations_map, 16)
+
+                 imsave(saliency_path + file_name, saliency_map, cmap="gray")
+                 imsave(fixations_path + file_name, fixations_map, cmap="gray")
+
+     os.remove(data_path + "tmp.zip")
+
+     print("done!", flush=True)
+
+
+ def download_pascals(data_path):
+     """Downloads the PASCAL-S dataset. Three folders are then created that
+     contain the stimuli, binary fixation maps, and blurred saliency
+     distributions respectively.
+
+     Args:
+         data_path (str): Defines the path where the dataset will be
+             downloaded and extracted to.
+     """
+
+     print(">> Downloading PASCALS dataset...", end="", flush=True)
+
+     default_path = data_path + "pascals/"
+     stimuli_path = default_path + "stimuli/"
+     fixations_path = default_path + "fixations/"
+     saliency_path = default_path + "saliency/"
+
+     os.makedirs(stimuli_path, exist_ok=True)
+     os.makedirs(fixations_path, exist_ok=True)
+     os.makedirs(saliency_path, exist_ok=True)
+
+     url = "http://cbs.ic.gatech.edu/salobj/download/salObj.zip"
+
+     with open(data_path + "tmp.zip", "wb") as f:
+         f.write(requests.get(url).content)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             file_name = os.path.basename(file)
+
+             if file.endswith(".jpg") and "imgs/pascal" in file:
+                 file_path = stimuli_path + file_name
+
+                 with open(file_path, "wb") as stimulus:
+                     stimulus.write(zip_ref.read(file))
+
+             elif file.endswith(".png") and "pascal/humanFix" in file:
+                 file_path = saliency_path + file_name
+
+                 with open(file_path, "wb") as saliency:
+                     saliency.write(zip_ref.read(file))
+
+             elif "pascalFix.mat" in file:
+                 loaded_zip = io.BytesIO(zip_ref.read(file))
+
+                 with h5py.File(loaded_zip, "r") as f:
+                     fixations = np.array(f.get("fixCell"))[0]
+
+                     fixations_list = []
+
+                     for reference in fixations:
+                         obj = np.array(f[reference])
+                         obj = np.stack((obj[0], obj[1]), axis=-1)
+                         fixations_list.append(obj)
+
+             elif "pascalSize.mat" in file:
+                 loaded_zip = io.BytesIO(zip_ref.read(file))
+
+                 with h5py.File(loaded_zip, "r") as f:
+                     sizes = np.array(f.get("sizeData"))
+                     sizes = np.transpose(sizes, (1, 0))
+
+     for idx, value in enumerate(fixations_list):
+         size = [int(x) for x in sizes[idx]]
+         fixations_map = np.zeros(size)
+
+         for fixation in value:
+             fixations_map[int(fixation[0]) - 1,
+                           int(fixation[1]) - 1] = 1
+
+         file_name = str(idx + 1) + ".png"
+         file_path = fixations_path + file_name
+
+         imsave(file_path, fixations_map, cmap="gray")
+
+     os.remove(data_path + "tmp.zip")
+
+     print("done!", flush=True)
+
+
+ def download_osie(data_path):
+     """Downloads the OSIE dataset. Three folders are then created that
+     contain the stimuli, binary fixation maps, and blurred saliency
+     distributions respectively.
+
+     Args:
+         data_path (str): Defines the path where the dataset will be
+             downloaded and extracted to.
+     """
+
+     print(">> Downloading OSIE dataset...", end="", flush=True)
+
+     default_path = data_path + "osie/"
+     stimuli_path = default_path + "stimuli/"
+     fixations_path = default_path + "fixations/"
+     saliency_path = default_path + "saliency/"
+
+     os.makedirs(stimuli_path, exist_ok=True)
+     os.makedirs(fixations_path, exist_ok=True)
+     os.makedirs(saliency_path, exist_ok=True)
+
+     url = "https://github.com/NUS-VIP/predicting-human-gaze-beyond-pixels/archive/master.zip"
+
+     with open(data_path + "tmp.zip", "wb") as f:
+         f.write(requests.get(url).content)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             file_name = os.path.basename(file)
+
+             if file.endswith(".jpg") and "data/stimuli" in file:
+                 file_path = stimuli_path + file_name
+
+                 with open(file_path, "wb") as stimulus:
+                     stimulus.write(zip_ref.read(file))
+
+             elif file_name == "fixations.mat":
+                 loaded_zip = io.BytesIO(zip_ref.read(file))
+
+                 loaded_mat = loadmat(loaded_zip)["fixations"]
+
+                 for idx, value in enumerate(loaded_mat):
+                     subjects = value[0][0][0][1]
+
+                     fixations_map = np.zeros((600, 800))
+
+                     for subject in subjects:
+                         x_vals = subject[0][0][0][0][0]
+                         y_vals = subject[0][0][0][1][0]
+
+                         fixations = np.stack((y_vals, x_vals), axis=-1)
+                         fixations = fixations.astype(int)
+
+                         fixations_map[fixations[:, 0],
+                                       fixations[:, 1]] = 1
+
+                     file_name = str(1001 + idx) + ".png"
+
+                     saliency_map = gaussian_filter(fixations_map, 16)
+
+                     imsave(saliency_path + file_name, saliency_map, cmap="gray")
+                     imsave(fixations_path + file_name, fixations_map, cmap="gray")
+
+     os.remove(data_path + "tmp.zip")
+
+     print("done!", flush=True)
+
+
+ def download_fiwi(data_path):
+     """Downloads the FIWI dataset. Three folders are then created that
+     contain the stimuli, binary fixation maps, and blurred saliency
+     distributions respectively.
+
+     Args:
+         data_path (str): Defines the path where the dataset will be
+             downloaded and extracted to.
+     """
+
+     print(">> Downloading FIWI dataset...", end="", flush=True)
+
+     default_path = data_path + "fiwi/"
+     stimuli_path = default_path + "stimuli/"
+     fixations_path = default_path + "fixations/"
+     saliency_path = default_path + "saliency/"
+
+     os.makedirs(stimuli_path, exist_ok=True)
+     os.makedirs(fixations_path, exist_ok=True)
+     os.makedirs(saliency_path, exist_ok=True)
+
+     url = "https://www.dropbox.com/s/30nxg2uwd1wpb80/webpage_dataset.zip?dl=1"
+
+     with open(data_path + "tmp.zip", "wb") as f:
+         f.write(requests.get(url).content)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             file_name = os.path.basename(file)
+
+             if file.endswith(".png") and "stimuli" in file:
+                 file_path = stimuli_path + file_name
+
+                 with open(file_path, "wb") as stimulus:
+                     stimulus.write(zip_ref.read(file))
+
+             elif file.endswith(".png") and "all5" in file:
+                 loaded_zip = io.BytesIO(zip_ref.read(file))
+
+                 fixations = imread(loaded_zip)
+                 saliency = gaussian_filter(fixations, 30)
+
+                 imsave(saliency_path + file_name, saliency, cmap="gray")
+                 imsave(fixations_path + file_name, fixations, cmap="gray")
+
+     os.remove(data_path + "tmp.zip")
+
+     print("done!", flush=True)
+
+
+ def download_pretrained_weights(data_path, key):
+     """Downloads the pre-trained weights for the VGG16 model when
+     training or the MSI-Net when testing on new data instances.
+
+     Args:
+         data_path (str): Defines the path where the weights will be
+             downloaded and extracted to.
+         key (str): Describes the type of model for which the weights will
+             be downloaded. This contains the device and dataset.
+
+     .. seealso:: The code for downloading files from Google Drive is based
+         on the solution provided at [https://bit.ly/2JSVgMQ].
+     """
+
+     print(">> Downloading pre-trained weights...", end="", flush=True)
+
+     os.makedirs(data_path, exist_ok=True)
+
+     ids = {
+         "vgg16_hybrid": "1ff0va472Xs1bvidCwRlW3Ctf7Hbyyn7p",
+         "model_salicon_cpu": "1Xy9C72pcA8DO4CY0rc6B7wsuE9L9DDZY",
+         "model_salicon_gpu": "1Th7fqVYx25ePMZz4LYsjNQWgAu8tJqwL",
+         "model_mit1003_cpu": "1jsESjYtsTvkMqKftA4rdstfB7mSYw5Ec",
+         "model_mit1003_gpu": "1P_tWxBl3igZlzcHGp5H3T3kzsOskWeG6",
+         "model_cat2000_cpu": "1XxaEx7xxD6rHasQTa-VY7T7eVpGhMxuV",
+         "model_cat2000_gpu": "1T6ChEGB6Mf02gKXrENjdeD6XXJkE_jHh",
+         "model_dutomron_cpu": "14tuRZpKi8LMDKRHNVUylu6RuAaXLjHTa",
+         "model_dutomron_gpu": "15LG_M45fpYC1pTwnwmArNTZw_Z3BOIA-",
+         "model_pascals_cpu": "1af9IvBqFamKWx64Ror6ALivuKNioOVIf",
+         "model_pascals_gpu": "1C-T-RQzX2SaiY9Nw1HmaSx6syyCt01Z0",
+         "model_osie_cpu": "1JD1tvAqZGxj_gEGmIfoxb9dTe5HOaHj1",
+         "model_osie_gpu": "1g8UPr1hGpUdOSWerRb751pZqiWBOZOCh",
+         "model_fiwi_cpu": "19qj9nAjd5gVHLB71oRn_YfYDw5n4Uf2X",
+         "model_fiwi_gpu": "12OpIMIi2IyDVaxkE2d37XO9uUsSYf1Ec"
+     }
+
+     url = "https://drive.google.com/uc?id=" + ids[key] + "&export=download"
+
+     gdown.download(url, data_path + "tmp.zip", quiet=True)
+
+     with zipfile.ZipFile(data_path + "tmp.zip", "r") as zip_ref:
+         for file in zip_ref.namelist():
+             zip_ref.extract(file, data_path)
+
+     os.remove(data_path + "tmp.zip")
+
+     print("done!", flush=True)
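
A brief usage sketch for these helpers, with illustrative paths; the `model_mit1003_cpu` key matches the default that `run_gradio.load_model` requests:

```
import download

# fetch a dataset into data/mit1003/{stimuli,fixations,saliency}/
download.download_mit1003("data/")

# fetch the frozen CPU graph that load_model expects under weights/
download.download_pretrained_weights("weights/", "model_mit1003_cpu")
```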
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ tensorflow==1.13.1
+ protobuf==3.19.0
+ matplotlib==3.0.3
+ requests==2.21.0
+ scipy==1.4.1
+ streamlit==0.89.0
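
These pins target the TensorFlow 1.x graph API used in `run_gradio.py`; installing them together into a fresh virtual environment is the safest path:

```
pip install -r requirements.txt
```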
run_gradio.py ADDED
@@ -0,0 +1,126 @@
+ from PIL import Image, ImageDraw
+ import gradio as gr
+ import numpy as np
+ import tensorflow as tf
+ import download, os, sys
+
+
+ def best_window(saliency, aspect_ratio=(16, 9)):
+     """ returns left, right, bottom, top
+     saliency is np.array with shape (height, width)
+     aspect_ratio is tuple of (width, height)
+     """
+     orig_height, orig_width = saliency.shape
+     move_vertically = orig_height >= orig_width / aspect_ratio[0] * \
+         aspect_ratio[1]
+     if move_vertically:
+         saliency_per_row = np.sum(saliency, axis=1)
+         height = round(orig_width / aspect_ratio[0] * aspect_ratio[1])
+         convolved_saliency = np.convolve(saliency_per_row, np.ones(height),
+                                          "valid")
+         max_row = np.argmax(convolved_saliency)
+         return 0, orig_width, max_row, max_row + height
+     else:
+         saliency_per_col = np.sum(saliency, axis=0)
+         width = round(orig_height / aspect_ratio[1] * aspect_ratio[0])
+         convolved_saliency = np.convolve(saliency_per_col, np.ones(width),
+                                          "valid")
+         max_col = np.argmax(convolved_saliency)
+         return max_col, max_col + width, 0, orig_height
+
+
+ def overlay_saliency(img, sal_map, bbox=None):
+     background = img.convert("RGBA")
+     overlay = sal_map.convert("RGBA")
+     overlaid = Image.blend(background, overlay, 0.75)
+     draw = ImageDraw.Draw(overlaid)
+     if bbox:
+         draw.rectangle(
+             [bbox['left'], bbox['bottom'], bbox['right'], bbox['top']],
+             outline="orange", width=5)
+     return overlaid
+
+
+ def get_saliency_sum_box(crop_data, bounded, saliency):
+     left, right, bottom, top = int(crop_data["x"]), int(
+         crop_data["x"] + crop_data["width"]), int(crop_data["y"]), int(
+         crop_data["y"] + crop_data["height"])
+     sal_sum = np.sum(saliency[bottom:top, left:right])
+     total = np.sum(saliency)
+     pct_sal = round(100 * sal_sum / total, 2)
+     draw = ImageDraw.Draw(bounded)
+     draw.rectangle([left, bottom, right, top], outline="red", width=5)
+     return bounded, pct_sal
+
+
+ def test_model(im_arr, model_dict, aspect_ratio_tup = None):
+     # original_arr, crop_data = original_arr
+     # crop_data["original_height"] = original_arr.shape[0]
+     # crop_data["original_width"] = original_arr.shape[1]
+     original_img = Image.fromarray(im_arr).convert('RGB')
+     w, h = original_img.size
+     h_ = int(400 / w * h)
+     resized_img = original_img.resize((400, h_))
+     resized_arr = np.asarray(resized_img)
+
+     resized_arr = resized_arr[np.newaxis, ...]
+     saliency_arr = model_dict['sess'].run(model_dict['predicted_maps'],
+                                           feed_dict={
+                                               model_dict['input_plhd']: resized_arr
+                                           })
+     saliency_arr = saliency_arr.squeeze()
+
+     saliency_img = Image.fromarray(np.uint8(saliency_arr * 255), 'L')
+     saliency_resized_img = saliency_img.resize((w, h))
+
+     saliency_resized_arr = np.asarray(saliency_resized_img)
+     saliency_zero_one = np.divide(saliency_resized_arr, 255.0)
+
+     bbox = None
+     if aspect_ratio_tup:
+         left, right, bottom, top = best_window(saliency_resized_arr,
+                                                aspect_ratio=aspect_ratio_tup)
+         bbox = {'left': left, 'right': right, 'bottom': bottom, 'top': top}
+         # output = original_arr[bottom:top, left:right, :]
+
+     bounded = overlay_saliency(original_img, saliency_resized_img, bbox=bbox)
+     return bounded
+     # with_sal_box, pct_sal = get_saliency_sum_box(crop_data, bounded,
+     #                                              saliency_zero_one)
+     # sal_sum = str(pct_sal) + "%"
+     # return with_sal_box, sal_sum
+
+ def load_model(model_name = "weights/model_mit1003_cpu.pb"):
+     ### Model loading code
+     graph_def = tf.GraphDef()
+     if not os.path.isfile(model_name):
+         download.download_pretrained_weights('weights/', 'model_mit1003_cpu')
+
+     with tf.gfile.Open(model_name, "rb") as file:
+         graph_def.ParseFromString(file.read())
+     input_plhd = tf.placeholder(tf.float32, (None, None, None, 3))
+     [predicted_maps] = tf.import_graph_def(graph_def,
+                                            input_map={"input": input_plhd},
+                                            return_elements=["output:0"])
+
+     sess = tf.Session()
+     return {
+         'sess': sess,
+         'predicted_maps': predicted_maps,
+         'input_plhd': input_plhd
+     }
+
+ if __name__ == '__main__':
+     examples = [["images/1.jpg"],
+                 ["images/2.jpg"]]
+
+     thumbnail = "https://ibb.co/hXdbDyD"
+     model_dict = load_model()
+     io = gr.Interface(lambda im_arr: test_model(im_arr, model_dict),
+                       gr.inputs.Image(label="Your Image", tool='select'),
+                       gr.outputs.Image(label="Saliency Map"),
+                       allow_flagging=False,
+                       thumbnail=thumbnail,
+                       examples=examples, analytics_enabled=False)
+
+     io.launch(debug=True)
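
A minimal end-to-end sketch (the image path is illustrative). The synthetic check also shows what `best_window` computes: with all saliency mass in rows 80-89 of a 100x100 map and a 16:9 target, the window height is round(100 / 16 * 9) = 56 and the returned box is (0, 100, 34, 90):

```
import numpy as np
from PIL import Image
from run_gradio import best_window, load_model, test_model

# synthetic check of best_window: saliency concentrated in rows 80-89
sal = np.zeros((100, 100))
sal[80:90, :] = 1.0
print(best_window(sal, aspect_ratio=(16, 9)))  # -> (0, 100, 34, 90)

# full pipeline on one image; pre-trained weights are downloaded on first use
model_dict = load_model()
im_arr = np.array(Image.open("images/1.jpg").convert("RGB"))
overlaid = test_model(im_arr, model_dict, aspect_ratio_tup=(16, 9))
overlaid.save("overlaid_16x9.png")
```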