Build error

ohjho committed 38761f7 (parent: 6b74814)

added missing gdown to requirements

- README.md +2 -98
- requirements.txt +1 -0
README.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 title: Vox_saliency_demo
 emoji: 🐢
-colorFrom:
+colorFrom: purple
 colorTo: gray
 python_version: 3.7.8
 sdk: streamlit
@@ -9,100 +9,4 @@ sdk_version: 0.89.0
 app_file: app.py
 pinned: false
 ---
-#
-
-This repo was forked by the [Miro team](https://miro.io/#) to create the interface [here]()
-
-
-
-
-
-
-# Contextual Encoder-Decoder Network <br/> for Visual Saliency Prediction
-
-![](https://img.shields.io/badge/python-v3.6.8-orange.svg?style=flat-square)
-![](https://img.shields.io/badge/tensorflow-v1.13.1-orange.svg?style=flat-square)
-![](https://img.shields.io/badge/matplotlib-v3.0.3-orange.svg?style=flat-square)
-![](https://img.shields.io/badge/requests-v2.21.0-orange.svg?style=flat-square)
-
-<img src="./figures/results.jpg" width="800"/>
-
-This repository contains the official *TensorFlow* implementation of the MSI-Net (multi-scale information network), as described in the Neural Networks paper [Contextual encoder-decoder network for visual saliency prediction](https://www.sciencedirect.com/science/article/pii/S0893608020301660) (2020) and on [arXiv](https://arxiv.org/abs/1902.06634).
-
-**_Abstract:_** *Predicting salient regions in natural images requires the detection of objects that are present in a scene. To develop robust representations for this challenging task, high-level visual features at multiple spatial scales must be extracted and augmented with contextual information. However, existing models aimed at explaining human fixation maps do not incorporate such a mechanism explicitly. Here we propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task. The architecture forms an encoder-decoder structure and includes a module with multiple convolutional layers at different dilation rates to capture multi-scale features in parallel. Moreover, we combine the resulting representations with global scene information for accurately predicting visual saliency. Our model achieves competitive and consistent results across multiple evaluation metrics on two public saliency benchmarks and we demonstrate the effectiveness of the suggested approach on five datasets and selected examples. Compared to state of the art approaches, the network is based on a lightweight image classification backbone and hence presents a suitable choice for applications with limited computational resources, such as (virtual) robotic systems, to estimate human fixations across complex natural scenes.*
-
-Our results are available on the original [MIT saliency benchmark](http://saliency.mit.edu/results.html) and the updated [MIT/Tübingen saliency benchmark](https://saliency.tuebingen.ai/results.html). The latter are derived from a probabilistic version of our model with metric-specific postprocessing for a fair model comparison.
-
-## Reference
-
-If you use this code in your research, please cite the following paper:
-
-```
-@article{kroner2020contextual,
-title={Contextual encoder-decoder network for visual saliency prediction},
-author={Kroner, Alexander and Senden, Mario and Driessens, Kurt and Goebel, Rainer},
-url={http://www.sciencedirect.com/science/article/pii/S0893608020301660},
-doi={https://doi.org/10.1016/j.neunet.2020.05.004},
-journal={Neural Networks},
-publisher={Elsevier},
-year={2020},
-volume={129},
-pages={261--270},
-issn={0893-6080}
-}
-```
-
-## Architecture
-
-<img src="./figures/architecture.jpg" width="700"/>
-
-## Requirements
-
-| Package | Version |
-|:----------:|:-------:|
-| python | 3.6.8 |
-| tensorflow | 1.13.1 |
-| matplotlib | 3.0.3 |
-| requests | 2.21.0 |
-| scipy | 1.4.1 |
-
-The code was tested and is compatible with both Windows and Linux. We strongly recommend to use *TensorFlow* with GPU acceleration, especially when training the model. Nevertheless, a slower CPU version is officially supported.
-
-## Training
-
-The results of our paper can be reproduced by first training the MSI-Net via the following command:
-
-```
-python main.py train
-```
-
-This will start the training procedure for the SALICON dataset with the hyperparameters defined in `config.py`. If you want to optimize the model for CPU usage, please change the corresponding `device` value in the configurations file. Optionally, the dataset and download path can be specified via command line arguments:
-
-```
-python main.py train -d DATA -p PATH
-```
-
-Here, the `DATA` argument must be `salicon`, `mit1003`, `cat2000`, `dutomron`, `pascals`, `osie`, or `fiwi`. It is required that the model is first trained on the SALICON dataset before fine-tuning it on any of the other ones. By default, the selected saliency dataset will be downloaded to the folder `data/` but you can point to a different directory via the `PATH` argument.
-
-All results are then stored under the folder `results/`, which contains the training history and model checkpoints. This allows to continue training or perform inference on test instances, as described in the next section.
-
-## Testing
-
-To test a pre-trained model on image data and produce saliency maps, execute the following command:
-
-```
-python main.py test -d DATA -p PATH
-```
-
-If no checkpoint is available from prior training, it will automatically download our pre-trained model to `weights/`. The `DATA` argument defines which network is used and must be `salicon`, `mit1003`, `cat2000`, `dutomron`, `pascals`, `osie`, or `fiwi`. It will then resize the input images to the dimensions specified in the configurations file. Note that this might lead to excessive image padding depending on the selected dataset.
-
-The `PATH` argument points to the folder where the test data is stored but can also denote a single image file directly. As for network training, the `device` value can be changed to CPU in the configurations file. This ensures that the model optimized for CPU will be utilized and hence improves the inference speed. All results are finally stored in the folder `results/images/` with the original image dimensions.
-
-## Demo
-
-<img src="./demo/demo.gif" width="750"/>
-
-A demonstration of saliency prediction in the browser is available [here](https://storage.googleapis.com/msi-net/demo/index.html). It computes saliency maps based on the input from a webcam via *TensorFlow.js*. Since the library uses the machine's hardware, model performance is dependent on your local configuration. The buttons allow you to select the quality, ranging from *very low* for a version trained on low image resolution with high inference speed, to *very high* for a version trained on high image resolution with slow inference speed.
-
-## Contact
-
-For questions, bug reports, and suggestions about this work, please create an [issue](https://github.com/alexanderkroner/saliency/issues) in this repository.
+# HuggingFace Space branch: `main`
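The first README hunk fills in the previously empty `colorFrom` value in the Space's YAML front matter. A quick local check along the following lines could flag empty or unsupported color values before pushing. This is a hypothetical helper, not part of this Space: `read_front_matter`, `color_problems`, and `ALLOWED_COLORS` are illustrative names, the parser only handles flat `key: value` lines rather than full YAML, and the color set is assumed to match the thumbnail gradient colors listed in the Hugging Face Spaces configuration reference.

```python
# Assumption: the gradient colors accepted by Hugging Face Spaces for
# colorFrom/colorTo. Verify against the current Spaces config reference.
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}


def read_front_matter(readme_text):
    """Parse the flat 'key: value' lines between the leading '---' markers.

    This is a deliberately minimal parser, not a YAML implementation:
    nested values, lists, and quoted strings are not interpreted.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README.md does not start with a '---' front-matter block")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("front-matter block is not closed with '---'")


def color_problems(meta):
    """Return a message for each gradient color that is missing, empty, or unknown."""
    problems = []
    for key in ("colorFrom", "colorTo"):
        value = meta.get(key, "")
        if value not in ALLOWED_COLORS:
            problems.append(f"{key}: {value!r} is not one of {sorted(ALLOWED_COLORS)}")
    return problems


if __name__ == "__main__":
    # Front matter as it looked before this commit (colorFrom left empty).
    old_header = "---\ntitle: Vox_saliency_demo\ncolorFrom:\ncolorTo: gray\n---\n"
    print(color_problems(read_front_matter(old_header)))
```

Run against the pre-commit header, the sketch reports one problem for `colorFrom`; with `colorFrom: purple` it reports none.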
requirements.txt
CHANGED
@@ -4,3 +4,4 @@ matplotlib==3.0.3
 requests==2.21.0
 scipy==1.4.1
 streamlit==0.89.0
+gdown==4.4.0