Plsek committed on
Commit 6c6f7cf
1 Parent(s): e75d9c7

Upload 7 files

.gitattributes CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,117 @@
  ---
  license: wtfpl
+ language:
+ - en
+ library_name: keras
+ pipeline_tag: image-segmentation
  ---
+ # *Cavity Detection Tool* (CADET)
+
+ [***CADET***](https://tomasplsek.github.io/CADET/) is a machine learning pipeline trained to identify surface brightness depressions (so-called *X-ray cavities*) in noisy *Chandra* images of early-type galaxies and galaxy clusters. The pipeline consists of a convolutional neural network trained to produce pixel-wise cavity predictions and a DBSCAN clustering algorithm, which decomposes the predictions into individual cavities. The pipeline is further described in [Plšek et al. 2023](https://arxiv.org/abs/2304.05457).
+
+ <!-- The pipeline was developed in order to improve the automation and accuracy of X-ray cavity detection and size-estimation. -->
+ The architecture of the convolutional network consists of 5 convolutional blocks, each resembling an Inception layer. It was implemented using the *Keras* library, and its development was inspired by [Fort et al. 2017](https://ui.adsabs.harvard.edu/abs/2017arXiv171200523F/abstract) and [Secká 2019](https://is.muni.cz/th/rnxoz/?lang=en;fakulta=1411). For the clustering, we used the *Scikit-learn* implementation of Density-Based Spatial Clustering of Applications with Noise (DBSCAN, [Ester et al. 1996](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.9220)).
+
+ ![Architecture](figures/architecture.png)
+
+ ## Requirements
+
+ For simple usage of the ***CADET*** pipeline, the following libraries are required:\
+ `matplotlib`\
+ `astropy`\
+ `numpy`\
+ `scipy`\
+ `sklearn`\
+ `keras`\
+ `tensorflow`
+
+ If you want to re-train the network from scratch or generate training images, an additional library is required:\
+ [`jax`](https://github.com/google/jax)
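+
+ These can be installed, for example, via `pip` (note that `sklearn` is distributed as the `scikit-learn` package):
+
+ ```console
+ $ pip install matplotlib astropy numpy scipy scikit-learn keras tensorflow
+ ```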
+
+ ## Usage
+
+ The ***CADET*** pipeline takes as input either raw *Chandra* images in units of counts (numbers of captured photons) or exposure-corrected images. Exposure-corrected images should be normalized by the lowest pixel value, so that all pixel values are greater than or equal to 1. Point sources in the images should be filled in with the surrounding background level using Poisson statistics ([dmfilth](https://cxc.cfa.harvard.edu/ciao/ahelp/dmfilth.html) within [CIAO](https://cxc.harvard.edu/ciao/)).
+
+ The convolutional part of the ***CADET*** pipeline only accepts 128x128 images. As part of the pipeline, input images are therefore cropped to a size specified by the parameter `scale` (size = scale * 128 pixels) and re-binned to 128x128 images. By default, images are probed on 4 different scales (1, 2, 3, 4). The image fed into the pipeline therefore needs to be at least 512x512 pixels (the minimal input size differs if non-default scales are used), and images should be centred at the centre of the galaxy. The re-binning is performed using the *Astropy* and *Numpy* libraries and can only handle integer bin sizes. For floating-point bin sizes, we recommend using [dmregrid](https://cxc.cfa.harvard.edu/ciao/ahelp/dmregrid.html) and applying the ***CADET*** model manually (see Convolutional part).
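+
+ For illustration, the crop-and-rebin step for a single scale can be sketched as follows (a minimal sketch assuming integer re-binning and a galaxy-centred image; the function name `crop_and_rebin` is illustrative, not part of `CADET.py`):
+
+ ```python
+ import numpy as np
+
+ def crop_and_rebin(image, scale):
+     """Crop a centred (scale*128)x(scale*128) region and re-bin it to 128x128."""
+     size = scale * 128
+     cx, cy = image.shape[0] // 2, image.shape[1] // 2
+     cropped = image[cx - size // 2 : cx + size // 2,
+                     cy - size // 2 : cy + size // 2]
+     # sum the counts within each (scale x scale) block of pixels
+     return cropped.reshape(128, scale, 128, scale).sum(axis=(1, 3))
+ ```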
+
+ Before being decomposed by the DBSCAN algorithm, pixel-wise predictions produced by the convolutional part of the ***CADET*** pipeline need to be further thresholded. In order to simultaneously calibrate the volume error and the false positive rate, we introduced two discrimination thresholds (for more info see [Plšek et al. 2023](https://arxiv.org/abs/2304.05457)); their default values are 0.4 and 0.6, respectively. Nevertheless, both discrimination thresholds are changeable and can be set to an arbitrary value between 0 and 1.
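+
+ As an illustration of how two such thresholds can be combined (a sketch of one plausible scheme, not necessarily the exact logic of `CADET.py`: the looser `threshold1` sets the cavity extent, while clusters without any pixel above the stricter `threshold2` are discarded):
+
+ ```python
+ import numpy as np
+ from sklearn.cluster import DBSCAN
+
+ def decompose(y_pred, threshold1=0.4, threshold2=0.6):
+     # pixels above the looser threshold define candidate cavity regions
+     x, y = (y_pred > threshold1).nonzero()
+     labels = DBSCAN(eps=1.5, min_samples=3).fit(np.array([x, y]).T).labels_
+
+     cavities = []
+     for label in np.unique(labels[labels >= 0]):   # label -1 marks noise points
+         mask = np.zeros_like(y_pred)
+         mask[x[labels == label], y[labels == label]] = 1
+         # keep the cluster only if it contains a high-confidence pixel
+         if (y_pred * mask).max() > threshold2:
+             cavities.append(mask)
+     return cavities
+ ```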
+
+ The ***CADET*** pipeline is provided as a standalone Python script (`CADET.py`), which can be run by simply calling it from a terminal with the following arguments:\
+ `filename` - string, name of the FITS file\
+ `scales` - list, list of size scales used to crop input images, optional (default: [1,2,3,4])\
+ `threshold1` - float between 0 and 1, calibrates the volume error, optional (default: 0.4)\
+ `threshold2` - float between 0 and 1, calibrates the false positive rate, optional (default: 0.6)
+
+ ```console
+ $ python3 CADET.py filename [scales] [threshold1] [threshold2]
+ ```
+
+ Example:
+
+ ```console
+ $ python3 CADET.py NGC5813.fits
+ $ python3 CADET.py NGC5813.fits [1,2,3,4]
+ $ python3 CADET.py NGC5813.fits [1,2,3,4] 0.5 0.9
+ ```
+
+ The `CADET.py` script loads a FITS file specified by the `filename` argument, which is located in the same folder as the main `CADET.py` script. The script creates a folder with the same name as the FITS file and saves both the pixel-wise and the decomposed cavity predictions in FITS format, properly preserving the WCS coordinates. It also outputs a PNG file showing the decomposed predictions for the individual scales.
+
+ The volumes of X-ray cavities are calculated under the assumption of rotational symmetry along the direction from the galactic centre towards the centre of the cavity (estimated as its *centre of mass*). The cavity depth at each point along that direction is assumed to be equal to its width. The 3D cavity models produced in this way are stored in the `.npy` format and can be used for further calculations (e.g. cavity energy estimation).
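+
+ A minimal sketch of this volume estimate, assuming a binary cavity mask that has already been rotated so that the galactic-centre-to-cavity direction runs along the first axis (the function name and the rotation handling are illustrative):
+
+ ```python
+ import numpy as np
+
+ def cavity_volume(mask, pixel_scale):
+     """Integrate circular cross-sections along the cavity axis (depth = width)."""
+     volume = 0.0
+     for row in mask:
+         width = row.sum() * pixel_scale                      # cavity width in this slice
+         volume += np.pi * (width / 2) ** 2 * pixel_scale     # circle area x slice thickness
+     return volume
+ ```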
+
+ ![](figures/NGC5813.png)
+
+ ### Convolutional part
+
+ <!-- [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tomasplsek/CADET/blob/main/CADET_example_colab.ipynb) -->
+
+ The convolutional part of the pipeline can be used separately to produce raw pixel-wise predictions. Since the convolutional network was implemented using the functional *Keras* API, the architecture together with the trained weights can be stored in the HDF5 format (*CADET.hdf5*). The trained model can therefore be loaded simply using the *Keras* `load_model` function:
+
+ ```python
+ from keras.models import load_model
+
+ # load the architecture together with the trained weights
+ model = load_model("CADET.hdf5")
+
+ # X: input image(s) of shape (N, 128, 128, 1)
+ y_pred = model.predict(X)
+ ```
+
+ The network takes 128x128 images as input; however, to maintain compatibility with *Keras*, the input needs to be reshaped as `X.reshape(1, 128, 128, 1)` for a single image or as `X.reshape(-1, 128, 128, 1)` for multiple images.
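+
+ For example, a single pre-processed 128x128 image stored in a FITS file (the file name below is illustrative) can be passed through the network like this:
+
+ ```python
+ from astropy.io import fits
+ from keras.models import load_model
+
+ model = load_model("CADET.hdf5")
+
+ # load a single pre-processed 128x128 image
+ X = fits.getdata("NGC5813_cropped.fits").astype("float32")
+
+ # add the batch and channel dimensions expected by Keras, then drop them again
+ y_pred = model.predict(X.reshape(1, 128, 128, 1))[0, :, :, 0]
+ ```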
+
+ The resulting pixel-wise prediction then needs to be thresholded and decomposed into individual cavities using the DBSCAN clustering algorithm:
+
+ ```python
+ import numpy as np
+ from sklearn.cluster import DBSCAN
+
+ # binarize the prediction (the reshape drops the batch and channel dimensions)
+ y_pred = np.where(y_pred.reshape(128, 128) > threshold, 1, 0)
+
+ # coordinates of all above-threshold pixels as an (N, 2) array
+ x, y = y_pred.nonzero()
+ data = np.array([x, y]).T
+
+ # group neighbouring pixels into individual cavities
+ clusters = DBSCAN(eps=1.5, min_samples=3).fit(data).labels_
+ ```
+
+ ## How to cite
+
+ The ***CADET*** pipeline was originally developed as part of my [diploma thesis](https://is.muni.cz/th/x68od/?lang=en) and was further described in [Plšek et al. 2023](https://arxiv.org/abs/2304.05457). If you use the ***CADET*** pipeline in your research, please cite the following paper:
+
+ ```
+ @misc{plšek2023cavity,
+ title={CAvity DEtection Tool (CADET): Pipeline for automatic detection of X-ray cavities in hot galactic and cluster atmospheres},
+ author={Tomáš Plšek and Norbert Werner and Martin Topinka and Aurora Simionescu},
+ year={2023},
+ eprint={2304.05457},
+ archivePrefix={arXiv},
+ primaryClass={astro-ph.HE}
+ }
+ ```
+
+ ## Todo
+
+ The following improvements for the data generation and training process are currently planned:
+
+ - [ ] add other features (cold fronts, complex sloshing, point sources, jets)
+ - [ ] use more complex cavity shapes (e.g. [Guo et al. 2015](https://arxiv.org/abs/1408.5018))
+ - [ ] train on multiband images simulated using PyXsim/SOXS
+ - [ ] replace DBSCAN with instance segmentation
+ - [ ] restrict the cavity number and shape using regularization?
+ - [ ] systematic cavity size uncertainty estimation using MC Dropout
figures/NGC5813.png ADDED
figures/architecture.png ADDED
keras_metadata.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:602c67f4b3495cec0a806123dd9f8faf0e79d828f89b7f6d418a69d4a9684dba
+ size 226308
saved_model.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76dd852aaf823537d3413f963c2b267be54747584da27e0e812337eb43aa882c
+ size 2280504
variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a92c41b562ecc725156380debbc13bcbac466d5c609f0016269ca753b0c9e7eb
+ size 6887571
variables/variables.index ADDED
Binary file (34.8 kB).