Realcat committed on
Commit 0bf7151 · 1 Parent(s): 816b9f6

add: thirdparty

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. third_party/ALIKE/LICENSE +29 -0
  2. third_party/ALIKE/README.md +131 -0
  3. third_party/ALIKE/alike.py +198 -0
  4. third_party/ALIKE/alnet.py +194 -0
  5. third_party/ALIKE/demo.py +201 -0
  6. third_party/ALIKE/hseq/cache/alike-l-ms.npy +3 -0
  7. third_party/ALIKE/hseq/cache/alike-l.npy +3 -0
  8. third_party/ALIKE/hseq/cache/alike-n-ms.npy +3 -0
  9. third_party/ALIKE/hseq/cache/alike-n.npy +3 -0
  10. third_party/ALIKE/hseq/cache/aslfeat.npy +3 -0
  11. third_party/ALIKE/hseq/cache/d2.npy +3 -0
  12. third_party/ALIKE/hseq/cache/disk.npy +3 -0
  13. third_party/ALIKE/hseq/cache/lfnet.npy +3 -0
  14. third_party/ALIKE/hseq/cache/r2d2.npy +3 -0
  15. third_party/ALIKE/hseq/cache/superpoint.npy +3 -0
  16. third_party/ALIKE/hseq/eval.py +197 -0
  17. third_party/ALIKE/hseq/extract.py +175 -0
  18. third_party/ALIKE/matlab/createfigure.m +75 -0
  19. third_party/ALIKE/matlab/peakloss_rect.m +19 -0
  20. third_party/ALIKE/requirements.txt +6 -0
  21. third_party/ALIKE/soft_detect.py +234 -0
  22. third_party/ASpanFormer/.github/workflows/sync.yml +39 -0
  23. third_party/ASpanFormer/.gitignore +32 -0
  24. third_party/ASpanFormer/CODE_OF_CONDUCT.md +71 -0
  25. third_party/ASpanFormer/CONTRIBUTING.md +7 -0
  26. third_party/ASpanFormer/LICENSE +9 -0
  27. third_party/ASpanFormer/README.md +98 -0
  28. third_party/ASpanFormer/configs/aspan/indoor/aspan_test.py +11 -0
  29. third_party/ASpanFormer/configs/aspan/indoor/aspan_train.py +12 -0
  30. third_party/ASpanFormer/configs/aspan/outdoor/aspan_test.py +22 -0
  31. third_party/ASpanFormer/configs/aspan/outdoor/aspan_train.py +21 -0
  32. third_party/ASpanFormer/configs/data/__init__.py +0 -0
  33. third_party/ASpanFormer/configs/data/base.py +36 -0
  34. third_party/ASpanFormer/configs/data/debug/.gitignore +3 -0
  35. third_party/ASpanFormer/configs/data/megadepth_test_1500.py +13 -0
  36. third_party/ASpanFormer/configs/data/megadepth_trainval_832.py +26 -0
  37. third_party/ASpanFormer/configs/data/scannet_test_1500.py +11 -0
  38. third_party/ASpanFormer/configs/data/scannet_trainval.py +21 -0
  39. third_party/ASpanFormer/data/megadepth/index/.gitignore +4 -0
  40. third_party/ASpanFormer/data/megadepth/test/.gitignore +4 -0
  41. third_party/ASpanFormer/data/megadepth/train/.gitignore +4 -0
  42. third_party/ASpanFormer/data/scannet/index/.gitignore +4 -0
  43. third_party/ASpanFormer/data/scannet/test/.gitignore +3 -0
  44. third_party/ASpanFormer/data/scannet/train/.gitignore +4 -0
  45. third_party/ASpanFormer/demo/demo.py +91 -0
  46. third_party/ASpanFormer/demo/demo_utils.py +88 -0
  47. third_party/ASpanFormer/docs/TRAINING.md +72 -0
  48. third_party/ASpanFormer/environment.yaml +12 -0
  49. third_party/ASpanFormer/requirements.txt +18 -0
  50. third_party/ASpanFormer/scripts/reproduce_test/indoor.sh +31 -0
third_party/ALIKE/LICENSE ADDED
@@ -0,0 +1,29 @@
+ BSD 3-Clause License
+
+ Copyright (c) 2022, Zhao Xiaoming
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+
+ 3. Neither the name of the copyright holder nor the names of its
+ contributors may be used to endorse or promote products derived from
+ this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
third_party/ALIKE/README.md ADDED
@@ -0,0 +1,131 @@
+ # News
+
+ - The [ALIKED](https://github.com/Shiaoming/ALIKED) is released.
+ - The [ALIKE training code](https://github.com/Shiaoming/ALIKE/raw/main/assets/ALIKE_code.zip) is released.
+
+ # ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction
+
+ ALIKE applies a differentiable keypoint detection module to detect accurate sub-pixel keypoints. The network runs at 95 frames per second on 640 x 480 images on an NVIDIA Titan X (Pascal) GPU and achieves performance on par with the state of the art. ALIKE benefits real-time applications on resource-limited platforms/devices. Technical details are described in [this paper](https://arxiv.org/pdf/2112.02906.pdf).
+
+ > ```
+ > Xiaoming Zhao, Xingming Wu, Jinyu Miao, Weihai Chen, Peter C. Y. Chen, Zhengguo Li, "ALIKE: Accurate and Lightweight Keypoint
+ > Detection and Descriptor Extraction," IEEE Transactions on Multimedia, 2022.
+ > ```
+
+ ![](./assets/alike.png)
+
+
+ If you use ALIKE in an academic work, please cite:
+
+ ```
+ @article{Zhao2023ALIKED,
+ title = {ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation},
+ url = {https://arxiv.org/pdf/2304.03608.pdf},
+ doi = {10.1109/TIM.2023.3271000},
+ journal = {IEEE Transactions on Instrumentation & Measurement},
+ author = {Zhao, Xiaoming and Wu, Xingming and Chen, Weihai and Chen, Peter C. Y. and Xu, Qingsong and Li, Zhengguo},
+ year = {2023},
+ volume = {72},
+ pages = {1-16},
+ }
+
+ @article{Zhao2022ALIKE,
+ title = {ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction},
+ url = {http://arxiv.org/abs/2112.02906},
+ doi = {10.1109/TMM.2022.3155927},
+ journal = {IEEE Transactions on Multimedia},
+ author = {Zhao, Xiaoming and Wu, Xingming and Miao, Jinyu and Chen, Weihai and Chen, Peter C. Y. and Li, Zhengguo},
+ month = mar,
+ year = {2022},
+ }
+ ```
+
+
+
+ ## 1. Prerequisites
+
+ The required packages are listed in `requirements.txt`:
+
+ ```shell
+ pip install -r requirements.txt
+ ```
+
+
+
+ ## 2. Models
+
+ The off-the-shelf weights of the four ALIKE model variants are provided in `models/`.
+
+
+
+ ## 3. Run demo
+
+ ```shell
+ $ python demo.py -h
+ usage: demo.py [-h] [--model {alike-t,alike-s,alike-n,alike-l}]
+ [--device DEVICE] [--top_k TOP_K] [--scores_th SCORES_TH]
+ [--n_limit N_LIMIT] [--no_display] [--no_sub_pixel]
+ input
+
+ ALike Demo.
+
+ positional arguments:
+ input Image directory or movie file or "camera0" (for
+ webcam0).
+
+ optional arguments:
+ -h, --help show this help message and exit
+ --model {alike-t,alike-s,alike-n,alike-l}
+ The model configuration
+ --device DEVICE Running device (default: cuda).
+ --top_k TOP_K Detect top K keypoints. -1 for threshold based mode,
+ >0 for top K mode. (default: -1)
+ --scores_th SCORES_TH
+ Detector score threshold (default: 0.2).
+ --n_limit N_LIMIT Maximum number of keypoints to be detected (default:
+ 5000).
+ --no_display Do not display images to screen. Useful if running
+ remotely (default: False).
+ --no_sub_pixel Do not detect sub-pixel keypoints (default: False).
+ ```
+
+
+
+ ## 4. Examples
+
+ ### KITTI example
+ ```shell
+ python demo.py assets/kitti
+ ```
+ ![](./assets/kitti.gif)
+
+ ### TUM example
+ ```shell
+ python demo.py assets/tum
+ ```
+ ![](./assets/tum.gif)
+
+ ## 5. Efficiency and performance
+
+ | Models | Parameters | GFLOPs(640x480) | MHA@3 on HPatches | mAA(10°) on [IMW2020-test](https://www.cs.ubc.ca/research/image-matching-challenge/2021/leaderboard) (Stereo) |
+ |:---:|:---:|:---:|:-----------------:|:-------------------------------------------------------------------------------------------------------------:|
+ | D2-Net(MS) | 7653KB | 889.40 | 38.33% | 12.27% |
+ | LF-Net(MS) | 2642KB | 24.37 | 57.78% | 23.44% |
+ | SuperPoint | 1301KB | 26.11 | 70.19% | 28.97% |
+ | R2D2(MS) | 484KB | 464.55 | 71.48% | 39.02% |
+ | ASLFeat(MS) | 823KB | 77.58 | 73.52% | 33.65% |
+ | DISK | 1092KB | 98.97 | 70.56% | 51.22% |
+ | ALike-N | 318KB | 7.909 | 75.74% | 47.18% |
+ | ALike-L | 653KB | 19.685 | 76.85% | 49.58% |
+
+ ### Evaluation on HPatches
+
+ - Download [hpatches-sequences-release](https://hpatches.github.io/) and put it into `hseq/hpatches-sequences-release`.
+ - Remove the unreliable sequences, following D2-Net.
+ - Run the following command to evaluate the performance:
+ ```shell
+ python hseq/eval.py
+ ```
+
+
+ For more details, please refer to the [paper](https://arxiv.org/abs/2112.02906).
third_party/ALIKE/alike.py ADDED
@@ -0,0 +1,198 @@
+ import logging
+ import os
+ import cv2
+ import torch
+ from copy import deepcopy
+ import torch.nn.functional as F
+ from torchvision.transforms import ToTensor
+ import math
+
+ from alnet import ALNet
+ from soft_detect import DKD
+ import time
+
+ configs = {
+     "alike-t": {
+         "c1": 8,
+         "c2": 16,
+         "c3": 32,
+         "c4": 64,
+         "dim": 64,
+         "single_head": True,
+         "radius": 2,
+         "model_path": os.path.join(os.path.split(__file__)[0], "models", "alike-t.pth"),
+     },
+     "alike-s": {
+         "c1": 8,
+         "c2": 16,
+         "c3": 48,
+         "c4": 96,
+         "dim": 96,
+         "single_head": True,
+         "radius": 2,
+         "model_path": os.path.join(os.path.split(__file__)[0], "models", "alike-s.pth"),
+     },
+     "alike-n": {
+         "c1": 16,
+         "c2": 32,
+         "c3": 64,
+         "c4": 128,
+         "dim": 128,
+         "single_head": True,
+         "radius": 2,
+         "model_path": os.path.join(os.path.split(__file__)[0], "models", "alike-n.pth"),
+     },
+     "alike-l": {
+         "c1": 32,
+         "c2": 64,
+         "c3": 128,
+         "c4": 128,
+         "dim": 128,
+         "single_head": False,
+         "radius": 2,
+         "model_path": os.path.join(os.path.split(__file__)[0], "models", "alike-l.pth"),
+     },
+ }
+
+
+ class ALike(ALNet):
+     def __init__(
+         self,
+         # ================================== feature encoder
+         c1: int = 32,
+         c2: int = 64,
+         c3: int = 128,
+         c4: int = 128,
+         dim: int = 128,
+         single_head: bool = False,
+         # ================================== detect parameters
+         radius: int = 2,
+         top_k: int = 500,
+         scores_th: float = 0.5,
+         n_limit: int = 5000,
+         device: str = "cpu",
+         model_path: str = "",
+     ):
+         super().__init__(c1, c2, c3, c4, dim, single_head)
+         self.radius = radius
+         self.top_k = top_k
+         self.n_limit = n_limit
+         self.scores_th = scores_th
+         self.dkd = DKD(
+             radius=self.radius,
+             top_k=self.top_k,
+             scores_th=self.scores_th,
+             n_limit=self.n_limit,
+         )
+         self.device = device
+
+         if model_path != "":
+             state_dict = torch.load(model_path, self.device)
+             self.load_state_dict(state_dict)
+             self.to(self.device)
+             self.eval()
+             logging.info(f"Loaded model parameters from {model_path}")
+             logging.info(
+                 f"Number of model parameters: {sum(p.numel() for p in self.parameters() if p.requires_grad) / 1e3}KB"
+             )
+
+     def extract_dense_map(self, image, ret_dict=False):
+         # ====================================================
+         # check image size, should be integer multiples of 2^5
+         # if it is not a integer multiples of 2^5, padding zeros
+         device = image.device
+         b, c, h, w = image.shape
+         h_ = math.ceil(h / 32) * 32 if h % 32 != 0 else h
+         w_ = math.ceil(w / 32) * 32 if w % 32 != 0 else w
+         if h_ != h:
+             h_padding = torch.zeros(b, c, h_ - h, w, device=device)
+             image = torch.cat([image, h_padding], dim=2)
+         if w_ != w:
+             w_padding = torch.zeros(b, c, h_, w_ - w, device=device)
+             image = torch.cat([image, w_padding], dim=3)
+         # ====================================================
+
+         scores_map, descriptor_map = super().forward(image)
+
+         # ====================================================
+         if h_ != h or w_ != w:
+             descriptor_map = descriptor_map[:, :, :h, :w]
+             scores_map = scores_map[:, :, :h, :w]  # Bx1xHxW
+         # ====================================================
+
+         # BxCxHxW
+         descriptor_map = torch.nn.functional.normalize(descriptor_map, p=2, dim=1)
+
+         if ret_dict:
+             return {
+                 "descriptor_map": descriptor_map,
+                 "scores_map": scores_map,
+             }
+         else:
+             return descriptor_map, scores_map
+
+     def forward(self, img, image_size_max=99999, sort=False, sub_pixel=False):
+         """
+         :param img: np.array HxWx3, RGB
+         :param image_size_max: maximum image size, otherwise, the image will be resized
+         :param sort: sort keypoints by scores
+         :param sub_pixel: whether to use sub-pixel accuracy
+         :return: a dictionary with 'keypoints', 'descriptors', 'scores', and 'time'
+         """
+         H, W, three = img.shape
+         assert three == 3, "input image shape should be [HxWx3]"
+
+         # ==================== image size constraint
+         image = deepcopy(img)
+         max_hw = max(H, W)
+         if max_hw > image_size_max:
+             ratio = float(image_size_max / max_hw)
+             image = cv2.resize(image, dsize=None, fx=ratio, fy=ratio)
+
+         # ==================== convert image to tensor
+         image = (
+             torch.from_numpy(image)
+             .to(self.device)
+             .to(torch.float32)
+             .permute(2, 0, 1)[None]
+             / 255.0
+         )
+
+         # ==================== extract keypoints
+         start = time.time()
+
+         with torch.no_grad():
+             descriptor_map, scores_map = self.extract_dense_map(image)
+             keypoints, descriptors, scores, _ = self.dkd(
+                 scores_map, descriptor_map, sub_pixel=sub_pixel
+             )
+             keypoints, descriptors, scores = keypoints[0], descriptors[0], scores[0]
+             keypoints = (keypoints + 1) / 2 * keypoints.new_tensor([[W - 1, H - 1]])
+
+             if sort:
+                 indices = torch.argsort(scores, descending=True)
+                 keypoints = keypoints[indices]
+                 descriptors = descriptors[indices]
+                 scores = scores[indices]
+
+         end = time.time()
+
+         return {
+             "keypoints": keypoints.cpu().numpy(),
+             "descriptors": descriptors.cpu().numpy(),
+             "scores": scores.cpu().numpy(),
+             "scores_map": scores_map.cpu().numpy(),
+             "time": end - start,
+         }
+
+
+ if __name__ == "__main__":
+     import numpy as np
+     from thop import profile
+
+     net = ALike(c1=32, c2=64, c3=128, c4=128, dim=128, single_head=False)
+
+     image = np.random.random((640, 480, 3)).astype(np.float32)
+     flops, params = profile(net, inputs=(image, 9999, False), verbose=False)
+     print("{:<30} {:<8} GFLops".format("Computational complexity: ", flops / 1e9))
+     print("{:<30} {:<8} KB".format("Number of parameters: ", params / 1e3))
third_party/ALIKE/alnet.py ADDED
@@ -0,0 +1,194 @@
1
+ import torch
2
+ from torch import nn
3
+ from torchvision.models import resnet
4
+ from typing import Optional, Callable
5
+
6
+
7
+ class ConvBlock(nn.Module):
8
+ def __init__(
9
+ self,
10
+ in_channels,
11
+ out_channels,
12
+ gate: Optional[Callable[..., nn.Module]] = None,
13
+ norm_layer: Optional[Callable[..., nn.Module]] = None,
14
+ ):
15
+ super().__init__()
16
+ if gate is None:
17
+ self.gate = nn.ReLU(inplace=True)
18
+ else:
19
+ self.gate = gate
20
+ if norm_layer is None:
21
+ norm_layer = nn.BatchNorm2d
22
+ self.conv1 = resnet.conv3x3(in_channels, out_channels)
23
+ self.bn1 = norm_layer(out_channels)
24
+ self.conv2 = resnet.conv3x3(out_channels, out_channels)
25
+ self.bn2 = norm_layer(out_channels)
26
+
27
+ def forward(self, x):
28
+ x = self.gate(self.bn1(self.conv1(x))) # B x in_channels x H x W
29
+ x = self.gate(self.bn2(self.conv2(x))) # B x out_channels x H x W
30
+ return x
31
+
32
+
33
+ # copied from torchvision\models\resnet.py#27->BasicBlock
34
+ class ResBlock(nn.Module):
35
+ expansion: int = 1
36
+
37
+ def __init__(
38
+ self,
39
+ inplanes: int,
40
+ planes: int,
41
+ stride: int = 1,
42
+ downsample: Optional[nn.Module] = None,
43
+ groups: int = 1,
44
+ base_width: int = 64,
45
+ dilation: int = 1,
46
+ gate: Optional[Callable[..., nn.Module]] = None,
47
+ norm_layer: Optional[Callable[..., nn.Module]] = None,
48
+ ) -> None:
49
+ super(ResBlock, self).__init__()
50
+ if gate is None:
51
+ self.gate = nn.ReLU(inplace=True)
52
+ else:
53
+ self.gate = gate
54
+ if norm_layer is None:
55
+ norm_layer = nn.BatchNorm2d
56
+ if groups != 1 or base_width != 64:
57
+ raise ValueError("ResBlock only supports groups=1 and base_width=64")
58
+ if dilation > 1:
59
+ raise NotImplementedError("Dilation > 1 not supported in ResBlock")
60
+ # Both self.conv1 and self.downsample layers downsample the input when stride != 1
61
+ self.conv1 = resnet.conv3x3(inplanes, planes, stride)
62
+ self.bn1 = norm_layer(planes)
63
+ self.conv2 = resnet.conv3x3(planes, planes)
64
+ self.bn2 = norm_layer(planes)
65
+ self.downsample = downsample
66
+ self.stride = stride
67
+
68
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
69
+ identity = x
70
+
71
+ out = self.conv1(x)
72
+ out = self.bn1(out)
73
+ out = self.gate(out)
74
+
75
+ out = self.conv2(out)
76
+ out = self.bn2(out)
77
+
78
+ if self.downsample is not None:
79
+ identity = self.downsample(x)
80
+
81
+ out += identity
82
+ out = self.gate(out)
83
+
84
+ return out
85
+
86
+
87
+ class ALNet(nn.Module):
88
+ def __init__(
89
+ self,
90
+ c1: int = 32,
91
+ c2: int = 64,
92
+ c3: int = 128,
93
+ c4: int = 128,
94
+ dim: int = 128,
95
+ single_head: bool = True,
96
+ ):
97
+ super().__init__()
98
+
99
+ self.gate = nn.ReLU(inplace=True)
100
+
101
+ self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
102
+ self.pool4 = nn.MaxPool2d(kernel_size=4, stride=4)
103
+
104
+ self.block1 = ConvBlock(3, c1, self.gate, nn.BatchNorm2d)
105
+
106
+ self.block2 = ResBlock(
107
+ inplanes=c1,
108
+ planes=c2,
109
+ stride=1,
110
+ downsample=nn.Conv2d(c1, c2, 1),
111
+ gate=self.gate,
112
+ norm_layer=nn.BatchNorm2d,
113
+ )
114
+ self.block3 = ResBlock(
115
+ inplanes=c2,
116
+ planes=c3,
117
+ stride=1,
118
+ downsample=nn.Conv2d(c2, c3, 1),
119
+ gate=self.gate,
120
+ norm_layer=nn.BatchNorm2d,
121
+ )
122
+ self.block4 = ResBlock(
123
+ inplanes=c3,
124
+ planes=c4,
125
+ stride=1,
126
+ downsample=nn.Conv2d(c3, c4, 1),
127
+ gate=self.gate,
128
+ norm_layer=nn.BatchNorm2d,
129
+ )
130
+
131
+ # ================================== feature aggregation
132
+ self.conv1 = resnet.conv1x1(c1, dim // 4)
133
+ self.conv2 = resnet.conv1x1(c2, dim // 4)
134
+ self.conv3 = resnet.conv1x1(c3, dim // 4)
135
+ self.conv4 = resnet.conv1x1(dim, dim // 4)
136
+ self.upsample2 = nn.Upsample(
137
+ scale_factor=2, mode="bilinear", align_corners=True
138
+ )
139
+ self.upsample4 = nn.Upsample(
140
+ scale_factor=4, mode="bilinear", align_corners=True
141
+ )
142
+ self.upsample8 = nn.Upsample(
143
+ scale_factor=8, mode="bilinear", align_corners=True
144
+ )
145
+ self.upsample32 = nn.Upsample(
146
+ scale_factor=32, mode="bilinear", align_corners=True
147
+ )
148
+
149
+ # ================================== detector and descriptor head
150
+ self.single_head = single_head
151
+ if not self.single_head:
152
+ self.convhead1 = resnet.conv1x1(dim, dim)
153
+ self.convhead2 = resnet.conv1x1(dim, dim + 1)
154
+
155
+ def forward(self, image):
156
+ # ================================== feature encoder
157
+ x1 = self.block1(image) # B x c1 x H x W
158
+ x2 = self.pool2(x1)
159
+ x2 = self.block2(x2) # B x c2 x H/2 x W/2
160
+ x3 = self.pool4(x2)
161
+ x3 = self.block3(x3) # B x c3 x H/8 x W/8
162
+ x4 = self.pool4(x3)
163
+ x4 = self.block4(x4) # B x dim x H/32 x W/32
164
+
165
+ # ================================== feature aggregation
166
+ x1 = self.gate(self.conv1(x1)) # B x dim//4 x H x W
167
+ x2 = self.gate(self.conv2(x2)) # B x dim//4 x H//2 x W//2
168
+ x3 = self.gate(self.conv3(x3)) # B x dim//4 x H//8 x W//8
169
+ x4 = self.gate(self.conv4(x4)) # B x dim//4 x H//32 x W//32
170
+ x2_up = self.upsample2(x2) # B x dim//4 x H x W
171
+ x3_up = self.upsample8(x3) # B x dim//4 x H x W
172
+ x4_up = self.upsample32(x4) # B x dim//4 x H x W
173
+ x1234 = torch.cat([x1, x2_up, x3_up, x4_up], dim=1)
174
+
175
+ # ================================== detector and descriptor head
176
+ if not self.single_head:
177
+ x1234 = self.gate(self.convhead1(x1234))
178
+ x = self.convhead2(x1234) # B x dim+1 x H x W
179
+
180
+ descriptor_map = x[:, :-1, :, :]
181
+ scores_map = torch.sigmoid(x[:, -1, :, :]).unsqueeze(1)
182
+
183
+ return scores_map, descriptor_map
184
+
185
+
186
+ if __name__ == "__main__":
187
+ from thop import profile
188
+
189
+ net = ALNet(c1=16, c2=32, c3=64, c4=128, dim=128, single_head=True)
190
+
191
+ image = torch.randn(1, 3, 640, 480)
192
+ flops, params = profile(net, inputs=(image,), verbose=False)
193
+ print("{:<30} {:<8} GFLops".format("Computational complexity: ", flops / 1e9))
194
+ print("{:<30} {:<8} KB".format("Number of parameters: ", params / 1e3))
third_party/ALIKE/demo.py ADDED
@@ -0,0 +1,201 @@
1
+ import copy
2
+ import os
3
+ import cv2
4
+ import glob
5
+ import logging
6
+ import argparse
7
+ import numpy as np
8
+ from tqdm import tqdm
9
+ from alike import ALike, configs
10
+
11
+
12
+ class ImageLoader(object):
13
+ def __init__(self, filepath: str):
14
+ self.N = 3000
15
+ if filepath.startswith("camera"):
16
+ camera = int(filepath[6:])
17
+ self.cap = cv2.VideoCapture(camera)
18
+ if not self.cap.isOpened():
19
+ raise IOError(f"Can't open camera {camera}!")
20
+ logging.info(f"Opened camera {camera}")
21
+ self.mode = "camera"
22
+ elif os.path.exists(filepath):
23
+ if os.path.isfile(filepath):
24
+ self.cap = cv2.VideoCapture(filepath)
25
+ if not self.cap.isOpened():
26
+ raise IOError(f"Can't open video {filepath}!")
27
+ rate = self.cap.get(cv2.CAP_PROP_FPS)
28
+ self.N = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
29
+ duration = self.N / rate
30
+ logging.info(f"Opened video {filepath}")
31
+ logging.info(f"Frames: {self.N}, FPS: {rate}, Duration: {duration}s")
32
+ self.mode = "video"
33
+ else:
34
+ self.images = (
35
+ glob.glob(os.path.join(filepath, "*.png"))
36
+ + glob.glob(os.path.join(filepath, "*.jpg"))
37
+ + glob.glob(os.path.join(filepath, "*.ppm"))
38
+ )
39
+ self.images.sort()
40
+ self.N = len(self.images)
41
+ logging.info(f"Loading {self.N} images")
42
+ self.mode = "images"
43
+ else:
44
+ raise IOError(
45
+ "Error filepath (camerax/path of images/path of videos): ", filepath
46
+ )
47
+
48
+ def __getitem__(self, item):
49
+ if self.mode == "camera" or self.mode == "video":
50
+ if item > self.N:
51
+ return None
52
+ ret, img = self.cap.read()
53
+ if not ret:
54
+ raise "Can't read image from camera"
55
+ if self.mode == "video":
56
+ self.cap.set(cv2.CAP_PROP_POS_FRAMES, item)
57
+ elif self.mode == "images":
58
+ filename = self.images[item]
59
+ img = cv2.imread(filename)
60
+ if img is None:
61
+ raise Exception("Error reading image %s" % filename)
62
+ return img
63
+
64
+ def __len__(self):
65
+ return self.N
66
+
67
+
68
+ class SimpleTracker(object):
69
+ def __init__(self):
70
+ self.pts_prev = None
71
+ self.desc_prev = None
72
+
73
+ def update(self, img, pts, desc):
74
+ N_matches = 0
75
+ if self.pts_prev is None:
76
+ self.pts_prev = pts
77
+ self.desc_prev = desc
78
+
79
+ out = copy.deepcopy(img)
80
+ for pt1 in pts:
81
+ p1 = (int(round(pt1[0])), int(round(pt1[1])))
82
+ cv2.circle(out, p1, 1, (0, 0, 255), -1, lineType=16)
83
+ else:
84
+ matches = self.mnn_mather(self.desc_prev, desc)
85
+ mpts1, mpts2 = self.pts_prev[matches[:, 0]], pts[matches[:, 1]]
86
+ N_matches = len(matches)
87
+
88
+ out = copy.deepcopy(img)
89
+ for pt1, pt2 in zip(mpts1, mpts2):
90
+ p1 = (int(round(pt1[0])), int(round(pt1[1])))
91
+ p2 = (int(round(pt2[0])), int(round(pt2[1])))
92
+ cv2.line(out, p1, p2, (0, 255, 0), lineType=16)
93
+ cv2.circle(out, p2, 1, (0, 0, 255), -1, lineType=16)
94
+
95
+ self.pts_prev = pts
96
+ self.desc_prev = desc
97
+
98
+ return out, N_matches
99
+
100
+ def mnn_mather(self, desc1, desc2):
101
+ sim = desc1 @ desc2.transpose()
102
+ sim[sim < 0.9] = 0
103
+ nn12 = np.argmax(sim, axis=1)
104
+ nn21 = np.argmax(sim, axis=0)
105
+ ids1 = np.arange(0, sim.shape[0])
106
+ mask = ids1 == nn21[nn12]
107
+ matches = np.stack([ids1[mask], nn12[mask]])
108
+ return matches.transpose()
109
+
110
+
111
+ if __name__ == "__main__":
112
+ parser = argparse.ArgumentParser(description="ALike Demo.")
113
+ parser.add_argument(
114
+ "input",
115
+ type=str,
116
+ default="",
117
+ help='Image directory or movie file or "camera0" (for webcam0).',
118
+ )
119
+ parser.add_argument(
120
+ "--model",
121
+ choices=["alike-t", "alike-s", "alike-n", "alike-l"],
122
+ default="alike-t",
123
+ help="The model configuration",
124
+ )
125
+ parser.add_argument(
126
+ "--device", type=str, default="cuda", help="Running device (default: cuda)."
127
+ )
128
+ parser.add_argument(
129
+ "--top_k",
130
+ type=int,
131
+ default=-1,
132
+ help="Detect top K keypoints. -1 for threshold based mode, >0 for top K mode. (default: -1)",
133
+ )
134
+ parser.add_argument(
135
+ "--scores_th",
136
+ type=float,
137
+ default=0.2,
138
+ help="Detector score threshold (default: 0.2).",
139
+ )
140
+ parser.add_argument(
141
+ "--n_limit",
142
+ type=int,
143
+ default=5000,
144
+ help="Maximum number of keypoints to be detected (default: 5000).",
145
+ )
146
+ parser.add_argument(
147
+ "--no_display",
148
+ action="store_true",
149
+ help="Do not display images to screen. Useful if running remotely (default: False).",
150
+ )
151
+ parser.add_argument(
152
+ "--no_sub_pixel",
153
+ action="store_true",
154
+ help="Do not detect sub-pixel keypoints (default: False).",
155
+ )
156
+ args = parser.parse_args()
157
+
158
+ logging.basicConfig(level=logging.INFO)
159
+
160
+ image_loader = ImageLoader(args.input)
161
+ model = ALike(
162
+ **configs[args.model],
163
+ device=args.device,
164
+ top_k=args.top_k,
165
+ scores_th=args.scores_th,
166
+ n_limit=args.n_limit,
167
+ )
168
+ tracker = SimpleTracker()
169
+
170
+ if not args.no_display:
171
+ logging.info("Press 'q' to stop!")
172
+ cv2.namedWindow(args.model)
173
+
174
+ runtime = []
175
+ progress_bar = tqdm(image_loader)
176
+ for img in progress_bar:
177
+ if img is None:
178
+ break
179
+
180
+ img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
181
+ pred = model(img_rgb, sub_pixel=not args.no_sub_pixel)
182
+ kpts = pred["keypoints"]
183
+ desc = pred["descriptors"]
184
+ runtime.append(pred["time"])
185
+
186
+ out, N_matches = tracker.update(img, kpts, desc)
187
+
188
+ ave_fps = (1.0 / np.stack(runtime)).mean()
189
+ status = f"Fps:{ave_fps:.1f}, Keypoints/Matches: {len(kpts)}/{N_matches}"
190
+ progress_bar.set_description(status)
191
+
192
+ if not args.no_display:
193
+ cv2.setWindowTitle(args.model, args.model + ": " + status)
194
+ cv2.imshow(args.model, out)
195
+ if cv2.waitKey(1) == ord("q"):
196
+ break
197
+
198
+ logging.info("Finished!")
199
+ if not args.no_display:
200
+ logging.info("Press any key to exit!")
201
+ cv2.waitKey()
third_party/ALIKE/hseq/cache/alike-l-ms.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1350ab826afdd9b7542a556e2fda9ad9f94388a875c8edb7874e4bcdfebc63ca
+ size 13124
third_party/ALIKE/hseq/cache/alike-l.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:999daff1155f3d4736bb7374fb2058f520b0cb4c75b5d7d87fc1e7025a7d2a7d
+ size 13124
third_party/ALIKE/hseq/cache/alike-n-ms.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e5967048eddb61e423bf2ea05a2a626e18d8a716b6a0ad42471059aec0b934c
+ size 13124
third_party/ALIKE/hseq/cache/alike-n.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8e2eba5ff96b25d0a100b6c7273549de91586e6069dcb5320a20edbb24ea462e
+ size 13124
third_party/ALIKE/hseq/cache/aslfeat.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce06fd1b6265e09ed3b26768b68f624e2d556358ab98addd8ebdb7a5a076abe8
+ size 15352
third_party/ALIKE/hseq/cache/d2.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:976d81c6b51a98f89eac60c6d25990130c1df571ef6536280f4b00577eab56f0
+ size 15352
third_party/ALIKE/hseq/cache/disk.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:df2d9e0dfd0baa19f2af12f4604368ca65a1643159e7e3438e25efc41ab15357
+ size 15352
third_party/ALIKE/hseq/cache/lfnet.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:417327dee726cffccc6dfbc9b0e6b3c06b277ea8878ccf87b87475d1cd6e65ca
+ size 15352
third_party/ALIKE/hseq/cache/r2d2.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1375a21adcc932db2c9e210e52f633c1903cca6d37066391eb9d645ff87d0120
+ size 15352
third_party/ALIKE/hseq/cache/superpoint.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e4d4a4ca79518af47467e9ddd69fe159c9305a580dadc4fdab6ffde6f8b48c2
+ size 15352
third_party/ALIKE/hseq/eval.py ADDED
@@ -0,0 +1,197 @@
1
+ import cv2
2
+ import os
3
+ from tqdm import tqdm
4
+ import torch
5
+ import numpy as np
6
+ from extract import extract_method
7
+
8
+ use_cuda = torch.cuda.is_available()
9
+ device = torch.device("cuda" if use_cuda else "cpu")
10
+
11
+ methods = [
12
+ "d2",
13
+ "lfnet",
14
+ "superpoint",
15
+ "r2d2",
16
+ "aslfeat",
17
+ "disk",
18
+ "alike-n",
19
+ "alike-l",
20
+ "alike-n-ms",
21
+ "alike-l-ms",
22
+ ]
23
+ names = [
24
+ "D2-Net(MS)",
25
+ "LF-Net(MS)",
26
+ "SuperPoint",
27
+ "R2D2(MS)",
28
+ "ASLFeat(MS)",
29
+ "DISK",
30
+ "ALike-N",
31
+ "ALike-L",
32
+ "ALike-N(MS)",
33
+ "ALike-L(MS)",
34
+ ]
35
+
36
+ top_k = None
37
+ n_i = 52
38
+ n_v = 56
39
+ cache_dir = "hseq/cache"
40
+ dataset_path = "hseq/hpatches-sequences-release"
41
+
42
+
43
+ def generate_read_function(method, extension="ppm"):
44
+ def read_function(seq_name, im_idx):
45
+ aux = np.load(
46
+ os.path.join(
47
+ dataset_path, seq_name, "%d.%s.%s" % (im_idx, extension, method)
48
+ )
49
+ )
50
+ if top_k is None:
51
+ return aux["keypoints"], aux["descriptors"]
52
+ else:
53
+ assert "scores" in aux
54
+ ids = np.argsort(aux["scores"])[-top_k:]
55
+ return aux["keypoints"][ids, :], aux["descriptors"][ids, :]
56
+
57
+ return read_function
58
+
59
+
60
+ def mnn_matcher(descriptors_a, descriptors_b):
61
+ device = descriptors_a.device
62
+ sim = descriptors_a @ descriptors_b.t()
63
+ nn12 = torch.max(sim, dim=1)[1]
64
+ nn21 = torch.max(sim, dim=0)[1]
65
+ ids1 = torch.arange(0, sim.shape[0], device=device)
66
+ mask = ids1 == nn21[nn12]
67
+ matches = torch.stack([ids1[mask], nn12[mask]])
68
+ return matches.t().data.cpu().numpy()
69
+
70
+
71
+ def homo_trans(coord, H):
72
+ kpt_num = coord.shape[0]
73
+ homo_coord = np.concatenate((coord, np.ones((kpt_num, 1))), axis=-1)
74
+ proj_coord = np.matmul(H, homo_coord.T).T
75
+ proj_coord = proj_coord / proj_coord[:, 2][..., None]
76
+ proj_coord = proj_coord[:, 0:2]
77
+ return proj_coord
78
+
79
+
80
+ def benchmark_features(read_feats):
81
+ lim = [1, 5]
82
+ rng = np.arange(lim[0], lim[1] + 1)
83
+
84
+ seq_names = sorted(os.listdir(dataset_path))
85
+
86
+ n_feats = []
87
+ n_matches = []
88
+ seq_type = []
89
+ i_err = {thr: 0 for thr in rng}
90
+ v_err = {thr: 0 for thr in rng}
91
+
92
+ i_err_homo = {thr: 0 for thr in rng}
93
+ v_err_homo = {thr: 0 for thr in rng}
94
+
95
+ for seq_idx, seq_name in tqdm(enumerate(seq_names), total=len(seq_names)):
96
+ keypoints_a, descriptors_a = read_feats(seq_name, 1)
97
+ n_feats.append(keypoints_a.shape[0])
98
+
99
+ # =========== compute homography
100
+ ref_img = cv2.imread(os.path.join(dataset_path, seq_name, "1.ppm"))
101
+ ref_img_shape = ref_img.shape
102
+
103
+ for im_idx in range(2, 7):
104
+ keypoints_b, descriptors_b = read_feats(seq_name, im_idx)
105
+ n_feats.append(keypoints_b.shape[0])
106
+
107
+ matches = mnn_matcher(
108
+ torch.from_numpy(descriptors_a).to(device=device),
109
+ torch.from_numpy(descriptors_b).to(device=device),
110
+ )
111
+
112
+ homography = np.loadtxt(
113
+ os.path.join(dataset_path, seq_name, "H_1_" + str(im_idx))
114
+ )
115
+
116
+ pos_a = keypoints_a[matches[:, 0], :2]
117
+ pos_a_h = np.concatenate([pos_a, np.ones([matches.shape[0], 1])], axis=1)
118
+ pos_b_proj_h = np.transpose(np.dot(homography, np.transpose(pos_a_h)))
119
+ pos_b_proj = pos_b_proj_h[:, :2] / pos_b_proj_h[:, 2:]
120
+
121
+ pos_b = keypoints_b[matches[:, 1], :2]
122
+
123
+ dist = np.sqrt(np.sum((pos_b - pos_b_proj) ** 2, axis=1))
124
+
125
+ n_matches.append(matches.shape[0])
126
+ seq_type.append(seq_name[0])
127
+
128
+ if dist.shape[0] == 0:
129
+ dist = np.array([float("inf")])
130
+
131
+ for thr in rng:
132
+ if seq_name[0] == "i":
133
+ i_err[thr] += np.mean(dist <= thr)
134
+ else:
135
+ v_err[thr] += np.mean(dist <= thr)
136
+
137
+ # =========== compute homography
138
+ gt_homo = homography
139
+ pred_homo, _ = cv2.findHomography(
140
+ keypoints_a[matches[:, 0], :2],
141
+ keypoints_b[matches[:, 1], :2],
142
+ cv2.RANSAC,
143
+ )
144
+ if pred_homo is None:
145
+ homo_dist = np.array([float("inf")])
146
+ else:
147
+ corners = np.array(
148
+ [
149
+ [0, 0],
150
+ [ref_img_shape[1] - 1, 0],
151
+ [0, ref_img_shape[0] - 1],
152
+ [ref_img_shape[1] - 1, ref_img_shape[0] - 1],
153
+ ]
154
+ )
155
+ real_warped_corners = homo_trans(corners, gt_homo)
156
+ warped_corners = homo_trans(corners, pred_homo)
157
+ homo_dist = np.mean(
158
+ np.linalg.norm(real_warped_corners - warped_corners, axis=1)
159
+ )
160
+
161
+ for thr in rng:
162
+ if seq_name[0] == "i":
163
+ i_err_homo[thr] += np.mean(homo_dist <= thr)
164
+ else:
165
+ v_err_homo[thr] += np.mean(homo_dist <= thr)
166
+
167
+ seq_type = np.array(seq_type)
168
+ n_feats = np.array(n_feats)
169
+ n_matches = np.array(n_matches)
170
+
171
+ return i_err, v_err, i_err_homo, v_err_homo, [seq_type, n_feats, n_matches]
172
+
173
+
174
+ if __name__ == "__main__":
175
+ errors = {}
176
+ for method in methods:
177
+ output_file = os.path.join(cache_dir, method + ".npy")
178
+ read_function = generate_read_function(method)
179
+ if os.path.exists(output_file):
180
+ errors[method] = np.load(output_file, allow_pickle=True)
181
+ else:
182
+ extract_method(method)
183
+ errors[method] = benchmark_features(read_function)
184
+ np.save(output_file, errors[method])
185
+
186
+ for name, method in zip(names, methods):
187
+ i_err, v_err, i_err_hom, v_err_hom, _ = errors[method]
188
+
189
+ print(f"====={name}=====")
190
+ print(f"MMA@1 MMA@2 MMA@3 MHA@1 MHA@2 MHA@3: ", end="")
191
+ for thr in range(1, 4):
192
+ err = (i_err[thr] + v_err[thr]) / ((n_i + n_v) * 5)
193
+ print(f"{err * 100:.2f}%", end=" ")
194
+ for thr in range(1, 4):
195
+ err_hom = (i_err_hom[thr] + v_err_hom[thr]) / ((n_i + n_v) * 5)
196
+ print(f"{err_hom * 100:.2f}%", end=" ")
197
+ print("")
third_party/ALIKE/hseq/extract.py ADDED
@@ -0,0 +1,175 @@
1
+ import os
2
+ import sys
3
+ import cv2
4
+ from pathlib import Path
5
+ import numpy as np
6
+ import torch
7
+ import torch.utils.data as data
8
+ from tqdm import tqdm
9
+ from copy import deepcopy
10
+ from torchvision.transforms import ToTensor
11
+
12
+ sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
13
+ from alike import ALike, configs
14
+
15
+ dataset_root = "hseq/hpatches-sequences-release"
16
+ use_cuda = torch.cuda.is_available()
17
+ device = "cuda" if use_cuda else "cpu"
18
+ methods = ["alike-n", "alike-l", "alike-n-ms", "alike-l-ms"]
19
+
20
+
21
+ class HPatchesDataset(data.Dataset):
22
+ def __init__(self, root: str = dataset_root, alteration: str = "all"):
23
+ """
24
+ Args:
25
+ root: dataset root path
26
+ alteration: # 'all', 'i' for illumination or 'v' for viewpoint
27
+ """
28
+ assert Path(root).exists(), f"Dataset root path {root} dose not exist!"
29
+ self.root = root
30
+
31
+ # get all image file name
32
+ self.image0_list = []
33
+ self.image1_list = []
34
+ self.homographies = []
35
+ folders = [x for x in Path(self.root).iterdir() if x.is_dir()]
36
+ self.seqs = []
37
+ for folder in folders:
38
+ if alteration == "i" and folder.stem[0] != "i":
39
+ continue
40
+ if alteration == "v" and folder.stem[0] != "v":
41
+ continue
42
+
43
+ self.seqs.append(folder)
44
+
45
+ self.len = len(self.seqs)
46
+ assert self.len > 0, f"Can not find PatchDataset in path {self.root}"
47
+
48
+ def __getitem__(self, item):
49
+ folder = self.seqs[item]
50
+
51
+ imgs = []
52
+ homos = []
53
+ for i in range(1, 7):
54
+ img = cv2.imread(str(folder / f"{i}.ppm"), cv2.IMREAD_COLOR)
55
+ img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # HxWxC
56
+ imgs.append(img)
57
+
58
+ if i != 1:
59
+ homo = np.loadtxt(str(folder / f"H_1_{i}")).astype("float32")
60
+ homos.append(homo)
61
+
62
+ return imgs, homos, folder.stem
63
+
64
+ def __len__(self):
65
+ return self.len
66
+
67
+ def name(self):
68
+ return self.__class__
69
+
70
+
71
+ def extract_multiscale(
72
+ model,
73
+ img,
74
+ scale_f=2**0.5,
75
+ min_scale=1.0,
76
+ max_scale=1.0,
77
+ min_size=0.0,
78
+ max_size=99999.0,
79
+ image_size_max=99999,
80
+ n_k=0,
81
+ sort=False,
82
+ ):
83
+ H_, W_, three = img.shape
84
+ assert three == 3, "input image shape should be [HxWx3]"
85
+
86
+ old_bm = torch.backends.cudnn.benchmark
87
+ torch.backends.cudnn.benchmark = False # speedup
88
+
89
+ # ==================== image size constraint
90
+ image = deepcopy(img)
91
+ max_hw = max(H_, W_)
92
+ if max_hw > image_size_max:
93
+ ratio = float(image_size_max / max_hw)
94
+ image = cv2.resize(image, dsize=None, fx=ratio, fy=ratio)
95
+
96
+ # ==================== convert image to tensor
97
+ H, W, three = image.shape
98
+ image = ToTensor()(image).unsqueeze(0)
99
+ image = image.to(device)
100
+
101
+ s = 1.0 # current scale factor
102
+ keypoints, descriptors, scores, scores_maps, descriptor_maps = [], [], [], [], []
103
+ while s + 0.001 >= max(min_scale, min_size / max(H, W)):
104
+ if s - 0.001 <= min(max_scale, max_size / max(H, W)):
105
+ nh, nw = image.shape[2:]
106
+
107
+ # extract descriptors
108
+ with torch.no_grad():
109
+ descriptor_map, scores_map = model.extract_dense_map(image)
110
+ keypoints_, descriptors_, scores_, _ = model.dkd(
111
+ scores_map, descriptor_map
112
+ )
113
+
114
+ keypoints.append(keypoints_[0])
115
+ descriptors.append(descriptors_[0])
116
+ scores.append(scores_[0])
117
+
118
+ s /= scale_f
119
+
120
+ # down-scale the image for next iteration
121
+ nh, nw = round(H * s), round(W * s)
122
+ image = torch.nn.functional.interpolate(
123
+ image, (nh, nw), mode="bilinear", align_corners=False
124
+ )
125
+
126
+ # restore value
127
+ torch.backends.cudnn.benchmark = old_bm
128
+
129
+ keypoints = torch.cat(keypoints)
130
+ descriptors = torch.cat(descriptors)
131
+ scores = torch.cat(scores)
132
+ keypoints = (keypoints + 1) / 2 * keypoints.new_tensor([[W_ - 1, H_ - 1]])
133
+
134
+ if sort or 0 < n_k < len(keypoints):
135
+ indices = torch.argsort(scores, descending=True)
136
+ keypoints = keypoints[indices]
137
+ descriptors = descriptors[indices]
138
+ scores = scores[indices]
139
+
140
+ if 0 < n_k < len(keypoints):
141
+ keypoints = keypoints[0:n_k]
142
+ descriptors = descriptors[0:n_k]
143
+ scores = scores[0:n_k]
144
+
145
+ return {"keypoints": keypoints, "descriptors": descriptors, "scores": scores}
146
+
147
+
148
+ def extract_method(m):
149
+ hpatches = HPatchesDataset(root=dataset_root, alteration="all")
150
+ model = m[:7]
151
+ min_scale = 0.3 if m[8:] == "ms" else 1.0
152
+
153
+ model = ALike(**configs[model], device=device, top_k=0, scores_th=0.2, n_limit=5000)
154
+
155
+ progbar = tqdm(hpatches, desc="Extracting for {}".format(m))
156
+ for imgs, homos, seq_name in progbar:
157
+ for i in range(1, 7):
158
+ img = imgs[i - 1]
159
+ pred = extract_multiscale(
160
+ model, img, min_scale=min_scale, max_scale=1, sort=False, n_k=5000
161
+ )
162
+ kpts, descs, scores = pred["keypoints"], pred["descriptors"], pred["scores"]
163
+
164
+ with open(os.path.join(dataset_root, seq_name, f"{i}.ppm.{m}"), "wb") as f:
165
+ np.savez(
166
+ f,
167
+ keypoints=kpts.cpu().numpy(),
168
+ scores=scores.cpu().numpy(),
169
+ descriptors=descs.cpu().numpy(),
170
+ )
171
+
172
+
173
+ if __name__ == "__main__":
174
+ for method in methods:
175
+ extract_method(method)
third_party/ALIKE/matlab/createfigure.m ADDED
@@ -0,0 +1,75 @@
1
+ function createfigure(X1, YMatrix1, Y1, l1, l2, l3)
2
+ %CREATEFIGURE(X1, YMatrix1, Y1)
3
+ % X1: vector of x data
4
+ % YMATRIX1: matrix of y data
5
+ % Y1: vector of y data
6
+
7
+ % Auto-generated by MATLAB on 29-Oct-2021 15:42:14
8
+
9
+ % Create figure
10
+ figure1 = figure;
11
+
12
+ % Create axes
13
+ axes1 = axes('Parent',figure1);
14
+ hold(axes1,'on');
15
+
16
+ % Create multiple lines using matrix input to plot
17
+ plot1 = plot(X1,YMatrix1,'Parent',axes1,'LineWidth',1);
18
+ set(plot1(1),'LineStyle','-.','Color',[1 0 0]);
19
+ set(plot1(2),'Color',[0 1 0]);
20
+ set(plot1(3),'LineStyle','--',...
21
+ 'Color',[0.87058824300766 0.490196079015732 0]);
22
+
23
+ % Uncomment the following line to preserve the X-limits of the axes
24
+ % xlim(axes1,[-1.1 1.1]);
25
+ % Uncomment the following line to preserve the Y-limits of the axes
26
+ ylim(axes1,[0 2.2]);
27
+ box(axes1,'on');
28
+ hold(axes1,'off');
29
+ % Set the remaining axes properties
30
+ set(axes1,'XColor',[0 0 0],'YColor',[0 0 0],'YTick',[0 0.5 1 1.5 2 2.5]);
31
+ % Create axes
32
+ axes2 = axes('Parent',figure1);
33
+ hold(axes2,'on');
34
+ colororder([0.494 0.184 0.556;0.466 0.674 0.188;0.301 0.745 0.933;0.635 0.078 0.184;0 0.447 0.741;0.85 0.325 0.098;0.929 0.694 0.125]);
35
+
36
+ % Create plot
37
+ plot(X1,Y1,'Parent',axes2,'LineWidth',1,'LineStyle',':','Color',[0 0 1]);
38
+
39
+ % Uncomment the following line to preserve the X-limits of the axes
40
+ % xlim(axes2,[-1.1 1.1]);
41
+ % Uncomment the following line to preserve the Y-limits of the axes
42
+ ylim(axes2,[0 1.6]);
43
+ hold(axes2,'off');
44
+ % Set the remaining axes properties
45
+ set(axes2,'Color','none','HitTest','off','XColor',[0 0 0],'YAxisLocation',...
46
+ 'right','YColor',[0 0 0],'YTick',[0 0.5 1 1.5]);
47
+ % Create textbox
48
+ annotation(figure1,'textbox',...
49
+ [0.255427607968038,0.605539475745798,0.304947448327989,0.235148519909872],...
50
+ 'Color',[0.8 0 0],...
51
+ 'String',{sprintf('peak loss=%.4f',l1)},...
52
+ 'EdgeColor','none');
53
+
54
+ % Create textbox
55
+ annotation(figure1,'textbox',...
56
+ [0.631790371410027,0.083530640355914,0.178879315581032,0.235148519909871],...
57
+ 'Color',[0 0 1],...
58
+ 'String',{'keypoint'},...
59
+ 'EdgeColor','none');
60
+
61
+ % Create textbox
62
+ annotation(figure1,'textbox',...
63
+ [0.59663112557549,0.640686239621974,0.318247136419826,0.22093023731067],...
64
+ 'Color',[0 0.498039215803146 0],...
65
+ 'String',{sprintf('peak loss=%.4f',l2)},...
66
+ 'EdgeColor','none');
67
+
68
+ % Create textbox
69
+ annotation(figure1,'textbox',...
70
+ [0.595423071596731,0.415858983920567,0.318247136419826,0.235148519909871],...
71
+ 'Color',[0.87058824300766 0.490196079015732 0],...
72
+ 'String',{sprintf('peak loss=%.4f',l3)},...
73
+ 'FitBoxToText','off',...
74
+ 'EdgeColor','none');
75
+
third_party/ALIKE/matlab/peakloss_rect.m ADDED
@@ -0,0 +1,19 @@
+ clear;
+ close all;
+
+ x = -1:0.01:1;
+
+ p0 = 0.5;
+ p1 = -0.5;
+
+ d = abs(x - p0);
+
+ c0 = 2 .* (x>=-0.75 & x <= -0.25);
+ c1 = 2 .* (x>=0.25 & x <= 0.75);
+ c2 = 1.25 .* (x>=0.1 & x <= 0.9);
+
+ peak_loss0 = sum(d.*c0) / length(x)
+ peak_loss1 = sum(d.*c1) / length(x)
+ peak_loss2 = sum(d.*c2) / length(x)
+
+ createfigure(x, [c0;c1;c2], d, peak_loss0,peak_loss1, peak_loss2);
third_party/ALIKE/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ opencv-python~=4.5.1.48
+ numpy~=1.19.5
+ tqdm~=4.60.0
+ torch~=1.8.0
+ torchvision~=0.9.0
+ thop~=0.0.31-2005241907
third_party/ALIKE/soft_detect.py ADDED
@@ -0,0 +1,234 @@
1
+ import torch
2
+ from torch import nn
3
+ import torch.nn.functional as F
4
+
5
+
6
+ # coordinates system
7
+ # ------------------------------> [ x: range=-1.0~1.0; w: range=0~W ]
8
+ # | -----------------------------
9
+ # | | |
10
+ # | | |
11
+ # | | |
12
+ # | | image |
13
+ # | | |
14
+ # | | |
15
+ # | | |
16
+ # | |---------------------------|
17
+ # v
18
+ # [ y: range=-1.0~1.0; h: range=0~H ]
19
+
20
+
21
+ def simple_nms(scores, nms_radius: int):
22
+ """Fast Non-maximum suppression to remove nearby points"""
23
+ assert nms_radius >= 0
24
+
25
+ def max_pool(x):
26
+ return torch.nn.functional.max_pool2d(
27
+ x, kernel_size=nms_radius * 2 + 1, stride=1, padding=nms_radius
28
+ )
29
+
30
+ zeros = torch.zeros_like(scores)
31
+ max_mask = scores == max_pool(scores)
32
+
33
+ for _ in range(2):
34
+ supp_mask = max_pool(max_mask.float()) > 0
35
+ supp_scores = torch.where(supp_mask, zeros, scores)
36
+ new_max_mask = supp_scores == max_pool(supp_scores)
37
+ max_mask = max_mask | (new_max_mask & (~supp_mask))
38
+ return torch.where(max_mask, scores, zeros)
39
+
40
+
41
+ def sample_descriptor(descriptor_map, kpts, bilinear_interp=False):
42
+ """
43
+ :param descriptor_map: BxCxHxW
44
+ :param kpts: list, len=B, each is Nx2 (keypoints) [h,w]
45
+ :param bilinear_interp: bool, whether to use bilinear interpolation
46
+ :return: descriptors: list, len=B, each is NxD
47
+ """
48
+ batch_size, channel, height, width = descriptor_map.shape
49
+
50
+ descriptors = []
51
+ for index in range(batch_size):
52
+ kptsi = kpts[index] # Nx2,(x,y)
53
+
54
+ if bilinear_interp:
55
+ descriptors_ = torch.nn.functional.grid_sample(
56
+ descriptor_map[index].unsqueeze(0),
57
+ kptsi.view(1, 1, -1, 2),
58
+ mode="bilinear",
59
+ align_corners=True,
60
+ )[
61
+ 0, :, 0, :
62
+ ] # CxN
63
+ else:
64
+ kptsi = (kptsi + 1) / 2 * kptsi.new_tensor([[width - 1, height - 1]])
65
+ kptsi = kptsi.long()
66
+ descriptors_ = descriptor_map[index, :, kptsi[:, 1], kptsi[:, 0]] # CxN
67
+
68
+ descriptors_ = torch.nn.functional.normalize(descriptors_, p=2, dim=0)
69
+ descriptors.append(descriptors_.t())
70
+
71
+ return descriptors
72
+
73
+
74
+ class DKD(nn.Module):
75
+ def __init__(self, radius=2, top_k=0, scores_th=0.2, n_limit=20000):
76
+ """
77
+ Args:
78
+ radius: soft detection radius, kernel size is (2 * radius + 1)
79
+ top_k: top_k > 0: return top k keypoints
80
+ scores_th: top_k <= 0 threshold mode: scores_th > 0: return keypoints with scores>scores_th
81
+ else: return keypoints with scores > scores.mean()
82
+ n_limit: max number of keypoint in threshold mode
83
+ """
84
+ super().__init__()
85
+ self.radius = radius
86
+ self.top_k = top_k
87
+ self.scores_th = scores_th
88
+ self.n_limit = n_limit
89
+ self.kernel_size = 2 * self.radius + 1
90
+ self.temperature = 0.1 # tuned temperature
91
+ self.unfold = nn.Unfold(kernel_size=self.kernel_size, padding=self.radius)
92
+
93
+ # local xy grid
94
+ x = torch.linspace(-self.radius, self.radius, self.kernel_size)
95
+ # (kernel_size*kernel_size) x 2 : (w,h)
96
+ self.hw_grid = torch.stack(torch.meshgrid([x, x])).view(2, -1).t()[:, [1, 0]]
97
+
98
+ def detect_keypoints(self, scores_map, sub_pixel=True):
99
+ b, c, h, w = scores_map.shape
100
+ scores_nograd = scores_map.detach()
101
+ # nms_scores = simple_nms(scores_nograd, self.radius)
102
+ nms_scores = simple_nms(scores_nograd, 2)
103
+
104
+ # remove border
105
+ nms_scores[:, :, : self.radius + 1, :] = 0
106
+ nms_scores[:, :, :, : self.radius + 1] = 0
107
+ nms_scores[:, :, h - self.radius :, :] = 0
108
+ nms_scores[:, :, :, w - self.radius :] = 0
109
+
110
+ # detect keypoints without grad
111
+ if self.top_k > 0:
112
+ topk = torch.topk(nms_scores.view(b, -1), self.top_k)
113
+ indices_keypoints = topk.indices # B x top_k
114
+ else:
115
+ if self.scores_th > 0:
116
+ masks = nms_scores > self.scores_th
117
+ if masks.sum() == 0:
118
+ th = scores_nograd.reshape(b, -1).mean(dim=1) # th = self.scores_th
119
+ masks = nms_scores > th.reshape(b, 1, 1, 1)
120
+ else:
121
+ th = scores_nograd.reshape(b, -1).mean(dim=1) # th = self.scores_th
122
+ masks = nms_scores > th.reshape(b, 1, 1, 1)
123
+ masks = masks.reshape(b, -1)
124
+
125
+ indices_keypoints = [] # list, B x (any size)
126
+ scores_view = scores_nograd.reshape(b, -1)
127
+ for mask, scores in zip(masks, scores_view):
128
+ indices = mask.nonzero(as_tuple=False)[:, 0]
129
+ if len(indices) > self.n_limit:
130
+ kpts_sc = scores[indices]
131
+ sort_idx = kpts_sc.sort(descending=True)[1]
132
+ sel_idx = sort_idx[: self.n_limit]
133
+ indices = indices[sel_idx]
134
+ indices_keypoints.append(indices)
135
+
136
+ keypoints = []
137
+ scoredispersitys = []
138
+ kptscores = []
139
+ if sub_pixel:
140
+ # detect soft keypoints with grad backpropagation
141
+ patches = self.unfold(scores_map) # B x (kernel**2) x (H*W)
142
+ self.hw_grid = self.hw_grid.to(patches) # to device
143
+ for b_idx in range(b):
144
+ patch = patches[b_idx].t() # (H*W) x (kernel**2)
145
+ indices_kpt = indices_keypoints[
146
+ b_idx
147
+ ] # one dimension vector, say its size is M
148
+ patch_scores = patch[indices_kpt] # M x (kernel**2)
149
+
150
+ # max is detached to prevent undesired backprop loops in the graph
151
+ max_v = patch_scores.max(dim=1).values.detach()[:, None]
152
+ x_exp = (
153
+ (patch_scores - max_v) / self.temperature
154
+ ).exp() # M * (kernel**2), in [0, 1]
155
+
156
+ # \frac{ \sum{(i,j) \times \exp(x/T)} }{ \sum{\exp(x/T)} }
157
+ xy_residual = (
158
+ x_exp @ self.hw_grid / x_exp.sum(dim=1)[:, None]
159
+ ) # Soft-argmax, Mx2
160
+
161
+ hw_grid_dist2 = (
162
+ torch.norm(
163
+ (self.hw_grid[None, :, :] - xy_residual[:, None, :])
164
+ / self.radius,
165
+ dim=-1,
166
+ )
167
+ ** 2
168
+ )
169
+ scoredispersity = (x_exp * hw_grid_dist2).sum(dim=1) / x_exp.sum(dim=1)
170
+
171
+ # compute result keypoints
172
+ keypoints_xy_nms = torch.stack(
173
+ [indices_kpt % w, indices_kpt // w], dim=1
174
+ ) # Mx2
175
+ keypoints_xy = keypoints_xy_nms + xy_residual
176
+ keypoints_xy = (
177
+ keypoints_xy / keypoints_xy.new_tensor([w - 1, h - 1]) * 2 - 1
178
+ ) # (w,h) -> (-1~1,-1~1)
179
+
180
+ kptscore = torch.nn.functional.grid_sample(
181
+ scores_map[b_idx].unsqueeze(0),
182
+ keypoints_xy.view(1, 1, -1, 2),
183
+ mode="bilinear",
184
+ align_corners=True,
185
+ )[
186
+ 0, 0, 0, :
187
+ ] # CxN
188
+
189
+ keypoints.append(keypoints_xy)
190
+ scoredispersitys.append(scoredispersity)
191
+ kptscores.append(kptscore)
192
+ else:
193
+ for b_idx in range(b):
194
+ indices_kpt = indices_keypoints[
195
+ b_idx
196
+ ] # one dimension vector, say its size is M
197
+ keypoints_xy_nms = torch.stack(
198
+ [indices_kpt % w, indices_kpt // w], dim=1
199
+ ) # Mx2
200
+ keypoints_xy = (
201
+ keypoints_xy_nms / keypoints_xy_nms.new_tensor([w - 1, h - 1]) * 2
202
+ - 1
203
+ ) # (w,h) -> (-1~1,-1~1)
204
+ kptscore = torch.nn.functional.grid_sample(
205
+ scores_map[b_idx].unsqueeze(0),
206
+ keypoints_xy.view(1, 1, -1, 2),
207
+ mode="bilinear",
208
+ align_corners=True,
209
+ )[
210
+ 0, 0, 0, :
211
+ ] # CxN
212
+ keypoints.append(keypoints_xy)
213
+ scoredispersitys.append(None)
214
+ kptscores.append(kptscore)
215
+
216
+ return keypoints, scoredispersitys, kptscores
217
+
218
+ def forward(self, scores_map, descriptor_map, sub_pixel=False):
219
+ """
220
+ :param scores_map: Bx1xHxW
221
+ :param descriptor_map: BxCxHxW
222
+ :param sub_pixel: whether to use sub-pixel keypoint detection
223
+ :return: kpts: list[Nx2,...]; kptscores: list[N,....] normalised position: -1.0 ~ 1.0
224
+ """
225
+ keypoints, scoredispersitys, kptscores = self.detect_keypoints(
226
+ scores_map, sub_pixel
227
+ )
228
+
229
+ descriptors = sample_descriptor(descriptor_map, keypoints, sub_pixel)
230
+
231
+ # keypoints: B M 2
232
+ # descriptors: B M D
233
+ # scoredispersitys:
234
+ return keypoints, descriptors, kptscores, scoredispersitys
third_party/ASpanFormer/.github/workflows/sync.yml ADDED
@@ -0,0 +1,39 @@
1
+ name: Upstream Sync
2
+
3
+ permissions:
4
+ contents: write
5
+
6
+ on:
7
+ schedule:
8
+ - cron: "0 0 * * *" # every day
9
+ workflow_dispatch:
10
+
11
+ jobs:
12
+ sync_latest_from_upstream:
13
+ name: Sync latest commits from upstream repo
14
+ runs-on: ubuntu-latest
15
+ if: ${{ github.event.repository.fork }}
16
+
17
+ steps:
18
+ # Step 1: run a standard checkout action
19
+ - name: Checkout target repo
20
+ uses: actions/checkout@v3
21
+
22
+ # Step 2: run the sync action
23
+ - name: Sync upstream changes
24
+ id: sync
25
+ uses: aormsby/Fork-Sync-With-Upstream-action@v3.4
26
+ with:
27
+ upstream_sync_repo: apple/ml-aspanformer
28
+ upstream_sync_branch: main
29
+ target_sync_branch: main
30
+ target_repo_token: ${{ secrets.GITHUB_TOKEN }} # automatically generated, no need to set
31
+
32
+ # Set test_mode to true to run tests instead of the real sync action
33
+ test_mode: false
34
+
35
+ - name: Sync check
36
+ if: failure()
37
+ run: |
38
+ echo "::error::Due to insufficient permissions, synchronization failed (as expected). Please go to the repository homepage and manually perform [Sync fork]."
39
+ exit 1
third_party/ASpanFormer/.gitignore ADDED
@@ -0,0 +1,32 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ .vscode/
2
+ __pycache__/
3
+ *.pyc
4
+ *.DS_Store
5
+ *.swp
6
+ *.pth
7
+ tmp.*
8
+ */.ipynb_checkpoints/*
9
+
10
+ logs/
11
+ # weights/
12
+ dump/
13
+ demo/*.mp4
14
+ demo/demo_images/
15
+ src/loftr/utils/superglue.py
16
+ demo/utils.py
17
+
18
+ demo/*.jpg
19
+ demo/*.png
20
+
21
+ notebooks/QccDayNight.ipynb
22
+ notebooks/westlake.ipynb
23
+ assets/westlake
24
+ assets/qcc_pairs.txt
25
+ configs/.petrel*
26
+ tools/draw_QccDayNights.py
27
+
28
+ scripts/slurm/
29
+ scripts/sbatch_submit.sh
30
+ src/utils/client.py
31
+
32
+ scannet_indices/
third_party/ASpanFormer/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,71 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, sex characteristics, gender identity and expression,
9
+ level of experience, education, socio-economic status, nationality, personal
10
+ appearance, race, religion, or sexual identity and orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies within all project spaces, and it also applies when
49
+ an individual is representing the project or its community in public spaces.
50
+ Examples of representing a project or community include using an official
51
+ project e-mail address, posting via an official social media account, or acting
52
+ as an appointed representative at an online or offline event. Representation of
53
+ a project may be further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the open source team at [opensource-conduct@group.apple.com](mailto:opensource-conduct@group.apple.com). All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4,
71
+ available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct.html](https://www.contributor-covenant.org/version/1/4/code-of-conduct.html)
third_party/ASpanFormer/CONTRIBUTING.md ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ # Contribution Guide
2
+
3
+ Thanks for your interest in contributing. This project was released to accompany a research paper for purposes of reproducibility, and beyond its publication there are limited plans for future development of the repository.
4
+
5
+ ## Before you get started
6
+
7
+ We ask that all community members read and observe our [Code of Conduct](CODE_OF_CONDUCT.md).
third_party/ASpanFormer/LICENSE ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ Copyright (C) 2021, 2022 Apple Inc. All Rights Reserved.
2
+
3
+ IMPORTANT: This Apple software is supplied to you by Apple Inc. ("Apple") in consideration of your agreement to the following terms, and your use, installation, modification or redistribution of this Apple software constitutes acceptance of these terms. If you do not agree with these terms, please do not use, install, modify or redistribute this Apple software.
4
+
5
+ In consideration of your agreement to abide by the following terms, and subject to these terms, Apple grants you a personal, non-commercial, non-exclusive license, under Apple's copyrights in this original Apple software (the "Apple Software"), to use, reproduce, modify and redistribute the Apple Software, with or without modifications, in source and/or binary forms for non-commercial purposes only; provided that if you redistribute the Apple Software in its entirety and without modifications, you must retain this notice and the following text and disclaimers in all such redistributions of the Apple Software. Neither the name, trademarks, service marks or logos of Apple Inc. may be used to endorse or promote products derived from the Apple Software without specific prior written permission from Apple. Except as expressly stated in this notice, no other rights or licenses, express or implied, are granted by Apple herein, including but not limited to any patent rights that may be infringed by your derivative works or by other works in which the Apple Software may be incorporated.
6
+
7
+ The Apple Software is provided by Apple on an "AS IS" basis. APPLE MAKES NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, REGARDING THE APPLE SOFTWARE OR ITS USE AND OPERATION ALONE OR IN COMBINATION WITH YOUR PRODUCTS.
8
+
9
+ IN NO EVENT SHALL APPLE BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION, MODIFICATION AND/OR DISTRIBUTION OF THE APPLE SOFTWARE, HOWEVER CAUSED AND WHETHER UNDER THEORY OF CONTRACT, TORT (INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE, EVEN IF APPLE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
third_party/ASpanFormer/README.md ADDED
@@ -0,0 +1,98 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Submodule used in [hloc](https://github.com/Vincentqyw/Hierarchical-Localization) toolbox
2
+
3
+ # ASpanFormer Implementation
4
+
5
+ ![Framework](assets/teaser.png)
6
+
7
+ This is a PyTorch implementation of ASpanFormer, our ECCV'22 [paper](https://arxiv.org/abs/2208.14201) “ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer”, and can be used to reproduce the results in the paper.
8
+
9
+ This work focuses on detector-free image matching. We propose a hierarchical attention framework for cross-view feature update, which adaptively adjusts attention span based on region-wise matchability.
10
+
11
+ This repo contains training, evaluation and basic demo scripts used in our paper.
12
+
13
+ A large part of the code base is borrowed from the [LoFTR Repository](https://github.com/zju3dv/LoFTR) under its own separate license, terms and conditions. The authors of this software are not responsible for the contents of third-party websites.
14
+
15
+ ## Installation
16
+ ```bash
17
+ conda env create -f environment.yaml
18
+ conda activate ASpanFormer
19
+ ```
20
+
21
+ ## Get started
22
+ Download model weights from [here](https://drive.google.com/file/d/1eavM9dTkw9nbc-JqlVVfGPU5UvTTfc6k/view?usp=share_link)
23
+
24
+ Extract weights by
25
+ ```bash
26
+ tar -xvf weights_aspanformer.tar
27
+ ```
28
+
29
+ A demo to match one image pair is provided. To get a quick start,
30
+
31
+ ```bash
32
+ cd demo
33
+ python demo.py
34
+ ```
35
+
36
+
37
+ ## Data Preparation
38
+ Please follow the [training doc](docs/TRAINING.md) for data organization
39
+
40
+
41
+
42
+ ## Evaluation
43
+
44
+
45
+ ### 1. ScanNet Evaluation
46
+ ```bash
47
+ cd scripts/reproduce_test
48
+ bash indoor.sh
49
+ ```
50
+ Results similar to the following should be obtained:
51
+ ```bash
52
+ 'auc@10': 0.46640095171012563,
53
+ 'auc@20': 0.6407042320049785,
54
+ 'auc@5': 0.26241231577189295,
55
+ 'prec@5e-04': 0.8827665604024288,
56
+ 'prec_flow@2e-03': 0.810938751342228
57
+ ```
58
+
59
+ ### 2. MegaDepth Evaluation
60
+ ```bash
61
+ cd scripts/reproduce_test
62
+ bash outdoor.sh
63
+ ```
64
+ Results similar to the following should be obtained:
65
+ ```bash
66
+ 'auc@10': 0.7184113573584142,
67
+ 'auc@20': 0.8333835724453831,
68
+ 'auc@5': 0.5567622479156181,
69
+ 'prec@5e-04': 0.9901741341790503,
70
+ 'prec_flow@2e-03': 0.7188964321862907
71
+ ```
72
+
73
+
74
+ ## Training
75
+
76
+ ### 1. ScanNet Training
77
+ ```bash
78
+ cd scripts/reproduce_train
79
+ bash indoor.sh
80
+ ```
81
+
82
+ ### 2. MegaDepth Training
83
+ ```bash
84
+ cd scripts/reproduce_train
85
+ bash outdoor.sh
86
+ ```
87
+
88
+
89
+ If you find this project useful, please cite:
90
+
91
+ ```
92
+ @article{chen2022aspanformer,
93
+ title={ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer},
94
+ author={Chen, Hongkai and Luo, Zixin and Zhou, Lei and Tian, Yurun and Zhen, Mingmin and Fang, Tian and McKinnon, David and Tsin, Yanghai and Quan, Long},
95
+ journal={European Conference on Computer Vision (ECCV)},
96
+ year={2022}
97
+ }
98
+ ```
third_party/ASpanFormer/configs/aspan/indoor/aspan_test.py ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ from pathlib import Path
3
+
4
+ sys.path.append(str(Path(__file__).parent / "../../../"))
5
+ from src.config.default import _CN as cfg
6
+
7
+ cfg.ASPAN.MATCH_COARSE.MATCH_TYPE = "dual_softmax"
8
+
9
+ cfg.ASPAN.MATCH_COARSE.BORDER_RM = 0
10
+ cfg.ASPAN.COARSE.COARSEST_LEVEL = [15, 20]
11
+ cfg.ASPAN.COARSE.TRAIN_RES = [480, 640]
third_party/ASpanFormer/configs/aspan/indoor/aspan_train.py ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ from pathlib import Path
3
+
4
+ sys.path.append(str(Path(__file__).parent / "../../../"))
5
+ from src.config.default import _CN as cfg
6
+
7
+ cfg.ASPAN.COARSE.COARSEST_LEVEL = [15, 20]
8
+ cfg.ASPAN.MATCH_COARSE.MATCH_TYPE = "dual_softmax"
9
+
10
+ cfg.ASPAN.MATCH_COARSE.SPARSE_SPVS = False
11
+ cfg.ASPAN.MATCH_COARSE.BORDER_RM = 0
12
+ cfg.TRAINER.MSLR_MILESTONES = [3, 6, 9, 12, 17, 20, 23, 26, 29]
third_party/ASpanFormer/configs/aspan/outdoor/aspan_test.py ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ from pathlib import Path
3
+
4
+ sys.path.append(str(Path(__file__).parent / "../../../"))
5
+ from src.config.default import _CN as cfg
6
+
7
+ cfg.ASPAN.COARSE.COARSEST_LEVEL = [36, 36]
8
+ cfg.ASPAN.COARSE.TRAIN_RES = [832, 832]
9
+ cfg.ASPAN.COARSE.TEST_RES = [1152, 1152]
10
+ cfg.ASPAN.MATCH_COARSE.MATCH_TYPE = "dual_softmax"
11
+
12
+ cfg.TRAINER.CANONICAL_LR = 8e-3
13
+ cfg.TRAINER.WARMUP_STEP = 1875 # 3 epochs
14
+ cfg.TRAINER.WARMUP_RATIO = 0.1
15
+ cfg.TRAINER.MSLR_MILESTONES = [8, 12, 16, 20, 24]
16
+
17
+ # pose estimation
18
+ cfg.TRAINER.RANSAC_PIXEL_THR = 0.5
19
+
20
+ cfg.TRAINER.OPTIMIZER = "adamw"
21
+ cfg.TRAINER.ADAMW_DECAY = 0.1
22
+ cfg.ASPAN.MATCH_COARSE.TRAIN_COARSE_PERCENT = 0.3
third_party/ASpanFormer/configs/aspan/outdoor/aspan_train.py ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+ from pathlib import Path
3
+
4
+ sys.path.append(str(Path(__file__).parent / "../../../"))
5
+ from src.config.default import _CN as cfg
6
+
7
+ cfg.ASPAN.COARSE.COARSEST_LEVEL = [26, 26]
8
+ cfg.ASPAN.MATCH_COARSE.MATCH_TYPE = "dual_softmax"
9
+ cfg.ASPAN.MATCH_COARSE.SPARSE_SPVS = False
10
+
11
+ cfg.TRAINER.CANONICAL_LR = 8e-3
12
+ cfg.TRAINER.WARMUP_STEP = 1875 # 3 epochs
13
+ cfg.TRAINER.WARMUP_RATIO = 0.1
14
+ cfg.TRAINER.MSLR_MILESTONES = [8, 12, 16, 20, 24]
15
+
16
+ # pose estimation
17
+ cfg.TRAINER.RANSAC_PIXEL_THR = 0.5
18
+
19
+ cfg.TRAINER.OPTIMIZER = "adamw"
20
+ cfg.TRAINER.ADAMW_DECAY = 0.1
21
+ cfg.ASPAN.MATCH_COARSE.TRAIN_COARSE_PERCENT = 0.3
third_party/ASpanFormer/configs/data/__init__.py ADDED
File without changes
third_party/ASpanFormer/configs/data/base.py ADDED
@@ -0,0 +1,36 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ The data config will be the last one merged into the main config.
3
+ Setups in data configs will override all existing setups!
4
+ """
5
+
6
+ from yacs.config import CfgNode as CN
7
+
8
+ _CN = CN()
9
+ _CN.DATASET = CN()
10
+ _CN.TRAINER = CN()
11
+
12
+ # training data config
13
+ _CN.DATASET.TRAIN_DATA_ROOT = None
14
+ _CN.DATASET.TRAIN_POSE_ROOT = None
15
+ _CN.DATASET.TRAIN_NPZ_ROOT = None
16
+ _CN.DATASET.TRAIN_LIST_PATH = None
17
+ _CN.DATASET.TRAIN_INTRINSIC_PATH = None
18
+ # validation set config
19
+ _CN.DATASET.VAL_DATA_ROOT = None
20
+ _CN.DATASET.VAL_POSE_ROOT = None
21
+ _CN.DATASET.VAL_NPZ_ROOT = None
22
+ _CN.DATASET.VAL_LIST_PATH = None
23
+ _CN.DATASET.VAL_INTRINSIC_PATH = None
24
+
25
+ # testing data config
26
+ _CN.DATASET.TEST_DATA_ROOT = None
27
+ _CN.DATASET.TEST_POSE_ROOT = None
28
+ _CN.DATASET.TEST_NPZ_ROOT = None
29
+ _CN.DATASET.TEST_LIST_PATH = None
30
+ _CN.DATASET.TEST_INTRINSIC_PATH = None
31
+
32
+ # dataset config
33
+ _CN.DATASET.MIN_OVERLAP_SCORE_TRAIN = 0.4
34
+ _CN.DATASET.MIN_OVERLAP_SCORE_TEST = 0.0 # for both test and val
35
+
36
+ cfg = _CN
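
As the docstring at the top of this file notes, a data config built on this base is merged into the main config last, so its values win. A small illustrative sketch of that override behaviour with yacs (the node names and values below are only examples, not the project's full config):

```python
from yacs.config import CfgNode as CN

# A pretend "main" config with a default already set.
main_cfg = CN()
main_cfg.DATASET = CN()
main_cfg.DATASET.MIN_OVERLAP_SCORE_TRAIN = 0.4

# A pretend data config, as if it were built on configs/data/base.py.
data_cfg = CN()
data_cfg.DATASET = CN()
data_cfg.DATASET.MIN_OVERLAP_SCORE_TRAIN = 0.0  # data-config value

# Merging the data config last overrides the earlier default.
main_cfg.merge_from_other_cfg(data_cfg)
print(main_cfg.DATASET.MIN_OVERLAP_SCORE_TRAIN)  # -> 0.0
```
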
third_party/ASpanFormer/configs/data/debug/.gitignore ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ *
2
+ */
3
+ !.gitignore
third_party/ASpanFormer/configs/data/megadepth_test_1500.py ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from configs.data.base import cfg
2
+
3
+ TEST_BASE_PATH = "assets/megadepth_test_1500_scene_info"
4
+
5
+ cfg.DATASET.TEST_DATA_SOURCE = "MegaDepth"
6
+ cfg.DATASET.TEST_DATA_ROOT = "data/megadepth/test"
7
+ cfg.DATASET.TEST_NPZ_ROOT = f"{TEST_BASE_PATH}"
8
+ cfg.DATASET.TEST_LIST_PATH = f"{TEST_BASE_PATH}/megadepth_test_1500.txt"
9
+
10
+ cfg.DATASET.MGDPT_IMG_RESIZE = 1152
11
+ cfg.DATASET.MGDPT_IMG_PAD = True
12
+ cfg.DATASET.MGDPT_DF = 8
13
+ cfg.DATASET.MIN_OVERLAP_SCORE_TEST = 0.0
third_party/ASpanFormer/configs/data/megadepth_trainval_832.py ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from configs.data.base import cfg
2
+
3
+
4
+ TRAIN_BASE_PATH = "data/megadepth/index"
5
+ cfg.DATASET.TRAINVAL_DATA_SOURCE = "MegaDepth"
6
+ cfg.DATASET.TRAIN_DATA_ROOT = "data/megadepth/train"
7
+ cfg.DATASET.TRAIN_NPZ_ROOT = f"{TRAIN_BASE_PATH}/scene_info_0.1_0.7"
8
+ cfg.DATASET.TRAIN_LIST_PATH = f"{TRAIN_BASE_PATH}/trainvaltest_list/train_list.txt"
9
+ cfg.DATASET.MIN_OVERLAP_SCORE_TRAIN = 0.0
10
+
11
+ TEST_BASE_PATH = "data/megadepth/index"
12
+ cfg.DATASET.TEST_DATA_SOURCE = "MegaDepth"
13
+ cfg.DATASET.VAL_DATA_ROOT = cfg.DATASET.TEST_DATA_ROOT = "data/megadepth/test"
14
+ cfg.DATASET.VAL_NPZ_ROOT = (
15
+ cfg.DATASET.TEST_NPZ_ROOT
16
+ ) = f"{TEST_BASE_PATH}/scene_info_val_1500"
17
+ cfg.DATASET.VAL_LIST_PATH = (
18
+ cfg.DATASET.TEST_LIST_PATH
19
+ ) = f"{TEST_BASE_PATH}/trainvaltest_list/val_list.txt"
20
+ cfg.DATASET.MIN_OVERLAP_SCORE_TEST = 0.0 # for both test and val
21
+
22
+ # 368 scenes in total for MegaDepth
23
+ # (difficulty balanced by further splitting each scene into 3 sub-scenes)
24
+ cfg.TRAINER.N_SAMPLES_PER_SUBSET = 100
25
+
26
+ cfg.DATASET.MGDPT_IMG_RESIZE = 832 # for training on 32GB memory GPUs
third_party/ASpanFormer/configs/data/scannet_test_1500.py ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from configs.data.base import cfg
2
+
3
+ TEST_BASE_PATH = "assets/scannet_test_1500"
4
+
5
+ cfg.DATASET.TEST_DATA_SOURCE = "ScanNet"
6
+ cfg.DATASET.TEST_DATA_ROOT = "data/scannet/test"
7
+ cfg.DATASET.TEST_NPZ_ROOT = f"{TEST_BASE_PATH}"
8
+ cfg.DATASET.TEST_LIST_PATH = f"{TEST_BASE_PATH}/scannet_test.txt"
9
+ cfg.DATASET.TEST_INTRINSIC_PATH = f"{TEST_BASE_PATH}/intrinsics.npz"
10
+
11
+ cfg.DATASET.MIN_OVERLAP_SCORE_TEST = 0.0
third_party/ASpanFormer/configs/data/scannet_trainval.py ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from configs.data.base import cfg
2
+
3
+
4
+ TRAIN_BASE_PATH = "data/scannet/index"
5
+ cfg.DATASET.TRAINVAL_DATA_SOURCE = "ScanNet"
6
+ cfg.DATASET.TRAIN_DATA_ROOT = "data/scannet/train"
7
+ cfg.DATASET.TRAIN_NPZ_ROOT = f"{TRAIN_BASE_PATH}/scene_data/train"
8
+ cfg.DATASET.TRAIN_LIST_PATH = f"{TRAIN_BASE_PATH}/scene_data/train_list/scannet_all.txt"
9
+ cfg.DATASET.TRAIN_INTRINSIC_PATH = f"{TRAIN_BASE_PATH}/intrinsics.npz"
10
+
11
+ TEST_BASE_PATH = "assets/scannet_test_1500"
12
+ cfg.DATASET.TEST_DATA_SOURCE = "ScanNet"
13
+ cfg.DATASET.VAL_DATA_ROOT = cfg.DATASET.TEST_DATA_ROOT = "data/scannet/test"
14
+ cfg.DATASET.VAL_NPZ_ROOT = cfg.DATASET.TEST_NPZ_ROOT = TEST_BASE_PATH
15
+ cfg.DATASET.VAL_LIST_PATH = (
16
+ cfg.DATASET.TEST_LIST_PATH
17
+ ) = f"{TEST_BASE_PATH}/scannet_test.txt"
18
+ cfg.DATASET.VAL_INTRINSIC_PATH = (
19
+ cfg.DATASET.TEST_INTRINSIC_PATH
20
+ ) = f"{TEST_BASE_PATH}/intrinsics.npz"
21
+ cfg.DATASET.MIN_OVERLAP_SCORE_TEST = 0.0 # for both test and val
third_party/ASpanFormer/data/megadepth/index/.gitignore ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ # Ignore everything in this directory
2
+ *
3
+ # Except this file
4
+ !.gitignore
third_party/ASpanFormer/data/megadepth/test/.gitignore ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ # Ignore everything in this directory
2
+ *
3
+ # Except this file
4
+ !.gitignore
third_party/ASpanFormer/data/megadepth/train/.gitignore ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ # Ignore everything in this directory
2
+ *
3
+ # Except this file
4
+ !.gitignore
third_party/ASpanFormer/data/scannet/index/.gitignore ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ # Ignore everything in this directory
2
+ *
3
+ # Except this file
4
+ !.gitignore
third_party/ASpanFormer/data/scannet/test/.gitignore ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ *
2
+ */
3
+ !.gitignore
third_party/ASpanFormer/data/scannet/train/.gitignore ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ # Ignore everything in this directory
2
+ *
3
+ # Except this file
4
+ !.gitignore
third_party/ASpanFormer/demo/demo.py ADDED
@@ -0,0 +1,91 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import sys
3
+
4
+ ROOT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
5
+ sys.path.insert(0, ROOT_DIR)
6
+
7
+ from src.ASpanFormer.aspanformer import ASpanFormer
8
+ from src.config.default import get_cfg_defaults
9
+ from src.utils.misc import lower_config
10
+ import demo_utils
11
+
12
+ import cv2
13
+ import torch
14
+ import numpy as np
15
+
16
+ import argparse
17
+
18
+ parser = argparse.ArgumentParser()
19
+ parser.add_argument(
20
+ "--config_path",
21
+ type=str,
22
+ default="../configs/aspan/outdoor/aspan_test.py",
23
+ help="path for config file.",
24
+ )
25
+ parser.add_argument(
26
+ "--img0_path",
27
+ type=str,
28
+ default="../assets/phototourism_sample_images/piazza_san_marco_06795901_3725050516.jpg",
29
+ help="path for image0.",
30
+ )
31
+ parser.add_argument(
32
+ "--img1_path",
33
+ type=str,
34
+ default="../assets/phototourism_sample_images/piazza_san_marco_15148634_5228701572.jpg",
35
+ help="path for image1.",
36
+ )
37
+ parser.add_argument(
38
+ "--weights_path",
39
+ type=str,
40
+ default="../weights/outdoor.ckpt",
41
+ help="path for model weights.",
42
+ )
43
+ parser.add_argument(
44
+ "--long_dim0", type=int, default=1024, help="resize for longest dim of image0."
45
+ )
46
+ parser.add_argument(
47
+ "--long_dim1", type=int, default=1024, help="resize for longest dim of image1."
48
+ )
49
+
50
+ args = parser.parse_args()
51
+
52
+
53
+ if __name__ == "__main__":
54
+ config = get_cfg_defaults()
55
+ config.merge_from_file(args.config_path)
56
+ _config = lower_config(config)
57
+ matcher = ASpanFormer(config=_config["aspan"])
58
+ state_dict = torch.load(args.weights_path, map_location="cpu")["state_dict"]
59
+ matcher.load_state_dict(state_dict, strict=False)
60
+ matcher.cuda(), matcher.eval()
61
+
62
+ img0, img1 = cv2.imread(args.img0_path), cv2.imread(args.img1_path)
63
+ img0_g, img1_g = cv2.imread(args.img0_path, 0), cv2.imread(args.img1_path, 0)
64
+ img0, img1 = demo_utils.resize(img0, args.long_dim0), demo_utils.resize(
65
+ img1, args.long_dim1
66
+ )
67
+ img0_g, img1_g = demo_utils.resize(img0_g, args.long_dim0), demo_utils.resize(
68
+ img1_g, args.long_dim1
69
+ )
70
+ data = {
71
+ "image0": torch.from_numpy(img0_g / 255.0)[None, None].cuda().float(),
72
+ "image1": torch.from_numpy(img1_g / 255.0)[None, None].cuda().float(),
73
+ }
74
+ with torch.no_grad():
75
+ matcher(data, online_resize=True)
76
+ corr0, corr1 = data["mkpts0_f"].cpu().numpy(), data["mkpts1_f"].cpu().numpy()
77
+
78
+ F_hat, mask_F = cv2.findFundamentalMat(
79
+ corr0, corr1, method=cv2.FM_RANSAC, ransacReprojThreshold=1
80
+ )
81
+ if mask_F is not None:
82
+ mask_F = mask_F[:, 0].astype(bool)
83
+ else:
84
+ mask_F = np.zeros_like(corr0[:, 0]).astype(bool)
85
+
86
+ # visualize match
87
+ display = demo_utils.draw_match(img0, img1, corr0, corr1)
88
+ display_ransac = demo_utils.draw_match(img0, img1, corr0[mask_F], corr1[mask_F])
89
+ cv2.imwrite("match.png", display)
90
+ cv2.imwrite("match_ransac.png", display_ransac)
91
+ print(len(corr1), len(corr1[mask_F]))
third_party/ASpanFormer/demo/demo_utils.py ADDED
@@ -0,0 +1,88 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import cv2
2
+ import numpy as np
3
+
4
+
5
+ def resize(image, long_dim):
6
+ h, w = image.shape[0], image.shape[1]
7
+ image = cv2.resize(
8
+ image, (int(w * long_dim / max(h, w)), int(h * long_dim / max(h, w)))
9
+ )
10
+ return image
11
+
12
+
13
+ def draw_points(img, points, color=(0, 255, 0), radius=3):
14
+ dp = [(int(points[i, 0]), int(points[i, 1])) for i in range(points.shape[0])]
15
+ for i in range(points.shape[0]):
16
+ cv2.circle(img, dp[i], radius=radius, color=color)
17
+ return img
18
+
19
+
20
+ def draw_match(
21
+ img1,
22
+ img2,
23
+ corr1,
24
+ corr2,
25
+ inlier=[True],
26
+ color=None,
27
+ radius1=1,
28
+ radius2=1,
29
+ resize=None,
30
+ ):
31
+ if resize is not None:
32
+ scale1, scale2 = [img1.shape[1] / resize[0], img1.shape[0] / resize[1]], [
33
+ img2.shape[1] / resize[0],
34
+ img2.shape[0] / resize[1],
35
+ ]
36
+ img1, img2 = cv2.resize(img1, resize, interpolation=cv2.INTER_AREA), cv2.resize(
37
+ img2, resize, interpolation=cv2.INTER_AREA
38
+ )
39
+ corr1, corr2 = (
40
+ corr1 / np.asarray(scale1)[np.newaxis],
41
+ corr2 / np.asarray(scale2)[np.newaxis],
42
+ )
43
+ corr1_key = [
44
+ cv2.KeyPoint(corr1[i, 0], corr1[i, 1], radius1) for i in range(corr1.shape[0])
45
+ ]
46
+ corr2_key = [
47
+ cv2.KeyPoint(corr2[i, 0], corr2[i, 1], radius2) for i in range(corr2.shape[0])
48
+ ]
49
+
50
+ assert len(corr1) == len(corr2)
51
+
52
+ draw_matches = [cv2.DMatch(i, i, 0) for i in range(len(corr1))]
53
+ if color is None:
54
+ color = [(0, 255, 0) if cur_inlier else (0, 0, 255) for cur_inlier in inlier]
55
+ if len(color) == 1:
56
+ display = cv2.drawMatches(
57
+ img1,
58
+ corr1_key,
59
+ img2,
60
+ corr2_key,
61
+ draw_matches,
62
+ None,
63
+ matchColor=color[0],
64
+ singlePointColor=color[0],
65
+ flags=4,
66
+ )
67
+ else:
68
+ height, width = max(img1.shape[0], img2.shape[0]), img1.shape[1] + img2.shape[1]
69
+ display = np.zeros([height, width, 3], np.uint8)
70
+ display[: img1.shape[0], : img1.shape[1]] = img1
71
+ display[: img2.shape[0], img1.shape[1] :] = img2
72
+ for i in range(len(corr1)):
73
+ left_x, left_y, right_x, right_y = (
74
+ int(corr1[i][0]),
75
+ int(corr1[i][1]),
76
+ int(corr2[i][0] + img1.shape[1]),
77
+ int(corr2[i][1]),
78
+ )
79
+ cur_color = (int(color[i][0]), int(color[i][1]), int(color[i][2]))
80
+ cv2.line(
81
+ display,
82
+ (left_x, left_y),
83
+ (right_x, right_y),
84
+ cur_color,
85
+ 1,
86
+ lineType=cv2.LINE_AA,
87
+ )
88
+ return display
third_party/ASpanFormer/docs/TRAINING.md ADDED
@@ -0,0 +1,72 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ # Training ASpanFormer
3
+
4
+ ## Dataset setup
5
+ Generally, two parts of data are needed for training ASpanFormer: the original datasets, i.e., ScanNet and MegaDepth, and the offline-generated dataset indices. The dataset indices store the scenes, image pairs, and other metadata within each dataset used for training/validation/testing. For the MegaDepth dataset, the relative poses between images used for training are directly cached in the indexing files. However, the relative poses of ScanNet image pairs are not stored due to the enormous resulting file size.
6
+
7
+ ### Download datasets
8
+ #### MegaDepth
9
+ We use depth maps provided in the [original MegaDepth dataset](https://www.cs.cornell.edu/projects/megadepth/) as well as undistorted images, corresponding camera intrinsics and extrinsics preprocessed by [D2-Net](https://github.com/mihaidusmanu/d2-net#downloading-and-preprocessing-the-megadepth-dataset). You can download them separately from the following links.
10
+ - [MegaDepth undistorted images and processed depths](https://www.cs.cornell.edu/projects/megadepth/dataset/Megadepth_v1/MegaDepth_v1.tar.gz)
11
+ - Note that we only use depth maps.
12
+ - The path of the downloaded data will be referred to as `/path/to/megadepth`
13
+ - [D2-Net preprocessed images](https://drive.google.com/drive/folders/1hxpOsqOZefdrba_BqnW490XpNX_LgXPB)
14
+ - Images are undistorted manually in D2-Net since the undistorted images from MegaDepth do not come with corresponding intrinsics.
15
+ - The path of the downloaded data will be referred to as `/path/to/megadepth_d2net`
16
+
17
+ #### ScanNet
18
+ Please set up the ScanNet dataset following [the official guide](https://github.com/ScanNet/ScanNet#scannet-data)
19
+ > NOTE: We use the [python exported data](https://github.com/ScanNet/ScanNet/tree/master/SensReader/python),
20
+ instead of the [c++ exported one](https://github.com/ScanNet/ScanNet/tree/master/SensReader/c%2B%2B).
21
+
22
+ ### Download the dataset indices
23
+
24
+ You can download the required dataset indices from the [following link](https://drive.google.com/drive/folders/1DOcOPZb3-5cWxLqn256AhwUVjBPifhuf).
25
+ After downloading, unzip the required files.
26
+ ```shell
27
+ unzip downloaded-file.zip
28
+
29
+ # extract dataset indices
30
+ tar xf train-data/megadepth_indices.tar
31
+ tar xf train-data/scannet_indices.tar
32
+
33
+ # extract testing data (optional)
34
+ tar xf testdata/megadepth_test_1500.tar
35
+ tar xf testdata/scannet_test_1500.tar
36
+ ```
37
+
38
+ ### Build the dataset symlinks
39
+
40
+ We symlink the datasets to the `data` directory under the main ASpanFormer project directory.
41
+
42
+ ```shell
43
+ # scannet
44
+ # -- # train and test dataset
45
+ ln -s /path/to/scannet_train/* /path/to/ASpanFormer/data/scannet/train
46
+ ln -s /path/to/scannet_test/* /path/to/ASpanFormer/data/scannet/test
47
+ # -- # dataset indices
48
+ ln -s /path/to/scannet_indices/* /path/to/ASpanFormer/data/scannet/index
49
+
50
+ # megadepth
51
+ # -- # train and test dataset (train and test share the same dataset)
52
+ ln -sv /path/to/megadepth/phoenix /path/to/megadepth_d2net/Undistorted_SfM /path/to/ASpanFormer/data/megadepth/train
53
+ ln -sv /path/to/megadepth/phoenix /path/to/megadepth_d2net/Undistorted_SfM /path/to/ASpanFormer/data/megadepth/test
54
+ # -- # dataset indices
55
+ ln -s /path/to/megadepth_indices/* /path/to/ASpanFormer/data/megadepth/index
56
+ ```
57
+
58
+
59
+ ## Training
60
+ We provide training scripts for ScanNet and MegaDepth. The results in the ASpanFormer paper can be reproduced with 8 V100 GPUs. For a different setup, we scale the learning rate and its warm-up linearly, but the final evaluation results might vary due to the different batch size and learning rate used. Thus, exact reproduction of the results in our paper is not guaranteed.
61
+
62
+
63
+ ### Training on ScanNet
64
+ ``` shell
65
+ scripts/reproduce_train/indoor.sh
66
+ ```
67
+
68
+
69
+ ### Training on MegaDepth
70
+ ``` shell
71
+ scripts/reproduce_train/outdoor.sh
72
+ ```
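
Once the indices are symlinked into place, a quick sanity check is to open one of the MegaDepth scene-info files and list what it stores (image pairs, cached poses, and other metadata, as described above). This is only a hypothetical convenience check, not part of the training pipeline; the exact file names and keys depend on the downloaded indices, and the path below is the `TRAIN_NPZ_ROOT` from `configs/data/megadepth_trainval_832.py`.

```python
from pathlib import Path

import numpy as np

# TRAIN_NPZ_ROOT from configs/data/megadepth_trainval_832.py.
index_root = Path("data/megadepth/index/scene_info_0.1_0.7")
npz_files = sorted(index_root.glob("*.npz"))
print(f"found {len(npz_files)} scene index files")

if npz_files:
    scene = np.load(npz_files[0], allow_pickle=True)
    # Print whichever arrays this index file stores (pairs, poses, metadata, ...).
    print(npz_files[0].name, "->", scene.files)
```
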
third_party/ASpanFormer/environment.yaml ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ name: ASpanFormer
2
+ channels:
3
+ - pytorch
4
+ - conda-forge
5
+ - defaults
6
+ dependencies:
7
+ - python=3.8
8
+ - cudatoolkit=10.2
9
+ - pytorch=1.8.1
10
+ - pip
11
+ - pip:
12
+ - -r requirements.txt
third_party/ASpanFormer/requirements.txt ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #opencv_python==4.4.0.46
2
+ albumentations==0.5.1 --no-binary=imgaug,albumentations
3
+ ray>=1.0.1
4
+ einops==0.3.0
5
+ kornia==0.4.1
6
+ loguru==0.5.3
7
+ yacs>=0.1.8
8
+ tqdm
9
+ autopep8
10
+ pylint
11
+ ipython
12
+ jupyterlab
13
+ matplotlib
14
+ h5py
15
+ pytorch-lightning==1.3.5
16
+ loguru
17
+ joblib>=1.0.1
18
+ torchmetrics==0.4
third_party/ASpanFormer/scripts/reproduce_test/indoor.sh ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash -l
2
+ # an indoor_ds model with the pos_enc impl bug fixed.
3
+
4
+ SCRIPTPATH=$(dirname $(readlink -f "$0"))
5
+ PROJECT_DIR="${SCRIPTPATH}/../../"
6
+
7
+ # conda activate loftr
8
+ export PYTHONPATH=$PROJECT_DIR:$PYTHONPATH
9
+ cd $PROJECT_DIR
10
+
11
+ data_cfg_path="configs/data/scannet_test_1500.py"
12
+ main_cfg_path="configs/aspan/indoor/aspan_test.py"
13
+ ckpt_path='weights/indoor.ckpt'
14
+ dump_dir="dump/indoor_dump"
15
+ profiler_name="inference"
16
+ n_nodes=1 # manually keep this the same as --nodes
17
+ n_gpus_per_node=-1
18
+ torch_num_workers=4
19
+ batch_size=1 # per gpu
20
+
21
+ python -u ./test.py \
22
+ ${data_cfg_path} \
23
+ ${main_cfg_path} \
24
+ --ckpt_path=${ckpt_path} \
25
+ --dump_dir=${dump_dir} \
26
+ --gpus=${n_gpus_per_node} --num_nodes=${n_nodes} --accelerator="ddp" \
27
+ --batch_size=${batch_size} --num_workers=${torch_num_workers}\
28
+ --profiler_name=${profiler_name} \
29
+ --benchmark \
30
+ --mode integrated
31
+