Shiaoming committed
Commit 64abe77 · Parent: da0591b

release code

This view is limited to 50 files because the commit contains too many changes.

Files changed (50):
  1. README.md +111 -2
  2. alike.py +143 -0
  3. alnet.py +164 -0
  4. assets/alike.png +0 -0
  5. assets/kitti/000100.png +0 -0
  6. assets/kitti/000101.png +0 -0
  7. assets/kitti/000102.png +0 -0
  8. assets/kitti/000103.png +0 -0
  9. assets/kitti/000104.png +0 -0
  10. assets/kitti/000105.png +0 -0
  11. assets/kitti/000106.png +0 -0
  12. assets/kitti/000107.png +0 -0
  13. assets/kitti/000108.png +0 -0
  14. assets/kitti/000109.png +0 -0
  15. assets/kitti/000110.png +0 -0
  16. assets/kitti/000111.png +0 -0
  17. assets/kitti/000112.png +0 -0
  18. assets/kitti/000113.png +0 -0
  19. assets/kitti/000114.png +0 -0
  20. assets/kitti/000115.png +0 -0
  21. assets/kitti/000116.png +0 -0
  22. assets/kitti/000117.png +0 -0
  23. assets/kitti/000118.png +0 -0
  24. assets/kitti/000119.png +0 -0
  25. assets/tum/1311868169.163498.png +0 -0
  26. assets/tum/1311868169.263274.png +0 -0
  27. assets/tum/1311868169.363470.png +0 -0
  28. assets/tum/1311868169.463229.png +0 -0
  29. assets/tum/1311868169.563501.png +0 -0
  30. assets/tum/1311868169.663240.png +0 -0
  31. assets/tum/1311868169.763417.png +0 -0
  32. assets/tum/1311868169.863396.png +0 -0
  33. assets/tum/1311868169.963415.png +0 -0
  34. assets/tum/1311868170.063469.png +0 -0
  35. assets/tum/1311868170.163416.png +0 -0
  36. assets/tum/1311868170.263521.png +0 -0
  37. assets/tum/1311868170.363400.png +0 -0
  38. assets/tum/1311868170.463383.png +0 -0
  39. assets/tum/1311868170.563345.png +0 -0
  40. assets/tum/1311868170.663430.png +0 -0
  41. assets/tum/1311868170.763453.png +0 -0
  42. assets/tum/1311868170.863446.png +0 -0
  43. assets/tum/1311868170.963440.png +0 -0
  44. assets/tum/1311868171.063438.png +0 -0
  45. demo.py +167 -0
  46. hseq/cache/alike-l-ms.npy +0 -0
  47. hseq/cache/alike-l.npy +0 -0
  48. hseq/cache/alike-n-ms.npy +0 -0
  49. hseq/cache/alike-n.npy +0 -0
  50. hseq/cache/aslfeat.npy +0 -0
README.md CHANGED
@@ -1,3 +1,112 @@
- # ALIKE

- The code will be released after the paper has been accepted.
+ # ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction

+ ALIKE applies a differentiable keypoint detection module to detect accurate sub-pixel keypoints. The network runs at 95 frames per second on 640 x 480 images on an NVIDIA Titan RTX GPU while achieving performance comparable to the state of the art, which makes ALIKE suitable for real-time applications on resource-limited platforms and devices. Technical details are described in [this paper](https://arxiv.org/pdf/2112.02906.pdf).

+ > ```
+ > Xiaoming Zhao, Xingming Wu, Jinyu Miao, Weihai Chen, Peter C. Y. Chen, Zhengguo Li, "ALIKE: Accurate and Lightweight Keypoint
+ > Detection and Descriptor Extraction," IEEE Transactions on Multimedia, 2022.
+ > ```

+ ![](./assets/alike.png)

+ If you use ALIKE in academic work, please cite:

+ ```
+ @article{Zhao2022ALIKE,
+     title={ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction},
+     author={Xiaoming Zhao and Xingming Wu and Jinyu Miao and Weihai Chen and Peter C. Y. Chen and Zhengguo Li},
+     journal={IEEE Transactions on Multimedia},
+     year={2022}
+ }
+ ```

+ ## 1. Prerequisites

+ The required packages are listed in `requirements.txt`:

+ ```shell
+ pip install -r requirements.txt
+ ```

+ ## 2. Models

+ The off-the-shelf weights of the four ALIKE model variants (alike-t, alike-s, alike-n, alike-l) are provided in `models/`.

+ ## 3. Run demo

+ ```shell
+ $ python demo.py -h
+ usage: demo.py [-h] [--model {alike-t,alike-s,alike-n,alike-l}]
+                [--device DEVICE] [--top_k TOP_K] [--scores_th SCORES_TH]
+                [--n_limit N_LIMIT] [--no_display] [--no_sub_pixel]
+                input
+
+ ALike Demo.
+
+ positional arguments:
+   input                 Image directory or movie file or "camera0" (for
+                         webcam0).
+
+ optional arguments:
+   -h, --help            show this help message and exit
+   --model {alike-t,alike-s,alike-n,alike-l}
+                         The model configuration
+   --device DEVICE       Running device (default: cuda).
+   --top_k TOP_K         Detect top K keypoints. -1 for threshold based mode,
+                         >0 for top K mode. (default: -1)
+   --scores_th SCORES_TH
+                         Detector score threshold (default: 0.2).
+   --n_limit N_LIMIT     Maximum number of keypoints to be detected (default:
+                         5000).
+   --no_display          Do not display images to screen. Useful if running
+                         remotely (default: False).
+   --no_sub_pixel        Do not detect sub-pixel keypoints (default: False).
+ ```

+ ## 4. Examples

+ ### KITTI example
+ ```shell
+ python demo.py assets/kitti
+ ```
+ ![](./assets/kitti.gif)

+ ### TUM example
+ ```shell
+ python demo.py assets/tum
+ ```
+ ![](./assets/tum.gif)

+ ## 5. Efficiency and performance

+ | Models | Parameters | GFLOPs (640x480) | MHA@3 on HPatches | mAA(10°) on [IMW2020-test](https://www.cs.ubc.ca/research/image-matching-challenge/2021/leaderboard) (Stereo) |
+ |:---:|:---:|:---:|:---:|:---:|
+ | D2-Net (MS) | 7653KB | 889.40 | 38.33% | 12.27% |
+ | LF-Net (MS) | 2642KB | 24.37 | 57.78% | 23.44% |
+ | SuperPoint | 1301KB | 26.11 | 70.19% | 28.97% |
+ | R2D2 (MS) | 484KB | 464.55 | 71.48% | 39.02% |
+ | ASLFeat (MS) | 823KB | 77.58 | 73.52% | 33.65% |
+ | DISK | 1092KB | 98.97 | 70.56% | 51.22% |
+ | ALike-N | 318KB | 7.909 | 75.74% | 47.18% |
+ | ALike-L | 653KB | 19.685 | 76.85% | 49.58% |

+ ### Evaluation on HPatches

+ - Download the [hpatches-sequences-release](https://hpatches.github.io/) dataset and put it into `hseq/hpatches-sequences-release`.
+ - Remove the unreliable sequences, as in D2-Net.
+ - Run the following command to evaluate the performance:
+ ```shell
+ python hseq/eval.py
+ ```

+ For more details, please refer to the [paper](https://arxiv.org/abs/2112.02906).
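
Beyond `demo.py`, the `ALike` class added in `alike.py` below can be used directly from Python. A minimal sketch, assuming a CUDA device, the bundled `alike-t` weights, and one of the KITTI frames shipped under `assets/` (the image is converted to RGB because `ALike.forward` documents an RGB input):

```python
import cv2
from alike import ALike, configs

# 'alike-t' config with its released weights; top_k=-1 selects threshold-based detection
model = ALike(**configs['alike-t'], device='cuda', top_k=-1, scores_th=0.2, n_limit=5000)

# ALike.forward expects an HxWx3 RGB array
img = cv2.cvtColor(cv2.imread('assets/kitti/000100.png'), cv2.COLOR_BGR2RGB)
pred = model(img, sub_pixel=True)

print(pred['keypoints'].shape)    # (N, 2) keypoints in pixel coordinates (x, y)
print(pred['descriptors'].shape)  # (N, dim) descriptor per keypoint
print(pred['time'])               # extraction time in seconds
```
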
alike.py ADDED
@@ -0,0 +1,143 @@
+ import logging
+ import os
+ import cv2
+ import torch
+ from copy import deepcopy
+ import torch.nn.functional as F
+ from torchvision.transforms import ToTensor
+ import math

+ from alnet import ALNet
+ from soft_detect import DKD
+ import time

+ configs = {
+     'alike-t': {'c1': 8, 'c2': 16, 'c3': 32, 'c4': 64, 'dim': 64, 'single_head': True, 'radius': 2,
+                 'model_path': os.path.join(os.path.split(__file__)[0], 'models', 'alike-t.pth')},
+     'alike-s': {'c1': 8, 'c2': 16, 'c3': 48, 'c4': 96, 'dim': 96, 'single_head': True, 'radius': 2,
+                 'model_path': os.path.join(os.path.split(__file__)[0], 'models', 'alike-s.pth')},
+     'alike-n': {'c1': 16, 'c2': 32, 'c3': 64, 'c4': 128, 'dim': 128, 'single_head': True, 'radius': 2,
+                 'model_path': os.path.join(os.path.split(__file__)[0], 'models', 'alike-n.pth')},
+     'alike-l': {'c1': 32, 'c2': 64, 'c3': 128, 'c4': 128, 'dim': 128, 'single_head': False, 'radius': 2,
+                 'model_path': os.path.join(os.path.split(__file__)[0], 'models', 'alike-l.pth')},
+ }


+ class ALike(ALNet):
+     def __init__(self,
+                  # ================================== feature encoder
+                  c1: int = 32, c2: int = 64, c3: int = 128, c4: int = 128, dim: int = 128,
+                  single_head: bool = False,
+                  # ================================== detect parameters
+                  radius: int = 2,
+                  top_k: int = 500, scores_th: float = 0.5,
+                  n_limit: int = 5000,
+                  device: str = 'cpu',
+                  model_path: str = ''
+                  ):
+         super().__init__(c1, c2, c3, c4, dim, single_head)
+         self.radius = radius
+         self.top_k = top_k
+         self.n_limit = n_limit
+         self.scores_th = scores_th
+         self.dkd = DKD(radius=self.radius, top_k=self.top_k,
+                        scores_th=self.scores_th, n_limit=self.n_limit)
+         self.device = device

+         if model_path != '':
+             state_dict = torch.load(model_path, self.device)
+             self.load_state_dict(state_dict)
+             self.to(self.device)
+             self.eval()
+             logging.info(f'Loaded model parameters from {model_path}')
+             logging.info(
+                 f"Number of model parameters: {sum(p.numel() for p in self.parameters() if p.requires_grad) / 1e3}KB")

+     def extract_dense_map(self, image, ret_dict=False):
+         # ====================================================
+         # check image size, which should be an integer multiple of 2^5;
+         # if it is not, pad the image with zeros
+         device = image.device
+         b, c, h, w = image.shape
+         h_ = math.ceil(h / 32) * 32 if h % 32 != 0 else h
+         w_ = math.ceil(w / 32) * 32 if w % 32 != 0 else w
+         if h_ != h:
+             h_padding = torch.zeros(b, c, h_ - h, w, device=device)
+             image = torch.cat([image, h_padding], dim=2)
+         if w_ != w:
+             w_padding = torch.zeros(b, c, h_, w_ - w, device=device)
+             image = torch.cat([image, w_padding], dim=3)
+         # ====================================================

+         scores_map, descriptor_map = super().forward(image)

+         # ====================================================
+         if h_ != h or w_ != w:
+             descriptor_map = descriptor_map[:, :, :h, :w]
+             scores_map = scores_map[:, :, :h, :w]  # Bx1xHxW
+         # ====================================================

+         # BxCxHxW
+         descriptor_map = torch.nn.functional.normalize(descriptor_map, p=2, dim=1)

+         if ret_dict:
+             return {'descriptor_map': descriptor_map, 'scores_map': scores_map, }
+         else:
+             return descriptor_map, scores_map

+     def forward(self, img, image_size_max=99999, sort=False, sub_pixel=False):
+         """
+         :param img: np.array HxWx3, RGB
+         :param image_size_max: maximum image size, otherwise, the image will be resized
+         :param sort: sort keypoints by scores
+         :param sub_pixel: whether to use sub-pixel accuracy
+         :return: a dictionary with 'keypoints', 'descriptors', 'scores', and 'time'
+         """
+         H, W, three = img.shape
+         assert three == 3, "input image shape should be [HxWx3]"

+         # ==================== image size constraint
+         image = deepcopy(img)
+         max_hw = max(H, W)
+         if max_hw > image_size_max:
+             ratio = float(image_size_max / max_hw)
+             image = cv2.resize(image, dsize=None, fx=ratio, fy=ratio)

+         # ==================== convert image to tensor
+         image = torch.from_numpy(image).to(self.device).to(torch.float32).permute(2, 0, 1)[None] / 255.0

+         # ==================== extract keypoints
+         start = time.time()

+         with torch.no_grad():
+             descriptor_map, scores_map = self.extract_dense_map(image)
+             keypoints, descriptors, scores, _ = self.dkd(scores_map, descriptor_map,
+                                                          sub_pixel=sub_pixel)
+             keypoints, descriptors, scores = keypoints[0], descriptors[0], scores[0]
+             keypoints = (keypoints + 1) / 2 * keypoints.new_tensor([[W - 1, H - 1]])

+             if sort:
+                 indices = torch.argsort(scores, descending=True)
+                 keypoints = keypoints[indices]
+                 descriptors = descriptors[indices]
+                 scores = scores[indices]

+         end = time.time()

+         return {'keypoints': keypoints.cpu().numpy(),
+                 'descriptors': descriptors.cpu().numpy(),
+                 'scores': scores.cpu().numpy(),
+                 'scores_map': scores_map.cpu().numpy(),
+                 'time': end - start, }


+ if __name__ == '__main__':
+     import numpy as np
+     from thop import profile

+     net = ALike(c1=32, c2=64, c3=128, c4=128, dim=128, single_head=False)

+     image = np.random.random((640, 480, 3)).astype(np.float32)
+     flops, params = profile(net, inputs=(image, 9999, False), verbose=False)
+     print('{:<30} {:<8} GFLops'.format('Computational complexity: ', flops / 1e9))
+     print('{:<30} {:<8} KB'.format('Number of parameters: ', params / 1e3))
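
When only the dense outputs are needed, for example to sample descriptors at externally chosen pixels, `extract_dense_map` above can be called on a normalized image tensor. A minimal sketch, assuming the released `alike-n` weights under `models/` and a TUM frame from `assets/`:

```python
import cv2
import torch
from alike import ALike, configs

model = ALike(**configs['alike-n'], device='cpu')

img = cv2.imread('assets/tum/1311868169.163498.png')                    # HxWx3
tensor = torch.from_numpy(img).float().permute(2, 0, 1)[None] / 255.0   # 1x3xHxW in [0, 1]

with torch.no_grad():
    # descriptor_map: 1 x dim x H x W, unit-normalized along the channel dimension
    # scores_map:     1 x 1 x H x W, keypoint scores in (0, 1)
    descriptor_map, scores_map = model.extract_dense_map(tensor)

x, y = 320, 240                    # any pixel of interest
desc = descriptor_map[0, :, y, x]  # dim-dimensional descriptor at (x, y)
score = scores_map[0, 0, y, x]
```
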
alnet.py ADDED
@@ -0,0 +1,164 @@
+ import torch
+ from torch import nn
+ from torchvision.models import resnet
+ from typing import Optional, Callable


+ class ConvBlock(nn.Module):
+     def __init__(self, in_channels, out_channels,
+                  gate: Optional[Callable[..., nn.Module]] = None,
+                  norm_layer: Optional[Callable[..., nn.Module]] = None):
+         super().__init__()
+         if gate is None:
+             self.gate = nn.ReLU(inplace=True)
+         else:
+             self.gate = gate
+         if norm_layer is None:
+             norm_layer = nn.BatchNorm2d
+         self.conv1 = resnet.conv3x3(in_channels, out_channels)
+         self.bn1 = norm_layer(out_channels)
+         self.conv2 = resnet.conv3x3(out_channels, out_channels)
+         self.bn2 = norm_layer(out_channels)

+     def forward(self, x):
+         x = self.gate(self.bn1(self.conv1(x)))  # B x in_channels x H x W
+         x = self.gate(self.bn2(self.conv2(x)))  # B x out_channels x H x W
+         return x


+ # copied from torchvision\models\resnet.py#27->BasicBlock
+ class ResBlock(nn.Module):
+     expansion: int = 1

+     def __init__(
+             self,
+             inplanes: int,
+             planes: int,
+             stride: int = 1,
+             downsample: Optional[nn.Module] = None,
+             groups: int = 1,
+             base_width: int = 64,
+             dilation: int = 1,
+             gate: Optional[Callable[..., nn.Module]] = None,
+             norm_layer: Optional[Callable[..., nn.Module]] = None
+     ) -> None:
+         super(ResBlock, self).__init__()
+         if gate is None:
+             self.gate = nn.ReLU(inplace=True)
+         else:
+             self.gate = gate
+         if norm_layer is None:
+             norm_layer = nn.BatchNorm2d
+         if groups != 1 or base_width != 64:
+             raise ValueError('ResBlock only supports groups=1 and base_width=64')
+         if dilation > 1:
+             raise NotImplementedError("Dilation > 1 not supported in ResBlock")
+         # Both self.conv1 and self.downsample layers downsample the input when stride != 1
+         self.conv1 = resnet.conv3x3(inplanes, planes, stride)
+         self.bn1 = norm_layer(planes)
+         self.conv2 = resnet.conv3x3(planes, planes)
+         self.bn2 = norm_layer(planes)
+         self.downsample = downsample
+         self.stride = stride

+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         identity = x

+         out = self.conv1(x)
+         out = self.bn1(out)
+         out = self.gate(out)

+         out = self.conv2(out)
+         out = self.bn2(out)

+         if self.downsample is not None:
+             identity = self.downsample(x)

+         out += identity
+         out = self.gate(out)

+         return out


+ class ALNet(nn.Module):
+     def __init__(self, c1: int = 32, c2: int = 64, c3: int = 128, c4: int = 128, dim: int = 128,
+                  single_head: bool = True,
+                  ):
+         super().__init__()

+         self.gate = nn.ReLU(inplace=True)

+         self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
+         self.pool4 = nn.MaxPool2d(kernel_size=4, stride=4)

+         self.block1 = ConvBlock(3, c1, self.gate, nn.BatchNorm2d)

+         self.block2 = ResBlock(inplanes=c1, planes=c2, stride=1,
+                                downsample=nn.Conv2d(c1, c2, 1),
+                                gate=self.gate,
+                                norm_layer=nn.BatchNorm2d)
+         self.block3 = ResBlock(inplanes=c2, planes=c3, stride=1,
+                                downsample=nn.Conv2d(c2, c3, 1),
+                                gate=self.gate,
+                                norm_layer=nn.BatchNorm2d)
+         self.block4 = ResBlock(inplanes=c3, planes=c4, stride=1,
+                                downsample=nn.Conv2d(c3, c4, 1),
+                                gate=self.gate,
+                                norm_layer=nn.BatchNorm2d)

+         # ================================== feature aggregation
+         self.conv1 = resnet.conv1x1(c1, dim // 4)
+         self.conv2 = resnet.conv1x1(c2, dim // 4)
+         self.conv3 = resnet.conv1x1(c3, dim // 4)
+         self.conv4 = resnet.conv1x1(dim, dim // 4)
+         self.upsample2 = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
+         self.upsample4 = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=True)
+         self.upsample8 = nn.Upsample(scale_factor=8, mode='bilinear', align_corners=True)
+         self.upsample32 = nn.Upsample(scale_factor=32, mode='bilinear', align_corners=True)

+         # ================================== detector and descriptor head
+         self.single_head = single_head
+         if not self.single_head:
+             self.convhead1 = resnet.conv1x1(dim, dim)
+         self.convhead2 = resnet.conv1x1(dim, dim + 1)

+     def forward(self, image):
+         # ================================== feature encoder
+         x1 = self.block1(image)  # B x c1 x H x W
+         x2 = self.pool2(x1)
+         x2 = self.block2(x2)  # B x c2 x H/2 x W/2
+         x3 = self.pool4(x2)
+         x3 = self.block3(x3)  # B x c3 x H/8 x W/8
+         x4 = self.pool4(x3)
+         x4 = self.block4(x4)  # B x dim x H/32 x W/32

+         # ================================== feature aggregation
+         x1 = self.gate(self.conv1(x1))  # B x dim//4 x H x W
+         x2 = self.gate(self.conv2(x2))  # B x dim//4 x H//2 x W//2
+         x3 = self.gate(self.conv3(x3))  # B x dim//4 x H//8 x W//8
+         x4 = self.gate(self.conv4(x4))  # B x dim//4 x H//32 x W//32
+         x2_up = self.upsample2(x2)  # B x dim//4 x H x W
+         x3_up = self.upsample8(x3)  # B x dim//4 x H x W
+         x4_up = self.upsample32(x4)  # B x dim//4 x H x W
+         x1234 = torch.cat([x1, x2_up, x3_up, x4_up], dim=1)

+         # ================================== detector and descriptor head
+         if not self.single_head:
+             x1234 = self.gate(self.convhead1(x1234))
+         x = self.convhead2(x1234)  # B x dim+1 x H x W

+         descriptor_map = x[:, :-1, :, :]
+         scores_map = torch.sigmoid(x[:, -1, :, :]).unsqueeze(1)

+         return scores_map, descriptor_map


+ if __name__ == '__main__':
+     from thop import profile

+     net = ALNet(c1=16, c2=32, c3=64, c4=128, dim=128, single_head=True)

+     image = torch.randn(1, 3, 640, 480)
+     flops, params = profile(net, inputs=(image,), verbose=False)
+     print('{:<30} {:<8} GFLops'.format('Computational complexity: ', flops / 1e9))
+     print('{:<30} {:<8} KB'.format('Number of parameters: ', params / 1e3))
assets/alike.png ADDED
assets/kitti/000100.png ADDED
assets/kitti/000101.png ADDED
assets/kitti/000102.png ADDED
assets/kitti/000103.png ADDED
assets/kitti/000104.png ADDED
assets/kitti/000105.png ADDED
assets/kitti/000106.png ADDED
assets/kitti/000107.png ADDED
assets/kitti/000108.png ADDED
assets/kitti/000109.png ADDED
assets/kitti/000110.png ADDED
assets/kitti/000111.png ADDED
assets/kitti/000112.png ADDED
assets/kitti/000113.png ADDED
assets/kitti/000114.png ADDED
assets/kitti/000115.png ADDED
assets/kitti/000116.png ADDED
assets/kitti/000117.png ADDED
assets/kitti/000118.png ADDED
assets/kitti/000119.png ADDED
assets/tum/1311868169.163498.png ADDED
assets/tum/1311868169.263274.png ADDED
assets/tum/1311868169.363470.png ADDED
assets/tum/1311868169.463229.png ADDED
assets/tum/1311868169.563501.png ADDED
assets/tum/1311868169.663240.png ADDED
assets/tum/1311868169.763417.png ADDED
assets/tum/1311868169.863396.png ADDED
assets/tum/1311868169.963415.png ADDED
assets/tum/1311868170.063469.png ADDED
assets/tum/1311868170.163416.png ADDED
assets/tum/1311868170.263521.png ADDED
assets/tum/1311868170.363400.png ADDED
assets/tum/1311868170.463383.png ADDED
assets/tum/1311868170.563345.png ADDED
assets/tum/1311868170.663430.png ADDED
assets/tum/1311868170.763453.png ADDED
assets/tum/1311868170.863446.png ADDED
assets/tum/1311868170.963440.png ADDED
assets/tum/1311868171.063438.png ADDED
demo.py ADDED
@@ -0,0 +1,167 @@
+ import copy
+ import os
+ import cv2
+ import glob
+ import logging
+ import argparse
+ import numpy as np
+ from tqdm import tqdm
+ from alike import ALike, configs


+ class ImageLoader(object):
+     def __init__(self, filepath: str):
+         self.N = 3000
+         if filepath.startswith('camera'):
+             camera = int(filepath[6:])
+             self.cap = cv2.VideoCapture(camera)
+             if not self.cap.isOpened():
+                 raise IOError(f"Can't open camera {camera}!")
+             logging.info(f'Opened camera {camera}')
+             self.mode = 'camera'
+         elif os.path.exists(filepath):
+             if os.path.isfile(filepath):
+                 self.cap = cv2.VideoCapture(filepath)
+                 if not self.cap.isOpened():
+                     raise IOError(f"Can't open video {filepath}!")
+                 rate = self.cap.get(cv2.CAP_PROP_FPS)
+                 self.N = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
+                 duration = self.N / rate
+                 logging.info(f'Opened video {filepath}')
+                 logging.info(f'Frames: {self.N}, FPS: {rate}, Duration: {duration}s')
+                 self.mode = 'video'
+             else:
+                 self.images = glob.glob(os.path.join(filepath, '*.png')) + \
+                               glob.glob(os.path.join(filepath, '*.jpg')) + \
+                               glob.glob(os.path.join(filepath, '*.ppm'))
+                 self.images.sort()
+                 self.N = len(self.images)
+                 logging.info(f'Loading {self.N} images')
+                 self.mode = 'images'
+         else:
+             raise IOError('Invalid filepath (cameraX / image directory / video file): ', filepath)

+     def __getitem__(self, item):
+         if self.mode == 'camera' or self.mode == 'video':
+             if item > self.N:
+                 return None
+             ret, img = self.cap.read()
+             if not ret:
+                 raise IOError("Can't read image from camera")
+             if self.mode == 'video':
+                 self.cap.set(cv2.CAP_PROP_POS_FRAMES, item)
+         elif self.mode == 'images':
+             filename = self.images[item]
+             img = cv2.imread(filename)
+             if img is None:
+                 raise Exception('Error reading image %s' % filename)

+         return img

+     def __len__(self):
+         return self.N


+ class SimpleTracker(object):
+     def __init__(self):
+         self.pts_prev = None
+         self.desc_prev = None

+     def update(self, img, pts, desc):
+         N_matches = 0
+         if self.pts_prev is None:
+             self.pts_prev = pts
+             self.desc_prev = desc

+             out = copy.deepcopy(img)
+             for pt1 in pts:
+                 p1 = (int(round(pt1[0])), int(round(pt1[1])))
+                 cv2.circle(out, p1, 1, (0, 0, 255), -1, lineType=16)
+         else:
+             matches = self.mnn_mather(self.desc_prev, desc)
+             mpts1, mpts2 = self.pts_prev[matches[:, 0]], pts[matches[:, 1]]
+             N_matches = len(matches)

+             out = copy.deepcopy(img)
+             for pt1, pt2 in zip(mpts1, mpts2):
+                 p1 = (int(round(pt1[0])), int(round(pt1[1])))
+                 p2 = (int(round(pt2[0])), int(round(pt2[1])))
+                 cv2.line(out, p1, p2, (0, 255, 0), lineType=16)
+                 cv2.circle(out, p2, 1, (0, 0, 255), -1, lineType=16)

+             self.pts_prev = pts
+             self.desc_prev = desc

+         return out, N_matches

+     def mnn_mather(self, desc1, desc2):
+         sim = desc1 @ desc2.transpose()
+         sim[sim < 0.9] = 0
+         nn12 = np.argmax(sim, axis=1)
+         nn21 = np.argmax(sim, axis=0)
+         ids1 = np.arange(0, sim.shape[0])
+         mask = (ids1 == nn21[nn12])
+         matches = np.stack([ids1[mask], nn12[mask]])
+         return matches.transpose()


+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser(description='ALike Demo.')
+     parser.add_argument('input', type=str, default='',
+                         help='Image directory or movie file or "camera0" (for webcam0).')
+     parser.add_argument('--model', choices=['alike-t', 'alike-s', 'alike-n', 'alike-l'], default="alike-t",
+                         help="The model configuration")
+     parser.add_argument('--device', type=str, default='cuda', help="Running device (default: cuda).")
+     parser.add_argument('--top_k', type=int, default=-1,
+                         help='Detect top K keypoints. -1 for threshold based mode, >0 for top K mode. (default: -1)')
+     parser.add_argument('--scores_th', type=float, default=0.2,
+                         help='Detector score threshold (default: 0.2).')
+     parser.add_argument('--n_limit', type=int, default=5000,
+                         help='Maximum number of keypoints to be detected (default: 5000).')
+     parser.add_argument('--no_display', action='store_true',
+                         help='Do not display images to screen. Useful if running remotely (default: False).')
+     parser.add_argument('--no_sub_pixel', action='store_true',
+                         help='Do not detect sub-pixel keypoints (default: False).')
+     args = parser.parse_args()

+     logging.basicConfig(level=logging.INFO)

+     image_loader = ImageLoader(args.input)
+     model = ALike(**configs[args.model],
+                   device=args.device,
+                   top_k=args.top_k,
+                   scores_th=args.scores_th,
+                   n_limit=args.n_limit)
+     tracker = SimpleTracker()

+     if not args.no_display:
+         logging.info("Press 'q' to stop!")
+         cv2.namedWindow(args.model)

+     runtime = []
+     progress_bar = tqdm(image_loader)
+     for img in progress_bar:
+         if img is None:
+             break

+         pred = model(img, sub_pixel=not args.no_sub_pixel)
+         kpts = pred['keypoints']
+         desc = pred['descriptors']
+         runtime.append(pred['time'])

+         out, N_matches = tracker.update(img, kpts, desc)

+         ave_fps = (1. / np.stack(runtime)).mean()
+         status = f"Fps:{ave_fps:.1f}, Keypoints/Matches: {len(kpts)}/{N_matches}"
+         progress_bar.set_description(status)

+         if not args.no_display:
+             cv2.setWindowTitle(args.model, args.model + ': ' + status)
+             cv2.imshow(args.model, out)
+             if cv2.waitKey(1) == ord('q'):
+                 break

+     logging.info('Finished!')
+     if not args.no_display:
+         logging.info('Press any key to exit!')
+         cv2.waitKey()
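
`SimpleTracker.mnn_mather` above keeps mutual nearest neighbours whose descriptor dot product exceeds 0.9, i.e. a cosine-similarity threshold for unit-length descriptors. Essentially the same cross-checked matching can also be done with OpenCV's brute-force matcher; a sketch with a hypothetical helper, assuming `desc_prev` and `desc` are the float descriptor arrays returned by ALike:

```python
import cv2
import numpy as np


def mnn_match_cv(desc_prev: np.ndarray, desc: np.ndarray, sim_th: float = 0.9) -> np.ndarray:
    """Mutual-nearest-neighbour matching via OpenCV, returning Nx2 index pairs."""
    # crossCheck=True keeps only mutual nearest neighbours, like mnn_mather
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = bf.match(desc_prev.astype(np.float32), desc.astype(np.float32))
    # for unit-length descriptors: dot(a, b) > sim_th  <=>  ||a - b|| < sqrt(2 - 2 * sim_th)
    max_dist = np.sqrt(2.0 - 2.0 * sim_th)
    good = [m for m in matches if m.distance < max_dist]
    return np.array([[m.queryIdx, m.trainIdx] for m in good], dtype=int).reshape(-1, 2)
```
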
hseq/cache/alike-l-ms.npy ADDED
Binary file (13.1 kB).
 
hseq/cache/alike-l.npy ADDED
Binary file (13.1 kB).
 
hseq/cache/alike-n-ms.npy ADDED
Binary file (13.1 kB).
 
hseq/cache/alike-n.npy ADDED
Binary file (13.1 kB).
 
hseq/cache/aslfeat.npy ADDED
Binary file (15.4 kB).