Upload 10 files
- README.md +61 -0
- configs/beit-base-p16_224px.py +19 -0
- data/fc.pkl +3 -0
- data/imagenet_test.pkl +3 -0
- data/imagenet_train.pkl +3 -0
- imgs/DTD_cracked_0004.jpg +0 -0
- imgs/framework.png +0 -0
- imgs/moodv2_table.png +0 -0
- pretrain/beitv2-base.pth +3 -0
- src/demo.py +305 -0
README.md
CHANGED
@@ -1,3 +1,64 @@
---
license: mit
language:
- en
pipeline_tag: zero-shot-image-classification
tags:
- ood-detection
- outlier-detection
---

<p style="font-size:28px;" align="center">
🏠 MOODv2
</p>

<p align="center">
• 🤗 <a href="https://huggingface.co/datasets/JingyaoLi/MOODv2" target="_blank">Model</a>
• 🐱 <a href="https://github.com/dvlab-research/MOOD" target="_blank">Code</a>
• 📃 <a href="https://arxiv.org/abs/2302.02615" target="_blank">Paper</a> <br>
</p>

## Abstract
The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-distribution (ID) representation that is distinct from OOD samples. Previous methods predominantly leaned on recognition-based pretraining for this purpose, which often resulted in shortcut learning and representations that are not comprehensive. In our study, we conducted a comprehensive analysis of distinct pretraining tasks combined with various OOD score functions. The results highlight that feature representations pretrained through reconstruction yield a notable improvement and narrow the performance gap among score functions, so even simple score functions can rival complex ones when paired with reconstruction-based pretext tasks. Because reconstruction-based pretext tasks adapt well to various score functions, they hold promising potential for further extension. Our OOD detection framework, MOODv2, employs the masked image modeling pretext task. Without bells and whistles, MOODv2 boosts the AUROC by 14.30% to 95.68% on ImageNet and achieves 99.98% on CIFAR-10.
![framework](imgs/framework.png)

## Performance
![table](imgs/moodv2_table.png)

## Usage
To predict whether an input image is in-distribution or out-of-distribution, we support the following OOD detection methods:
- `MSP`
- `MaxLogit`
- `Energy`
- `Energy+React`
- `ViM`
- `Residual`
- `GradNorm`
- `Mahalanobis`
- `KL-Matching`

```bash
# change --img_path to your own image if needed
python src/demo.py \
    --img_path imgs/DTD_cracked_0004.jpg \
    --cfg configs/beit-base-p16_224px.py \
    --checkpoint pretrain/beitv2-base.pth \
    --fc data/fc.pkl \
    --id_train_feature data/imagenet_train.pkl \
    --id_val_feature data/imagenet_test.pkl \
    --methods MSP MaxLogit Energy Energy+React ViM Residual GradNorm Mahalanobis
```

For the example OOD image `imgs/DTD_cracked_0004.jpg`, you should get:
```
MSP evaluation: out-of-distribution
MaxLogit evaluation: out-of-distribution
Energy evaluation: out-of-distribution
Energy+React evaluation: out-of-distribution
ViM evaluation: out-of-distribution
Residual evaluation: out-of-distribution
GradNorm evaluation: out-of-distribution
Mahalanobis evaluation: out-of-distribution
```

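Each method reduces the image to a single OOD score, which the demo compares against a threshold taken from the in-distribution validation scores at the target false positive rate (95 by default). Below is a minimal sketch of that decision rule, mirroring the `evaluate` function in `src/demo.py`; the logits here are random stand-ins (the demo derives real ones from the BEiTv2 features and `data/fc.pkl`), and only the MSP score is shown, but the other scores plug into the same rule:

```python
import numpy as np
from scipy.special import softmax

def decide(method, score_id_val, score_ood, target_fpr=95):
    # threshold is the (100 - target_fpr)-th percentile of ID validation scores,
    # so target_fpr% of ID samples score at or above it
    threshold = np.percentile(score_id_val, 100 - target_fpr)
    verdict = 'in-distribution' if score_ood >= threshold else 'out-of-distribution'
    print(f'{method} evaluation: {verdict}')

# stand-in logits for illustration only
rng = np.random.default_rng(0)
logit_id_val = rng.normal(size=(5000, 1000))  # ID validation logits
logit_ood = rng.normal(size=(1000,))          # logits of the query image

# MSP: score an image by its maximum softmax probability
decide('MSP',
       softmax(logit_id_val, axis=-1).max(axis=-1),
       softmax(logit_ood).max())
```
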
## Benchmark
To reproduce the results in our paper, please refer to our [repository](https://github.com/dvlab-research/MOOD) for details.
configs/beit-base-p16_224px.py
ADDED
@@ -0,0 +1,19 @@
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='BEiTViT',
        arch='base',
        img_size=224,
        patch_size=16,
        out_type='avg_featmap',
        use_abs_pos_emb=False,
        use_rel_pos_bias=True,
        use_shared_rel_pos_bias=False,
    ),
    neck=None,
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=768,
    ),
)
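This mmpretrain config only defines the classifier; `src/demo.py` builds the model from it and average-pools backbone features for scoring. A minimal sketch of how the config and the uploaded checkpoint are consumed, mirroring `extract_image_feature` in `src/demo.py` (assumes `mmpretrain`/`mmengine` are installed and a CUDA GPU is available):

```python
import mmengine
import torch
import torchvision as tv
from PIL import Image
from mmpretrain.apis import init_model

# build the classifier from this config and the uploaded checkpoint
cfg = mmengine.Config.fromfile('configs/beit-base-p16_224px.py')
model = init_model(cfg, 'pretrain/beitv2-base.pth', 0).cuda().eval()

# same preprocessing as src/demo.py: resize to the backbone input size and normalize
size = cfg.model.backbone.img_size
transform = tv.transforms.Compose([
    tv.transforms.Resize((size, size)),
    tv.transforms.ToTensor(),
    tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
x = transform(Image.open('imgs/DTD_cracked_0004.jpg').convert('RGB')).unsqueeze(0).cuda()

# BEiTViT with out_type='avg_featmap' returns an average-pooled feature of shape (1, 768)
with torch.no_grad():
    feature = model.backbone(x)[0].cpu().numpy()
print(feature.shape)
```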
data/fc.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b1d44fc481b6c5704e55a515594038992c19d64d3040b7338f29653447baa73e
size 3076201
data/imagenet_test.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:36ef3379d4b85893d00baffa94e988a7988d2b448a6b5f37b7b98291e5a7ae88
size 153600163
data/imagenet_train.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ed78756744b40626a9d57e5a327dd1397633c66031e9f90b2874551356324e00
size 614400165
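The three `data/*.pkl` entries above are Git LFS pointers. Once the payloads are pulled, they hold the ImageNet linear head (`fc.pkl`: weight and bias) and the pre-extracted ID features that `src/demo.py` scores against. A minimal loading sketch, assuming the LFS payloads have been fetched:

```python
import mmengine

# ImageNet linear head: weight (1000, 768) and bias (1000,), per the config above
w, b = mmengine.load('data/fc.pkl')

# pre-extracted in-distribution BEiTv2 features
feature_id_train = mmengine.load('data/imagenet_train.pkl').squeeze()
feature_id_val = mmengine.load('data/imagenet_test.pkl').squeeze()

print(w.shape, b.shape, feature_id_train.shape, feature_id_val.shape)
```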
imgs/DTD_cracked_0004.jpg
ADDED
imgs/framework.png
ADDED
imgs/moodv2_table.png
ADDED
pretrain/beitv2-base.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:73e11905570316ca4361bd0766f166ae4d36f568067775d265f6cc2fe83a2b31
size 176847080
src/demo.py
ADDED
@@ -0,0 +1,305 @@
#!/usr/bin/env python
import argparse
import json
from os.path import basename, splitext
import os
import mmengine
import numpy as np
import pandas as pd
import torch
from numpy.linalg import norm, pinv
from scipy.special import logsumexp, softmax
from sklearn import metrics
from sklearn.covariance import EmpiricalCovariance
from sklearn.metrics import pairwise_distances_argmin_min
from tqdm import tqdm
import pickle
from os.path import dirname
import torchvision as tv
from PIL import Image
from mmpretrain.apis import init_model

def parse_args():
    parser = argparse.ArgumentParser(description='Detect an image')
    parser.add_argument(
        '--cfg', help='Path to config',
        default='/dataset/jingyaoli/AD/MOOD_/MOODv2/configs/beit-base-p16_224px.py')
    parser.add_argument('--ood_feature',
                        default=None, help='Path to ood feature file')
    parser.add_argument(
        '--checkpoint', help='Path to checkpoint',
        default='/dataset/jingyaoli/AD/MOODv2/pretrain/beit-base_3rdparty_in1k_20221114-c0a4df23.pth',)
    parser.add_argument('--img_path', help='Path to image',
                        default='/dataset/jingyaoli/AD/MOOD_/MOODv2/imgs/DTD_cracked_0004.jpg')
    parser.add_argument('--fc',
                        default='/dataset/jingyaoli/AD/MOODv2/outputs/beit-224px/fc.pkl', help='Path to fc path')
    parser.add_argument('--id_data', default='imagenet', help='id data name')
    parser.add_argument('--id_train_feature',
                        default='/dataset/jingyaoli/AD/MOODv2/outputs/beit-224px/imagenet_train.pkl', help='Path to data')
    parser.add_argument('--id_val_feature',
                        default='/dataset/jingyaoli/AD/MOODv2/outputs/beit-224px/imagenet_test.pkl', help='Path to output file')
    parser.add_argument('--ood_features',
                        default=None, nargs='+', help='Path to ood features')
    parser.add_argument(
        '--methods', nargs='+',
        default=['MSP', 'MaxLogit', 'Energy', 'Energy+React', 'ViM', 'Residual', 'GradNorm', 'Mahalanobis'],  # 'KL-Matching'
        help='methods')
    parser.add_argument(
        '--train_label',
        default='datalists/imagenet2012_train_random_200k.txt',
        help='Path to train labels')
    parser.add_argument(
        '--clip_quantile', default=0.99, help='Clip quantile to react')
    parser.add_argument(
        '--fpr', default=95, help='False Positive Rate')
    return parser.parse_args()

def evaluate(method, score_id, score_ood, target_fpr):
    # threshold chosen so that target_fpr% of ID scores lie at or above it
    threshold = np.percentile(score_id, 100 - target_fpr)
    if score_ood >= threshold:
        print('\033[94m', method, '\033[0m', 'evaluation:', '\033[92m', 'in-distribution', '\033[0m')
    else:
        print('\033[94m', method, '\033[0m', 'evaluation:', '\033[91m', 'out-of-distribution', '\033[0m')

def kl(p, q):
    return np.sum(np.where(p != 0, p * np.log(p / q), 0))

+
def gradnorm(x, w, b, num_cls):
|
68 |
+
fc = torch.nn.Linear(*w.shape[::-1])
|
69 |
+
fc.weight.data[...] = torch.from_numpy(w)
|
70 |
+
fc.bias.data[...] = torch.from_numpy(b)
|
71 |
+
fc.cuda()
|
72 |
+
|
73 |
+
x = torch.from_numpy(x).float().cuda()
|
74 |
+
logsoftmax = torch.nn.LogSoftmax(dim=-1).cuda()
|
75 |
+
|
76 |
+
confs = []
|
77 |
+
|
78 |
+
for i in tqdm(x, desc='Computing Gradnorm ID/OOD score'):
|
79 |
+
targets = torch.ones((1, num_cls)).cuda()
|
80 |
+
fc.zero_grad()
|
81 |
+
loss = torch.mean(
|
82 |
+
torch.sum(-targets * logsoftmax(fc(i[None])), dim=-1))
|
83 |
+
loss.backward()
|
84 |
+
layer_grad_norm = torch.sum(torch.abs(
|
85 |
+
fc.weight.grad.data)).cpu().numpy()
|
86 |
+
confs.append(layer_grad_norm)
|
87 |
+
|
88 |
+
return np.array(confs)
|
89 |
+
|
def extract_image_feature(args):
    torch.backends.cudnn.benchmark = True

    print('=> Loading model')
    cfg = mmengine.Config.fromfile(args.cfg)
    model = init_model(cfg, args.checkpoint, 0).cuda().eval()

    print('=> Loading image')
    if hasattr(cfg.model.backbone, 'img_size'):
        img_size = cfg.model.backbone.img_size
    else:
        img_size = 224

    transform = tv.transforms.Compose([
        tv.transforms.Resize((img_size, img_size)),
        tv.transforms.ToTensor(),
        tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    x = transform(Image.open(args.img_path).convert('RGB')).unsqueeze(0)

    print('=> Extracting feature')
    with torch.no_grad():
        x = x.cuda()
        if cfg.model.backbone.type == 'BEiTPretrainViT':
            # (B, L, C) -> (B, C)
            feat_batch = model.backbone(
                x, mask=None)[0].mean(1)
        elif cfg.model.backbone.type == 'SwinTransformer':
            # (B, C, H, W) -> (B, C)
            feat_batch = model.backbone(x)[0]
            B, C, H, W = feat_batch.shape
            feat_batch = feat_batch.reshape(B, C, -1).mean(-1)
        else:
            # (B, C)
            feat_batch = model.backbone(x)[0]
        assert len(feat_batch.shape) == 2
        feature = feat_batch.cpu().numpy()

    print(f'Extracted Feature: {feature.shape}')
    return feature

def main():
    args = parse_args()
    if args.ood_feature and os.path.exists(args.ood_feature):
        feature_ood = mmengine.load(args.ood_feature)
    else:
        feature_ood = extract_image_feature(args)

    if os.path.exists(args.fc):
        w, b = mmengine.load(args.fc)
        print(f'{w.shape=}, {b.shape=}')
        num_cls = len(b)

    train_labels = np.array([
        int(line.rsplit(' ', 1)[-1])
        for line in mmengine.list_from_file(args.train_label)
    ], dtype=int)

    print(f'image path: {args.img_path}')

    print('=> Loading features')
    feature_id_train = mmengine.load(args.id_train_feature).squeeze()
    feature_id_val = mmengine.load(args.id_val_feature).squeeze()

    print(f'{feature_id_train.shape=}, {feature_id_val.shape=}')

    if os.path.exists(args.fc):
        print('=> Computing logits...')
        logit_id_train = feature_id_train @ w.T + b
        logit_id_val = feature_id_val @ w.T + b
        logit_ood = feature_ood @ w.T + b

        print('=> Computing softmax...')
        softmax_id_train = softmax(logit_id_train, axis=-1)
        softmax_id_val = softmax(logit_id_val, axis=-1)
        softmax_ood = softmax(logit_ood, axis=-1)

        u = -np.matmul(pinv(w), b)

    # ---------------------------------------
    method = 'MSP'
    if method in args.methods:
        score_id = softmax_id_val.max(axis=-1)
        score_ood = softmax_ood.max(axis=-1)
        result = evaluate(method, score_id, score_ood, args.fpr)

    # ---------------------------------------
    method = 'MaxLogit'
    if method in args.methods:
        score_id = logit_id_val.max(axis=-1)
        score_ood = logit_ood.max(axis=-1)
        result = evaluate(method, score_id, score_ood, args.fpr)

    # ---------------------------------------
    method = 'Energy'
    if method in args.methods:
        score_id = logsumexp(logit_id_val, axis=-1)
        score_ood = logsumexp(logit_ood, axis=-1)
        result = evaluate(method, score_id, score_ood, args.fpr)

    # ---------------------------------------
    method = 'Energy+React'
    if method in args.methods:
        clip = np.quantile(feature_id_train, args.clip_quantile)
        logit_id_val_clip = np.clip(
            feature_id_val, a_min=None, a_max=clip) @ w.T + b
        score_id = logsumexp(logit_id_val_clip, axis=-1)

        logit_ood_clip = np.clip(feature_ood, a_min=None, a_max=clip) @ w.T + b
        score_ood = logsumexp(logit_ood_clip, axis=-1)
        result = evaluate(method, score_id, score_ood, args.fpr)

    # ---------------------------------------
    method = 'ViM'
    if method in args.methods:
        if feature_id_val.shape[-1] >= 2048:
            DIM = num_cls
        elif feature_id_val.shape[-1] >= 768:
            DIM = 512
        else:
            DIM = feature_id_val.shape[-1] // 2

        ec = EmpiricalCovariance(assume_centered=True)
        ec.fit(feature_id_train - u)
        eig_vals, eigen_vectors = np.linalg.eig(ec.covariance_)
        NS = np.ascontiguousarray(
            (eigen_vectors.T[np.argsort(eig_vals * -1)[DIM:]]).T)
        vlogit_id_train = norm(np.matmul(feature_id_train - u, NS), axis=-1)
        alpha = logit_id_train.max(axis=-1).mean() / vlogit_id_train.mean()

        vlogit_id_val = norm(np.matmul(feature_id_val - u, NS), axis=-1) * alpha
        energy_id_val = logsumexp(logit_id_val, axis=-1)
        score_id = -vlogit_id_val + energy_id_val

        energy_ood = logsumexp(logit_ood, axis=-1)
        vlogit_ood = norm(np.matmul(feature_ood - u, NS), axis=-1) * alpha
        score_ood = -vlogit_ood + energy_ood
        result = evaluate(method, score_id, score_ood, args.fpr)

    # ---------------------------------------
    method = 'Residual'
    if method in args.methods:
        if feature_id_val.shape[-1] >= 2048:
            DIM = 1000
        elif feature_id_val.shape[-1] >= 768:
            DIM = 512
        else:
            DIM = feature_id_val.shape[-1] // 2
        ec = EmpiricalCovariance(assume_centered=True)
        ec.fit(feature_id_train - u)
        eig_vals, eigen_vectors = np.linalg.eig(ec.covariance_)
        NS = np.ascontiguousarray(
            (eigen_vectors.T[np.argsort(eig_vals * -1)[DIM:]]).T)

        score_id = -norm(np.matmul(feature_id_val - u, NS), axis=-1)

        score_ood = -norm(np.matmul(feature_ood - u, NS), axis=-1)
        result = evaluate(method, score_id, score_ood, args.fpr)

    # ---------------------------------------
    method = 'GradNorm'
    if method in args.methods:
        score_ood = gradnorm(feature_ood, w, b, num_cls)
        score_id = gradnorm(feature_id_val, w, b, num_cls)
        result = evaluate(method, score_id, score_ood, args.fpr)

    # ---------------------------------------
    method = 'Mahalanobis'
    if method in args.methods:
        train_means = []
        train_feat_centered = []
        for i in tqdm(range(train_labels.max() + 1), desc='Computing classwise mean feature'):
            fs = feature_id_train[train_labels == i]
            _m = fs.mean(axis=0)
            train_means.append(_m)
            train_feat_centered.extend(fs - _m)

        ec = EmpiricalCovariance(assume_centered=True)
        ec.fit(np.array(train_feat_centered).astype(np.float64))

        mean = torch.from_numpy(np.array(train_means)).cuda().float()
        prec = torch.from_numpy(ec.precision_).cuda().float()

        score_id = -np.array(
            [(((f - mean) @ prec) * (f - mean)).sum(axis=-1).min().cpu().item()
             for f in tqdm(torch.from_numpy(feature_id_val).cuda().float(), desc='Computing Mahalanobis ID score')])

        score_ood = -np.array([
            (((f - mean) @ prec) * (f - mean)).sum(axis=-1).min().cpu().item()
            for f in tqdm(torch.from_numpy(feature_ood).cuda().float(), desc='Computing Mahalanobis OOD score')
        ])
        result = evaluate(method, score_id, score_ood, args.fpr)

    # ---------------------------------------
    method = 'KL-Matching'
    if method in args.methods:

        pred_labels_train = np.argmax(softmax_id_train, axis=-1)
        mean_softmax_train = []
        for i in tqdm(range(num_cls), desc='Computing classwise mean softmax'):
            mean_softmax = softmax_id_train[pred_labels_train == i]
            if mean_softmax.shape[0] == 0:
                mean_softmax_train.append(np.zeros((num_cls)))
            else:
                mean_softmax_train.append(np.mean(mean_softmax, axis=0))

        score_id = -pairwise_distances_argmin_min(
            softmax_id_val, np.array(mean_softmax_train), metric=kl)[1]

        score_ood = -pairwise_distances_argmin_min(
            softmax_ood, np.array(mean_softmax_train), metric=kl)[1]
        result = evaluate(method, score_id, score_ood, args.fpr)

if __name__ == '__main__':
    main()