MackinationsAi committed on
Commit
3321782
1 Parent(s): 94c5c91

Upload 4 files

README.md ADDED
@@ -0,0 +1,110 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: depth-estimation
+ tags:
+ - depth
+ - relative depth
+ ---
+
+ # Depth-Anything-V2-Small
+
+ ## Introduction
+ Depth Anything V2 is trained from 595K synthetic labeled images & 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model with the following features:
+ - more fine-grained details than Depth Anything V1
+ - more robust than Depth Anything V1 & SD-based models (e.g., Marigold, Geowizard)
+ - more efficient (10x faster) & more lightweight than SD-based models
+ - impressive fine-tuned performance with our pre-trained models
+
+ ## Installation
+
+ ```bash
+ git clone https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2.git
+ cd Upgraded-Depth-Anything-V2
+ one_click_install.bat
+ ```
+
+ ## Usage
+
+ Please refer to the [README.md](https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2/blob/main/README.md) for full usage instructions.
+
+ ## Test Code
+
+ ```bash
+ cd Upgraded-Depth-Anything-V2
+ venv\scripts\activate
+ python test.py /path/to/your/image.jpg  # a .png path also works
+ ```
+ Create a `test.py` script in the repository root using the code below:
+
+ ```python
+ import cv2
+ import torch
+ import numpy as np
+ import os
+ import argparse
+
+ from safetensors.torch import load_file
+ from depth_anything_v2.dpt import DepthAnythingV2
+
+ # Argument parser for input image path
+ parser = argparse.ArgumentParser(description="Depth map inference using DepthAnythingV2 model.")
+ parser.add_argument("input_image_path", type=str, help="Path to the input image")
+ args = parser.parse_args()
+
+ # Determine the directory of this script
+ script_dir = os.path.dirname(os.path.abspath(__file__))
+
+ # Set output and checkpoint paths relative to the script directory
+ output_image_path = os.path.join(script_dir, "base_udav2_hf-code-test.png")
+ checkpoint_path = os.path.join(script_dir, "checkpoints", "depth_anything_v2_vits.safetensors")
+
+ # Device selection: CUDA, MPS, or CPU
+ if torch.cuda.is_available():
+     device = torch.device('cuda')
+ elif torch.backends.mps.is_available():
+     device = torch.device('mps')
+ else:
+     device = torch.device('cpu')
+
+ model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])  # must match the vits checkpoint
+
+ state_dict = load_file(checkpoint_path, device='cpu')
+
+ model.load_state_dict(state_dict)
+ model.to(device)
+ model.eval()
+
+ # Load the input image (OpenCV reads it as a BGR array)
+ raw_img = cv2.imread(args.input_image_path)
+
+ # Infer the depth map
+ depth = model.infer_image(raw_img) # HxW raw depth map
+
+ # Normalize the depth map to 0-255 for saving as an image
+ depth_normalized = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX)
+ depth_normalized = depth_normalized.astype(np.uint8)
+
+ cv2.imwrite(output_image_path, depth_normalized)
+ print(f"Depth map saved at {output_image_path}")
+ ```
+
+ ## Citation
+
+ If you find this project useful, please consider citing [MackinationsAi](https://github.com/MackinationsAi/) & the following:
+
+ ```bibtex
+ @article{depth_anything_v2,
+   title={Depth Anything V2},
+   author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+   journal={arXiv:2406.09414},
+   year={2024}
+ }
+
+ @inproceedings{depth_anything_v1,
+   title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
+   author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+   booktitle={CVPR},
+   year={2024}
+ }
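
Review note on the README's test script: the checkpoint file name and the `DepthAnythingV2(...)` constructor arguments have to describe the same encoder, otherwise `load_state_dict` fails on mismatched tensor shapes. Below is a minimal sketch of deriving the arguments from the file name. The table reflects the per-encoder settings as listed in the upstream Depth Anything V2 repository (worth verifying against that repo), and `config_for_checkpoint` is a hypothetical helper name, not something this commit provides.

```python
import os
import re

# Per-encoder constructor arguments, as listed in the upstream
# Depth Anything V2 repository (assumption: verify before relying on them).
MODEL_CONFIGS = {
    "vits": {"encoder": "vits", "features": 64, "out_channels": [48, 96, 192, 384]},
    "vitb": {"encoder": "vitb", "features": 128, "out_channels": [96, 192, 384, 768]},
    "vitl": {"encoder": "vitl", "features": 256, "out_channels": [256, 512, 1024, 1024]},
}


def config_for_checkpoint(checkpoint_path):
    """Pick constructor kwargs from a name like depth_anything_v2_vits.safetensors."""
    name = os.path.basename(checkpoint_path)
    match = re.search(r"_(vit[sbl])\.", name)
    if match is None:
        raise ValueError(f"Cannot infer encoder from checkpoint name: {name}")
    return MODEL_CONFIGS[match.group(1)]


cfg = config_for_checkpoint("checkpoints/depth_anything_v2_vits.safetensors")
print(cfg)
```

In the test script the result would be passed as `DepthAnythingV2(**cfg)`, so the encoder and the checkpoint cannot drift apart.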
config.json ADDED
@@ -0,0 +1,53 @@
+ {
+   "_commit_hash": null,
+   "architectures": [
+     "DepthAnythingV2ForDepthEstimation"
+   ],
+   "backbone": null,
+   "backbone_config": {
+     "architectures": [
+       "Dinov2Model"
+     ],
+     "hidden_size": 384,
+     "image_size": 518,
+     "model_type": "dinov2",
+     "num_attention_heads": 6,
+     "out_features": [
+       "stage9",
+       "stage10",
+       "stage11",
+       "stage12"
+     ],
+     "out_indices": [
+       9,
+       10,
+       11,
+       12
+     ],
+     "patch_size": 14,
+     "reshape_hidden_states": false,
+     "torch_dtype": "float32"
+   },
+   "fusion_hidden_size": 64,
+   "head_hidden_size": 32,
+   "head_in_index": -1,
+   "initializer_range": 0.02,
+   "model_type": "depth_anything_v2",
+   "neck_hidden_sizes": [
+     48,
+     96,
+     192,
+     384
+   ],
+   "patch_size": 14,
+   "reassemble_factors": [
+     4,
+     2,
+     1,
+     0.5
+   ],
+   "reassemble_hidden_size": 384,
+   "torch_dtype": "float32",
+   "transformers_version": null,
+   "use_pretrained_backbone": false
+ }
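
Review note on `config.json`: several of its values are related by simple arithmetic, which makes a quick consistency check possible. The sketch below copies only the relevant fields from the file above. Pairing `reassemble_factors` with `neck_hidden_sizes` stage by stage follows the usual DPT-style convention and is an assumption here, not something the config itself states.

```python
import json

# Subset of the config.json fields shown in this commit.
CONFIG_JSON = """
{
  "backbone_config": {
    "hidden_size": 384,
    "image_size": 518,
    "num_attention_heads": 6,
    "patch_size": 14
  },
  "neck_hidden_sizes": [48, 96, 192, 384],
  "reassemble_factors": [4, 2, 1, 0.5],
  "reassemble_hidden_size": 384
}
"""

cfg = json.loads(CONFIG_JSON)
backbone = cfg["backbone_config"]

# The ViT backbone splits a square image into non-overlapping patches.
patches_per_side = backbone["image_size"] // backbone["patch_size"]
num_patches = patches_per_side ** 2

# Each attention head works on an equal slice of the hidden dimension.
head_dim = backbone["hidden_size"] // backbone["num_attention_heads"]

# Assumed pairing: one reassemble factor and one neck width per output stage.
stages = list(zip(cfg["reassemble_factors"], cfg["neck_hidden_sizes"]))

print(patches_per_side, num_patches, head_dim)
print(stages)
```

The image size divides evenly by the patch size, and the reassemble hidden size equals the backbone hidden size, which is what one would expect for a ViT-S backbone feeding the neck directly.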
depth_anything_v2_vits.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7c1a8c8cdd7885fb8391069cd1eee789126c8d896f7de6750499b1097f817ea
+ size 49595202
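
Review note on the LFS pointer: the three lines above are a Git LFS pointer, not the weights themselves. The `oid` and `size` fields can be used to confirm that a downloaded `depth_anything_v2_vits.safetensors` is complete. A minimal sketch using only the standard library follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
import os

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:a7c1a8c8cdd7885fb8391069cd1eee789126c8d896f7de6750499b1097f817ea
size 49595202
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so the ~50 MB file is never fully held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("checkpoints/depth_anything_v2_vits.safetensors", pointer)` before loading the weights catches truncated downloads early; the path is the one the README's test script assumes.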
preprocessor_config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "do_normalize": true,
+   "do_pad": false,
+   "do_rescale": true,
+   "do_resize": true,
+   "ensure_multiple_of": 14,
+   "image_mean": [
+     0.485,
+     0.456,
+     0.406
+   ],
+   "image_processor_type": "DPTImageProcessor",
+   "image_std": [
+     0.229,
+     0.224,
+     0.225
+   ],
+   "keep_aspect_ratio": true,
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "height": 518,
+     "width": 518
+   },
+   "size_divisor": null
+ }
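
Review note on `preprocessor_config.json`: the values above describe a rescale to [0, 1], an ImageNet-style mean/std normalization, and a resize whose output sides are multiples of 14 (the patch size). The sketch below illustrates that arithmetic in plain Python. It is a simplified stand-in, not `DPTImageProcessor`'s actual implementation: in particular, scaling the shorter side to 518 is an assumption, and the real processor's aspect-ratio rule may differ.

```python
import math

IMAGE_MEAN = [0.485, 0.456, 0.406]
IMAGE_STD = [0.229, 0.224, 0.225]
RESCALE_FACTOR = 0.00392156862745098  # equals 1/255
MULTIPLE = 14
TARGET = 518


def round_to_multiple(value, multiple=MULTIPLE):
    """Round to the nearest multiple, never going below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def target_size(height, width, target=TARGET):
    """Assumed rule: scale the shorter side to `target`, keep the aspect ratio."""
    scale = target / min(height, width)
    return round_to_multiple(height * scale), round_to_multiple(width * scale)


def normalize_pixel(rgb):
    """Rescale 0-255 channel values to 0-1, then apply mean/std normalization."""
    return [(c * RESCALE_FACTOR - m) / s for c, m, s in zip(rgb, IMAGE_MEAN, IMAGE_STD)]


size = target_size(480, 640)
print(size)
print(normalize_pixel([255, 255, 255]))
```

Keeping both sides divisible by 14 means the resized image splits into whole patches, which is why `ensure_multiple_of` matches `patch_size` in `config.json`.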