Clement committed on
Commit
7be3023
1 Parent(s): 3437407
Files changed (4)
  1. README.md +98 -0
  2. config.json +57 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +18 -0
README.md ADDED
@@ -0,0 +1,98 @@
+ ---
+ license: apache-2.0
+ tags:
+ - vision
+ pipeline_tag: depth-estimation
+ widget:
+ - inference: false
+ ---
+
+ # Depth Anything (large-sized model, Transformers version)
+
+ The Depth Anything model was introduced in the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang et al. and first released in [this repository](https://github.com/LiheYoung/Depth-Anything).
+
+ An [online demo](https://huggingface.co/spaces/LiheYoung/Depth-Anything) is also available.
+
+ Disclaimer: The team releasing Depth Anything did not write a model card for this model, so this model card has been written by the Hugging Face team.
+
+ ## Model description
+
+ Depth Anything leverages the [DPT](https://huggingface.co/docs/transformers/model_doc/dpt) architecture with a [DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2) backbone.
+
+ The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation.
+
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/depth_anything_overview.jpg"
+ alt="drawing" width="600"/>
+
+ <small> Depth Anything overview. Taken from the <a href="https://arxiv.org/abs/2401.10891">original paper</a>.</small>
+
+ ## Intended uses & limitations
+
+ You can use the raw model for tasks like zero-shot depth estimation. See the [model hub](https://huggingface.co/models?search=depth-anything) to look for
+ other versions on a task that interests you.
+
+ ### How to use
+
+ Here is how to use this model to perform zero-shot depth estimation:
+
+ ```python
+ from transformers import pipeline
+ from PIL import Image
+ import requests
+
+ # load pipe
+ pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-large-hf")
+
+ # load image
+ url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # inference
+ depth = pipe(image)["depth"]
+ ```
+
+ Alternatively, one can use the classes themselves:
+
+ ```python
+ from transformers import AutoImageProcessor, AutoModelForDepthEstimation
+ import torch
+ import numpy as np
+ from PIL import Image
+ import requests
+
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
+ model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")
+
+ # prepare image for the model
+ inputs = image_processor(images=image, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predicted_depth = outputs.predicted_depth
+
+ # interpolate to original size
+ prediction = torch.nn.functional.interpolate(
+     predicted_depth.unsqueeze(1),
+     size=image.size[::-1],
+     mode="bicubic",
+     align_corners=False,
+ )
+ ```
+
+ For more code examples, see the [documentation](https://huggingface.co/transformers/main/model_doc/depth_anything.html#).
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @misc{yang2024depth,
+       title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
+       author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
+       year={2024},
+       eprint={2401.10891},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
+ }
+ ```
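The second README example stops at the interpolated `prediction` tensor. As a minimal sketch of a possible next step, the snippet below scales a depth map to an 8-bit grayscale array for viewing. The helper name `depth_to_uint8` is hypothetical, the max-based scaling is just one common choice rather than the authors' stated method, and a small hand-written array stands in for `prediction.squeeze().cpu().numpy()` so the snippet runs without downloading the model.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Scale a non-negative 2-D depth map to the 0-255 range (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float32)
    max_value = float(depth.max())
    if max_value <= 0.0:
        # Avoid dividing by zero for an all-zero map.
        return np.zeros(depth.shape, dtype=np.uint8)
    # Larger predicted values map to brighter pixels; fractions are truncated.
    return (depth * 255.0 / max_value).astype(np.uint8)


# Stand-in for `prediction.squeeze().cpu().numpy()` from the README example.
fake_depth = np.array([[0.0, 1.0], [2.0, 4.0]], dtype=np.float32)
gray = depth_to_uint8(fake_depth)
```

Passing the result to `PIL.Image.fromarray(gray)` should then give an image that can be saved or displayed.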
config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "_commit_hash": null,
+   "architectures": ["DepthAnythingForDepthEstimation"],
+   "backbone": null,
+   "backbone_config": {
+     "architectures": ["Dinov2Model"],
+     "hidden_size": 1024,
+     "image_size": 518,
+     "model_type": "dinov2",
+     "num_attention_heads": 16,
+     "num_hidden_layers": 24,
+     "out_features": ["stage21", "stage22", "stage23", "stage24"],
+     "out_indices": [21, 22, 23, 24],
+     "patch_size": 14,
+     "reshape_hidden_states": false,
+     "stage_names": [
+       "stem",
+       "stage1",
+       "stage2",
+       "stage3",
+       "stage4",
+       "stage5",
+       "stage6",
+       "stage7",
+       "stage8",
+       "stage9",
+       "stage10",
+       "stage11",
+       "stage12",
+       "stage13",
+       "stage14",
+       "stage15",
+       "stage16",
+       "stage17",
+       "stage18",
+       "stage19",
+       "stage20",
+       "stage21",
+       "stage22",
+       "stage23",
+       "stage24"
+     ],
+     "torch_dtype": "float32"
+   },
+   "fusion_hidden_size": 256,
+   "head_hidden_size": 32,
+   "head_in_index": -1,
+   "initializer_range": 0.02,
+   "model_type": "depth_anything",
+   "neck_hidden_sizes": [256, 512, 1024, 1024],
+   "patch_size": 14,
+   "reassemble_factors": [4, 2, 1, 0.5],
+   "reassemble_hidden_size": 1024,
+   "torch_dtype": "float32",
+   "transformers_version": null,
+   "use_pretrained_backbone": false
+ }
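A few quantities follow directly from the backbone settings in this config. The sketch below is plain arithmetic on values copied from `config.json`, not a call into the library; the variable names for the derived values (`patches_per_side`, `num_patches`, `head_dim`) are illustrative.

```python
# Values copied from the backbone_config in config.json.
image_size = 518
patch_size = 14
hidden_size = 1024
num_attention_heads = 16
num_hidden_layers = 24
out_indices = [21, 22, 23, 24]

# The reference image size divides evenly into 14x14 patches.
patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# Each attention head works on an equal slice of the hidden dimension.
head_dim = hidden_size // num_attention_heads

# "stem" is entry 0 of stage_names, so index i refers to "stage{i}".
out_features = [f"stage{i}" for i in out_indices]
indices_in_range = all(1 <= i <= num_hidden_layers for i in out_indices)
```

At the 518x518 reference resolution this gives a 37x37 patch grid, and the four selected feature maps are the outputs of the last four transformer layers.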
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc27360a3e6906e5ddd8f618e2dcde11362327361918b8f76793e42e25de31b3
+ size 1341322868
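The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves. A rough sketch of reading such a pointer and estimating the parameter count is given below. The helper `parse_lfs_pointer` is hypothetical, and the estimate assumes every weight is stored as 4-byte float32 (matching `torch_dtype` in `config.json`) and ignores the safetensors header, so it is a slight overestimate.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from the diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:bc27360a3e6906e5ddd8f618e2dcde11362327361918b8f76793e42e25de31b3
size 1341322868
"""

pointer = parse_lfs_pointer(pointer_text)
size_bytes = int(pointer["size"])
size_gib = size_bytes / 2**30

# Assumption: float32 weights, 4 bytes each; the file header is not subtracted.
approx_params = size_bytes // 4
```

Under those assumptions the file is about 1.25 GiB and holds roughly 335 million parameters.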
preprocessor_config.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "do_normalize": true,
+   "do_pad": false,
+   "do_rescale": true,
+   "do_resize": true,
+   "ensure_multiple_of": 14,
+   "image_mean": [0.485, 0.456, 0.406],
+   "image_processor_type": "DPTImageProcessor",
+   "image_std": [0.229, 0.224, 0.225],
+   "keep_aspect_ratio": true,
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "height": 518,
+     "width": 518
+   },
+   "size_divisor": null
+ }
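With `keep_aspect_ratio` enabled and `ensure_multiple_of` set to 14, the resized image is generally not exactly 518x518. The sketch below shows one way the output size could be derived. It is written from an understanding of how the DPT image processor behaves, not copied from the library, so the function names and the exact rounding rule are assumptions.

```python
def constrain_to_multiple_of(value: float, multiple: int) -> int:
    """Round a length to the nearest multiple (assumed rounding rule)."""
    return round(value / multiple) * multiple


def resize_output_size(height, width, target_height=518, target_width=518, multiple=14):
    """Return (new_height, new_width) for an input image (hypothetical helper)."""
    scale_h = target_height / height
    scale_w = target_width / width
    # Keep the aspect ratio by using the scale closest to 1 for both axes.
    if abs(1 - scale_w) < abs(1 - scale_h):
        scale_h = scale_w
    else:
        scale_w = scale_h
    new_height = constrain_to_multiple_of(scale_h * height, multiple)
    new_width = constrain_to_multiple_of(scale_w * width, multiple)
    return new_height, new_width


# A 640x480 (width x height) image, such as the COCO sample in the README.
resized = resize_output_size(height=480, width=640)
```

Both sides of the result are multiples of the patch size 14, so the backbone sees a whole number of patches. The `rescale_factor` in the same file is 1/255, which maps 8-bit pixel values to the [0, 1] range before normalization with `image_mean` and `image_std`.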