---
license: mit
---

# DPT 3.1 (BEiT backbone)

DPT (Dense Prediction Transformer) is a model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. (2021) and first released in [this repository](https://github.com/isl-org/DPT).
DPT uses the [BEiT](https://huggingface.co/docs/transformers/model_doc/beit) model as backbone and adds a neck + head on top for monocular depth estimation.

![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)

Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

DPT uses the [BEiT](https://huggingface.co/docs/transformers/model_doc/beit) model as backbone and adds a neck + head on top for monocular depth estimation: tokens from several stages of the backbone are reassembled into image-like feature maps at multiple resolutions, progressively fused by a convolutional decoder, and passed to a depth estimation head that produces the dense prediction.

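To get a quick feel for this checkpoint, one can load the depth estimation model and count its parameters. This is a minimal sketch; the `Intel/dpt-beit-base-384` checkpoint is the one used in the usage examples below.

```python
from transformers import DPTForDepthEstimation

# load the BEiT-backbone DPT checkpoint used in the examples below
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-beit-base-384")

# total parameter count across backbone, neck and depth head
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```
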
## How to use

Here is how to use this model for zero-shot depth estimation on an image:

```python
from transformers import DPTImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

# load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# load the image processor and the BEiT-backbone DPT model
processor = DPTImageProcessor.from_pretrained("Intel/dpt-beit-base-384")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-beit-base-384")

# prepare image for the model
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
```
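
If a GPU is available, the same forward pass can be run on it. A minimal sketch, reusing the `processor`, `model` and `image` objects from the snippet above:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# move the pixel values to the same device as the model
inputs = processor(images=image, return_tensors="pt").to(device)

with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth
```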

Alternatively, one can use the pipeline API:

```python
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="Intel/dpt-beit-base-384")
result = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"]
```
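
The pipeline output is a dictionary: `result["depth"]` is a PIL image and, assuming the standard depth-estimation pipeline output, `result["predicted_depth"]` holds the raw tensor, so the prediction can be saved or post-processed directly:

```python
# save the rendered depth map and inspect the raw prediction tensor
result["depth"].save("depth.png")
print(result["predicted_depth"].shape)
```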