nielsr (HF staff) committed
Commit e553fa2
1 Parent(s): f609b93

Update README.md (#2)


- Update README.md (172c440f08454205a35ec8663aabd820bd741301)

Files changed (1)
  1. README.md +3 -4
README.md CHANGED

@@ -5,15 +5,14 @@ license: mit
 # DPT 3.1 (BEiT backbone)
 
 DPT (Dense Prediction Transformer) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. (2021) and first released in [this repository](https://github.com/isl-org/DPT).
-DPT uses the [BEiT](https://huggingface.co/docs/transformers/model_doc/beit) model as backbone and adds a neck + head on top for monocular depth estimation.
-
-![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)
 
 Disclaimer: The team releasing DPT did not write a model card for this model so this model card has been written by the Hugging Face team.
 
 ## Model description
 
-The Table Transformer is equivalent to [DETR](https://huggingface.co/docs/transformers/model_doc/detr), a Transformer-based object detection model. Note that the authors decided to use the "normalize before" setting of DETR, which means that layernorm is applied before self- and cross-attention.
+This DPT model uses the [BEiT](https://huggingface.co/docs/transformers/model_doc/beit) model as backbone and adds a neck + head on top for monocular depth estimation.
+
+![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)
 
 ## How to use
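The diff cuts off at the "How to use" heading, so the updated usage section is not shown here. As a minimal sketch of monocular depth estimation with the `transformers` DPT classes: the checkpoint name `Intel/dpt-beit-large-512`, the sample image URL, and the `depth_to_image` helper are illustrative assumptions, not taken from this model card.

```python
import numpy as np


def depth_to_image(depth: np.ndarray) -> np.ndarray:
    """Normalize a raw predicted depth map to an 8-bit grayscale image."""
    d = depth - depth.min()
    denom = float(d.max()) or 1.0  # avoid division by zero on flat maps
    return (255.0 * d / denom).astype(np.uint8)


if __name__ == "__main__":
    import requests
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, DPTForDepthEstimation

    # Checkpoint name is an assumption; substitute the DPT 3.1 BEiT
    # checkpoint you intend to use.
    ckpt = "Intel/dpt-beit-large-512"
    processor = AutoImageProcessor.from_pretrained(ckpt)
    model = DPTForDepthEstimation.from_pretrained(ckpt)

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # predicted_depth has shape (batch, height, width); upsample it back
    # to the input image resolution before visualizing.
    depth = torch.nn.functional.interpolate(
        outputs.predicted_depth.unsqueeze(1),
        size=image.size[::-1],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()
    Image.fromarray(depth_to_image(depth)).save("depth.png")
```

The helper keeps the visualization step separate from inference, so the normalized grayscale output can be reused with any depth estimator that returns a 2-D array.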