	---
	license: mit
	---

	# DPT 3.1 (Swinv2 backbone)

	DPT (Dense Prediction Transformer) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. (2021) and first released in [this repository](https://github.com/isl-org/MiDaS/tree/master).

	Disclaimer: The team releasing DPT did not write a model card for this model, so this model card was written by the Hugging Face team.

	## Model description

	This DPT model uses the [Swinv2](https://huggingface.co/docs/transformers/model_doc/swinv2) model as its backbone and adds a neck and a head on top for monocular depth estimation.

	![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)

	## How to use

	Here is how to use this model for zero-shot depth estimation on an image:

	```python
	from transformers import DPTImageProcessor, DPTForDepthEstimation
	import torch
	import numpy as np
	from PIL import Image
	import requests

	url = "http://images.cocodataset.org/val2017/000000039769.jpg"
	image = Image.open(requests.get(url, stream=True).raw)

	processor = DPTImageProcessor.from_pretrained("Intel/dpt-swinv2-base-384")
	model = DPTForDepthEstimation.from_pretrained("Intel/dpt-swinv2-base-384")

	# prepare image for the model
	inputs = processor(images=image, return_tensors="pt")

	# run inference without tracking gradients
	with torch.no_grad():
	    outputs = model(**inputs)
	    predicted_depth = outputs.predicted_depth

	# interpolate to original size (PIL size is (width, height), so reverse it)
	prediction = torch.nn.functional.interpolate(
	    predicted_depth.unsqueeze(1),
	    size=image.size[::-1],
	    mode="bicubic",
	    align_corners=False,
	)

	# visualize the prediction
	output = prediction.squeeze().cpu().numpy()
	formatted = (output * 255 / np.max(output)).astype("uint8")
	depth = Image.fromarray(formatted)
	```
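
The visualization step above divides by the maximum value, so the darkest pixel stays above zero whenever the minimum depth is nonzero, and an all-zero map causes a division by zero. As a minimal sketch, assuming a min-max rescaling is acceptable for display, the hypothetical helper `depth_to_uint8` below stretches the full value range to 0-255. It is an illustration only and not part of the `transformers` API.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max rescale a depth map to uint8 for display (hypothetical helper)."""
    depth = depth.astype(np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: there is no range to stretch, so return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # values now lie in [0, 1]
    return (scaled * 255.0).round().astype(np.uint8)
```

With the variables from the example above, `Image.fromarray(depth_to_uint8(output))` would replace the last two lines of the visualization step.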

	Alternatively, use the pipeline API:

	```python
	from transformers import pipeline

	pipe = pipeline(task="depth-estimation", model="Intel/dpt-swinv2-base-384")
	result = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
	result["depth"]
	```
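
To inspect the result visually, it can help to place the depth map next to the input. The sketch below assumes `result["depth"]` is a PIL image, as in the pipeline example above; `side_by_side` is a hypothetical helper written for illustration, not a function provided by `transformers`.

```python
from PIL import Image


def side_by_side(image: Image.Image, depth: Image.Image) -> Image.Image:
    """Paste an image and its depth map next to each other (hypothetical helper)."""
    # Resize the depth map to the input size in case the two differ.
    depth_rgb = depth.convert("RGB").resize(image.size)
    canvas = Image.new("RGB", (image.width * 2, image.height))
    canvas.paste(image.convert("RGB"), (0, 0))
    canvas.paste(depth_rgb, (image.width, 0))
    return canvas
```

For example, `side_by_side(image, result["depth"]).save("comparison.png")` would write the comparison to disk, where `image` is the input loaded with PIL and the file name is arbitrary.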