# **ZoeDepth: Combining relative and metric depth** (Official implementation)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/isl-org/ZoeDepth)
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/shariqfarooq/ZoeDepth)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT) ![PyTorch](https://img.shields.io/badge/PyTorch_v1.10.1-EE4C2C?&logo=pytorch&logoColor=white)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/zoedepth-zero-shot-transfer-by-combining/monocular-depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2?p=zoedepth-zero-shot-transfer-by-combining)

>#### [ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth](https://arxiv.org/abs/2302.12288)
> ##### [Shariq Farooq Bhat](https://shariqfarooq123.github.io), [Reiner Birkl](https://www.researchgate.net/profile/Reiner-Birkl), [Diana Wofk](https://dwofk.github.io/), [Peter Wonka](http://peterwonka.net/), [Matthias Müller](https://matthias.pw/)

[[Paper]](https://arxiv.org/abs/2302.12288)

![teaser](assets/zoedepth-teaser.png)

## **Table of Contents**
- [**Usage**](#usage)
  - [Using torch hub](#using-torch-hub)
  - [Using local copy](#using-local-copy)
    - [Using local torch hub](#using-local-torch-hub)
    - [or load the models manually](#or-load-the-models-manually)
  - [Using ZoeD models to predict depth](#using-zoed-models-to-predict-depth)
- [**Environment setup**](#environment-setup)
- [**Sanity checks** (Recommended)](#sanity-checks-recommended)
- [Model files](#model-files)
- [**Evaluation**](#evaluation)
  - [Evaluating official models](#evaluating-official-models)
  - [Evaluating local checkpoint](#evaluating-local-checkpoint)
- [**Training**](#training)
- [**Gradio demo**](#gradio-demo)
- [**Citation**](#citation)

## **Usage**
It is recommended to fetch the latest [MiDaS repo](https://github.com/isl-org/MiDaS) via torch hub before proceeding:
```python
import torch

torch.hub.help("intel-isl/MiDaS", "DPT_BEiT_L_384", force_reload=True)  # Triggers fresh download of MiDaS repo
```
### **ZoeDepth models**
### Using torch hub

```python
import torch

repo = "isl-org/ZoeDepth"
# Zoe_N
model_zoe_n = torch.hub.load(repo, "ZoeD_N", pretrained=True)

# Zoe_K
model_zoe_k = torch.hub.load(repo, "ZoeD_K", pretrained=True)

# Zoe_NK
model_zoe_nk = torch.hub.load(repo, "ZoeD_NK", pretrained=True)
```
### Using local copy
Clone this repo:
```bash
git clone https://github.com/isl-org/ZoeDepth.git && cd ZoeDepth
```
#### Using local torch hub
You can use a local source for torch hub to load the ZoeDepth models, for example:
```python
import torch

# Zoe_N
model_zoe_n = torch.hub.load(".", "ZoeD_N", source="local", pretrained=True)
```

#### or load the models manually
```python
from zoedepth.models.builder import build_model
from zoedepth.utils.config import get_config

# ZoeD_N
conf = get_config("zoedepth", "infer")
model_zoe_n = build_model(conf)

# ZoeD_K
conf = get_config("zoedepth", "infer", config_version="kitti")
model_zoe_k = build_model(conf)

# ZoeD_NK
conf = get_config("zoedepth_nk", "infer")
model_zoe_nk = build_model(conf)
```

### Using ZoeD models to predict depth
```python
##### sample prediction
# `model_zoe_n` is one of the models loaded in the snippets above
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
zoe = model_zoe_n.to(DEVICE)


# Local file
from PIL import Image
image = Image.open("/path/to/image.jpg").convert("RGB")  # load
depth_numpy = zoe.infer_pil(image)  # as numpy

depth_pil = zoe.infer_pil(image, output_type="pil")  # as 16-bit PIL Image

depth_tensor = zoe.infer_pil(image, output_type="tensor")  # as torch tensor


# Tensor
from zoedepth.utils.misc import pil_to_batched_tensor
X = pil_to_batched_tensor(image).to(DEVICE)
depth_tensor = zoe.infer(X)


# From URL
from zoedepth.utils.misc import get_image_from_url

# Example URL
URL = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS4W8H_Nxk_rs3Vje_zj6mglPOH7bnPhQitBH8WkqjlqQVotdtDEG37BsnGofME3_u6lDk&usqp=CAU"

image = get_image_from_url(URL)  # fetch
depth = zoe.infer_pil(image)

# Save raw
from zoedepth.utils.misc import save_raw_16bit
fpath = "/path/to/output.png"
save_raw_16bit(depth, fpath)

# Colorize output
from zoedepth.utils.misc import colorize

colored = colorize(depth)

# save colored output
fpath_colored = "/path/to/output_colored.png"
Image.fromarray(colored).save(fpath_colored)
```

## **Environment setup**
The project depends on:
- [pytorch](https://pytorch.org/) (Main framework)
- [timm](https://timm.fast.ai/) (Backbone helper for MiDaS)
- pillow, matplotlib, scipy, h5py, opencv (utilities)

Install the environment using `environment.yml`:

Using [mamba](https://github.com/mamba-org/mamba) (fastest):
```bash
mamba env create -n zoe --file environment.yml
mamba activate zoe
```
Using conda:

```bash
conda env create -n zoe --file environment.yml
conda activate zoe
```

## **Sanity checks** (Recommended)
Check if models can be loaded:
```bash
python sanity_hub.py
```
Try a demo prediction pipeline:
```bash
python sanity.py
```
This will save a file `pred.png` in the root folder, showing the RGB input and the corresponding predicted depth side by side.

## Model files
Models are defined under the `models/` folder, with `models/_.py` containing model definitions and `models/config_.json` containing configuration.

Single metric head models (Zoe_N and Zoe_K from the paper) share a common definition and are defined under `models/zoedepth`, whereas the multi-headed model (Zoe_NK) is defined under `models/zoedepth_nk`.

## **Evaluation**
Download the required dataset and change the `DATASETS_CONFIG` dictionary in `utils/config.py` accordingly.
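
As a rough illustration of the kind of change meant here, the sketch below overrides dataset paths in a dictionary shaped like `DATASETS_CONFIG`. The entry layout, the key names (`data_path`, `gt_path`), the placeholder paths, and the helper `override_dataset_paths` are all assumptions made for this example and are not taken from this README; check `utils/config.py` for the actual keys before editing.

```python
import copy

# Hypothetical stand-in for the real DATASETS_CONFIG.
# Key names and paths are illustrative assumptions, not the repo's values.
DATASETS_CONFIG = {
    "nyu": {
        "dataset": "nyu",
        "data_path": "/placeholder/nyu",
        "gt_path": "/placeholder/nyu",
    },
    "kitti": {
        "dataset": "kitti",
        "data_path": "/placeholder/kitti",
        "gt_path": "/placeholder/kitti",
    },
}


def override_dataset_paths(config, name, **paths):
    """Return a copy of `config` with the given path keys replaced for one dataset.

    Hypothetical helper for this example; it is not part of ZoeDepth.
    """
    if name not in config:
        raise KeyError(f"Unknown dataset {name!r}; known keys: {sorted(config)}")
    updated = copy.deepcopy(config)  # leave the original dictionary untouched
    updated[name].update(paths)
    return updated


# Point the (assumed) NYU entry at wherever the dataset was downloaded.
patched = override_dataset_paths(
    DATASETS_CONFIG,
    "nyu",
    data_path="/data/nyu_depth_v2/test",
    gt_path="/data/nyu_depth_v2/test",
)
print(patched["nyu"]["data_path"])
```

In practice the same effect can be had by editing the dictionary in place; the copy-based helper is only a way to keep the example side-effect free.
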
### Evaluating official models
On NYU-Depth-v2 for example:

For ZoeD_N:
```bash
python evaluate.py -m zoedepth -d nyu
```

For ZoeD_NK:
```bash
python evaluate.py -m zoedepth_nk -d nyu
```

### Evaluating local checkpoint
```bash
python evaluate.py -m zoedepth --pretrained_resource="local::/path/to/local/ckpt.pt" -d nyu
```
Pretrained resources are prefixed with `url::` to indicate that the weights should be fetched from a URL, or with `local::` to indicate that the path points to a local file. Refer to `models/model_io.py` for details.

The dataset name should match the corresponding key in `utils.config.DATASETS_CONFIG`.

## **Training**
Download the training datasets as per the instructions given [here](https://github.com/cleinc/bts/tree/master/pytorch#nyu-depvh-v2). Then, to train a single-head model on NYU-Depth-v2:
```bash
python train_mono.py -m zoedepth --pretrained_resource=""
```

To train the Zoe-NK model:
```bash
python train_mix.py -m zoedepth_nk --pretrained_resource=""
```

## **Gradio demo**
We provide a UI demo built using [gradio](https://gradio.app/). To get started, install the UI requirements:
```bash
pip install -r ui/ui_requirements.txt
```
Then launch the gradio UI:
```bash
python -m ui.app
```

The UI is also hosted on HuggingFace🤗 [here](https://huggingface.co/spaces/shariqfarooq/ZoeDepth).

## **Citation**
```
@misc{https://doi.org/10.48550/arxiv.2302.12288,
  doi = {10.48550/ARXIV.2302.12288},
  url = {https://arxiv.org/abs/2302.12288},
  author = {Bhat, Shariq Farooq and Birkl, Reiner and Wofk, Diana and Wonka, Peter and Müller, Matthias},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```