---
license: apache-2.0
library_name: onnx
tags:
- depth-estimation
- panoramic
- 360-degree
- webgpu
- onnx
pipeline_tag: depth-estimation
---

# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)

This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.

## Model Details

- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Paper:** [arXiv:2509.26618](http://arxiv.org/abs/2509.26618)
- **Project Page:** [depth-any-in-any-dir.github.io](https://depth-any-in-any-dir.github.io/)
- **Framework:** ONNX (Opset 17)
- **Precision:** FP32 (Full Precision)
- **Input Resolution:** 1092x546 (width x height)
- **Size:** ~1.4 GB
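Given the ~1.4 GB download, it is worth feature-detecting WebGPU before fetching the weights. A minimal sketch (the `navigator.gpu` check is the standard WebGPU detection; the `typeof` guard simply keeps the snippet safe outside the browser):

```javascript
// WebGPU is exposed as `navigator.gpu` in supporting browsers; `navigator`
// itself is absent outside the browser (e.g. in Node), hence the typeof guard.
const hasWebGPU = typeof navigator !== 'undefined' && !!navigator.gpu;

if (!hasWebGPU) {
  console.warn('WebGPU not available; skip the model download on this client.');
}
```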

## Conversion Details

This model was converted from the original PyTorch weights to ONNX to enable client-side inference using `onnxruntime-web`.

- **Optimization:** Constant folding applied.
- **Compatibility:** Verified with the WebGPU backend.
- **Modifications:**
  - Replaced `clamp` operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
  - Removed internal normalization layers to allow raw 0-1 input from the browser.
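The `clamp` rewrite relies on the identity `clamp(x, lo, hi) = min(max(x, lo), hi)`; the same decomposition, sketched in plain JavaScript:

```javascript
// Clip from below with max, then from above with min --
// the computation performed by the exported graph's Max/Min node pair.
function clamp(x, lo, hi) {
  return Math.min(Math.max(x, lo), hi);
}
```

For finite inputs with `lo <= hi` the decomposition is exact, which is why the operator swap changes no model outputs.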

## Usage (Transformers.js)

You can run this model with [Transformers.js](https://huggingface.co/docs/transformers.js):

```javascript
import { pipeline } from '@xenova/transformers';

// Initialize the depth-estimation pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
  device: 'webgpu',
  dtype: 'fp32', // use FP32, matching the exported weights
});

// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is the visualized depth map (a RawImage)
```

## Usage (ONNX Runtime Web)

You can run this model in the browser using `onnxruntime-web`:

```javascript
import * as ort from 'onnxruntime-web/webgpu';

// 1. Initialize the session
// Note: the model file lives in the 'onnx' subdirectory
const session = await ort.InferenceSession.create('https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx', {
  executionProviders: ['webgpu'],
});

// 2. Prepare the input (float32, 0-1 range, NCHW)
// Note: do NOT apply ImageNet mean/std normalization; the model expects raw 0-1 floats.
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);

// 3. Run inference
const results = await session.run({ images: tensor });
const depthMap = results.depth; // the output depth tensor
```
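The `float32Data` buffer above still has to be built from pixels, and the raw output needs rescaling before display. A sketch of both helpers, assuming RGBA bytes as returned by a canvas `getImageData` call (the function names are illustrative, not part of any API):

```javascript
// Interleaved RGBA bytes -> planar NCHW Float32Array in [0, 1].
// No mean/std normalization, matching what the exported model expects.
function rgbaToNCHW(rgba, width, height) {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i]             = rgba[i * 4]     / 255; // R plane
    out[plane + i]     = rgba[i * 4 + 1] / 255; // G plane
    out[2 * plane + i] = rgba[i * 4 + 2] / 255; // B plane (alpha dropped)
  }
  return out;
}

// Min-max rescale a raw depth array to 0-255 grayscale for rendering.
function depthToGray(depth) {
  let min = Infinity, max = -Infinity;
  for (const v of depth) { if (v < min) min = v; if (v > max) max = v; }
  const range = max - min || 1; // avoid divide-by-zero on flat maps
  return Uint8ClampedArray.from(depth, v => ((v - min) / range) * 255);
}
```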

## License

This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**.

Please cite the original authors if you use this model:

```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```