github-actions[bot] committed on
Commit
c549485
·
1 Parent(s): ad8d48f

Sync from GitHub: afccd22ead9934714bd8ff3d1eb163ab347878a3

Files changed (2)
  1. README.md +77 -37
  2. app.py +28 -17
README.md CHANGED
@@ -5,6 +5,7 @@ colorFrom: blue
 colorTo: purple
 sdk: gradio
 sdk_version: "4.44.0"
+python_version: "3.10"
 app_file: app.py
 pinned: false
 license: apache-2.0
@@ -19,7 +20,7 @@ Optimized Python package for RGB-D depth refinement using Vision Transformer enc
 [![PyPI downloads](https://img.shields.io/pypi/dm/rgbd-depth.svg)](https://pypi.org/project/rgbd-depth/)
 [![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Aedelon/rgbd-depth)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
-[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
 [![PyTorch 2.0+](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org/)
 
 ## 🎮 Try it Online
@@ -115,21 +116,38 @@ For production workflows or faster inference, use the local installation below.
 
 ### From PyPI (recommended)
 
+**Basic installation (core dependencies only):**
 ```bash
-# Basic installation
 pip install rgbd-depth
+```
 
-# With CUDA optimizations (xFormers)
+**Installation with extras:**
+```bash
+# With CUDA optimizations (xFormers, ~8% faster)
 pip install rgbd-depth[xformers]
 
-# Development installation
+# With Gradio demo interface
+pip install rgbd-depth[demo]
+
+# With HuggingFace Hub model downloads
+pip install rgbd-depth[download]
+
+# With development tools (pytest, black, ruff, etc.)
+pip install rgbd-depth[dev]
+
+# Install everything (all extras)
+pip install rgbd-depth[all]
+```
+
+**Development installation (editable):**
+```bash
 git clone https://github.com/Aedelon/rgbd-depth.git
 cd rgbd-depth
-pip install -e .
+pip install -e ".[dev]"  # or uv sync --extra dev
 ```
 
 **Requirements:**
-- Python 3.8+
+- Python 3.10+ (Python 3.8-3.9 support dropped in v1.0.2+)
 - PyTorch 2.0+ with appropriate CUDA/MPS support
 - OpenCV, NumPy, Pillow
 
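A quick way to verify the install is to import the public API; `rgbddepth/__init__.py` exports `RGBDDepth`, `DinoVisionTransformer`, and `__version__` (see the file structure hunk below). A minimal check, assuming those exports are present in the installed version:

```python
# Sanity check after `pip install rgbd-depth`: import the public API.
import rgbddepth
from rgbddepth import RGBDDepth, DinoVisionTransformer

print("rgbd-depth version:", rgbddepth.__version__)
print(RGBDDepth, DinoVisionTransformer)  # model and encoder classes
```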
@@ -264,18 +282,53 @@ We currently provide pre-trained models available for:
 ## File Structure
 
 ```
-cdm/
-├── infer.py # Main inference script
-├── setup.py # Package installation
-├── rgbddepth/ # Core package
-├── __init__.py
-├── dpt.py # Main RGBDDepth model
-├── dinov2.py # DINOv2 encoder
-├── dinov2_layers/ # ViT transformer layers
-│ └── util/ # Utility functions
-├── blocks.py # Neural network blocks
-│ └── transform.py # Image preprocessing
-└── README.md
+rgbd-depth/
+├── app.py                     # Gradio web demo for HuggingFace Spaces
+├── infer.py                   # CLI inference script (main entry point)
+├── pyproject.toml             # Modern package config (PEP 621, replaces setup.py)
+├── setup.py                   # Legacy setuptools build script
+├── requirements.txt           # Minimal deps for HuggingFace Spaces
+├── uv.lock                    # UV package manager lock file
+├── LICENSE                    # Apache 2.0 license
+├── README.md                  # This file (GitHub/PyPI/HF Spaces unified)
+├── OPTIMIZATION.md            # Performance benchmarks and optimization guide
+├── CHANGELOG.md               # Version history and release notes
+└── VIRAL_STRATEGY.md          # GitHub/PyPI marketing strategy
+
+├── rgbddepth/                 # Main Python package
+│   ├── __init__.py            # Public API exports (RGBDDepth, DinoVisionTransformer, __version__)
+│   ├── dpt.py                 # RGBDDepth model (dual-branch ViT + DPT decoder)
+│   ├── dinov2.py              # DINOv2 Vision Transformer encoder
+│   ├── flexible_attention.py  # Cross-attention w/ xFormers + SDPA fallback
+│   │
+│   ├── dinov2_layers/         # Vision Transformer building blocks (from Meta DINOv2)
+│   │   ├── __init__.py
+│   │   ├── attention.py       # Self-attention w/ optional xFormers (MemEffAttention)
+│   │   ├── block.py           # Transformer encoder block (NestedTensorBlock)
+│   │   ├── mlp.py             # Feed-forward network (Mlp)
+│   │   ├── patch_embed.py     # Image → patch embeddings (PatchEmbed)
+│   │   ├── swiglu_ffn.py      # SwiGLU activation FFN
+│   │   ├── drop_path.py       # Stochastic depth regularization
+│   │   └── layer_scale.py     # LayerScale normalization
+│   │
+│   └── util/                  # Utilities
+│       ├── __init__.py
+│       ├── blocks.py          # DPT decoder blocks (FeatureFusionBlock, ResidualConvUnit)
+│       └── transform.py       # Image preprocessing (Resize, PrepareForNet)
+
+├── tests/                     # Test suite (42 tests, runs in GitHub Actions)
+│   ├── test_import.py         # Basic imports and smoke tests
+│   └── test_model.py          # Architecture, forward pass, attention, preprocessing
+
+├── example_data/              # Example RGB-D pairs for testing
+│   ├── color_12.png           # RGB image sample
+│   ├── depth_12.png           # Depth map sample
+│   └── result.png             # Expected output
+
+└── .github/workflows/         # CI/CD automation
+    ├── test.yml               # Run tests on Python 3.10-3.12 (Ubuntu/macOS/Windows)
+    ├── publish.yml            # Auto-publish to PyPI on release tags
+    └── deploy-hf.yml          # Auto-deploy to HuggingFace Spaces on push to main
 ```
 
 ## Performance
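The tree above also documents the pieces that `app.py` wires together: download a checkpoint, load it with `load_state_dict(..., strict=False)`, call `infer_image` on an RGB image plus a normalized depth prior, then invert the prediction. A condensed sketch of that flow follows; the `RGBDDepth()` constructor arguments, the checkpoint `repo_id`/`filename`, the `input_size` value, and the depth normalization are placeholders rather than values taken from this repository:

```python
# Condensed inference sketch following app.py; constructor args, checkpoint
# location, input_size, and depth scaling below are placeholders/assumptions.
import cv2
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from rgbddepth import RGBDDepth

model = RGBDDepth()  # hypothetical default config; see infer.py for the real options
ckpt = hf_hub_download(repo_id="<org>/<model-repo>", filename="<checkpoint>.pt", cache_dir=".cache")
state = torch.load(ckpt, map_location="cpu")
model.load_state_dict(state.get("model", state), strict=False)
model.eval()

rgb = cv2.cvtColor(cv2.imread("example_data/color_12.png"), cv2.COLOR_BGR2RGB)
depth = cv2.imread("example_data/depth_12.png", cv2.IMREAD_UNCHANGED).astype(np.float32)
depth /= max(float(depth.max()), 1e-6)  # rough 0-1 normalization; app.py does its own scaling

input_size = 518  # placeholder; the demo exposes this as a parameter
pred_inv = model.infer_image(rgb, depth, input_size=input_size)  # inverse depth
pred = np.where(pred_inv > 1e-8, 1.0 / pred_inv, 0.0)            # back to depth
```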
@@ -292,25 +345,12 @@ CDMs achieve state-of-the-art performance on metric depth estimation:
 - Zero-shot generalization across different camera types
 - Real-time inference suitable for robot control (lightweight ViT variants)
 
-### Speed Benchmarks
-
-| Device | Mode | Precision | Time | vs Baseline | Notes |
-|--------|------|-----------|------|-------------|-------|
-| **CUDA** | Vanilla | FP32 | TBD | - | Reference |
-| **CUDA** | Optimized (xFormers) | FP32 | TBD | ~8% faster | Recommended |
-| **CUDA** | Optimized | FP16 | TBD | ~2× faster | Best speed |
-| **CUDA** | Optimized | BF16 | TBD | ~2× faster | Best stability |
-| **MPS** | Vanilla | FP32 | 1.34s | - | torch.compile: no gain |
-| **MPS** | Vanilla | FP16 | TBD | TBD | To be benchmarked |
-| **CPU** | Vanilla | FP32 | 13.37s | - | Optimizations: -11% slower |
-
-**Notes:**
-- **CUDA**: Optimizations auto-enabled by default (use `--no-optimize` to disable)
-- **MPS**: torch.compile provides no gain for Vision Transformers (~0% improvement)
-- **CPU**: torch.compile is counterproductive (compilation overhead > gains)
-- xFormers is CUDA-only (~8% faster than native SDPA)
-
-For detailed optimization strategies, see [OPTIMIZATION.md](OPTIMIZATION.md).
+**Performance optimizations:**
+- xFormers support on CUDA (~8% faster than native SDPA)
+- Mixed precision (FP16/BF16) for faster inference
+- Device-specific optimizations (CUDA/MPS/CPU)
+
+For detailed optimization strategies and benchmarks, see [OPTIMIZATION.md](OPTIMIZATION.md).
 
 ## What's Different from Reference?
 
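The mixed-precision item in the hunk above can be exercised manually with `torch.autocast`; this is a hedged sketch (reusing `model`, `rgb`, `depth`, and `input_size` from the previous sketch), not the package's own optimization path, which the CLI enables by default:

```python
# Hedged mixed-precision sketch: run inference under autocast on CUDA.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

if device == "cuda":
    # BF16 is usually the more numerically stable choice, FP16 the fastest.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        pred_inv = model.infer_image(rgb, depth, input_size=input_size)
else:
    # On CPU, plain FP32 is typically the better option.
    pred_inv = model.infer_image(rgb, depth, input_size=input_size)
```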
app.py CHANGED
@@ -4,6 +4,7 @@
 
 """Gradio demo for rgbd-depth on Hugging Face Spaces."""
 
+import logging
 from pathlib import Path
 
 import gradio as gr
@@ -13,6 +14,14 @@ from PIL import Image
 
 from rgbddepth import RGBDDepth
 
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+    datefmt="%H:%M:%S",
+)
+logger = logging.getLogger(__name__)
+
 # Global model cache
 MODELS = {}
 
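The hunk above replaces ad-hoc `print` calls with a module logger configured at INFO, so the per-image statistics that move to `logger.debug` later in this diff are hidden by default. A minimal sketch of how a local run could surface them again (this override is not part of app.py):

```python
# Not part of app.py: raise the root logger to DEBUG so the logger.debug(...)
# depth/prediction statistics become visible in a local run.
import logging

logging.getLogger().setLevel(logging.DEBUG)
```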
@@ -35,16 +44,16 @@ def download_model(camera_model: str = DEFAULT_MODEL):
         from huggingface_hub import hf_hub_download
 
         repo_id, filename = HF_MODELS.get(camera_model, HF_MODELS[DEFAULT_MODEL])
-        print(f"📥 Downloading {camera_model} model from {repo_id}/{filename}...")
+        logger.info(f"Downloading {camera_model} model from {repo_id}/{filename}...")
 
         # Download the checkpoint
         checkpoint_path = hf_hub_download(repo_id=repo_id, filename=filename, cache_dir=".cache")
 
-        print(f"Downloaded to {checkpoint_path}")
+        logger.info(f"Downloaded to {checkpoint_path}")
         return checkpoint_path
 
     except Exception as e:
-        print(f"Failed to download model: {e}")
+        logger.error(f"Failed to download model: {e}")
         return None
 
 
@@ -70,7 +79,7 @@ def load_model(camera_model: str = DEFAULT_MODEL, use_xformers: bool = False):
     local_path = Path(f"checkpoints/{camera_model}.pt")
     if local_path.exists():
         checkpoint_path = str(local_path)
-        print(f"Using local checkpoint: {checkpoint_path}")
+        logger.info(f"Using local checkpoint: {checkpoint_path}")
     else:
         # 2. Download from HuggingFace
         checkpoint_path = download_model(camera_model)
@@ -87,11 +96,13 @@ def load_model(camera_model: str = DEFAULT_MODEL, use_xformers: bool = False):
                 states = checkpoint
 
             model.load_state_dict(states, strict=False)
-            print(f"Loaded checkpoint for {camera_model}")
+            logger.info(f"Loaded checkpoint for {camera_model}")
         except Exception as e:
-            print(f"Failed to load checkpoint: {e}, using random weights")
+            logger.warning(f"Failed to load checkpoint: {e}, using random weights")
     else:
-        print(f"⚠ No checkpoint available for {camera_model}, using random weights (demo only)")
+        logger.warning(
+            f"No checkpoint available for {camera_model}, using random weights (demo only)"
+        )
 
     # Move to GPU if available (CUDA or MPS for macOS)
     if torch.cuda.is_available():
@@ -167,13 +178,13 @@ def process_depth(
     else:
         dtype = None  # FP32
 
-    # DEBUG: Print input stats
-    print(f"[DEBUG] depth_image raw: min={depth_image.min():.1f}, max={depth_image.max():.1f}")
-    print(
-        f"[DEBUG] depth_normalized: min={depth_normalized[depth_normalized>0].min():.4f}, max={depth_normalized.max():.4f}"
+    # Log input statistics
+    logger.debug(f"depth_image raw: min={depth_image.min():.1f}, max={depth_image.max():.1f}")
+    logger.debug(
+        f"depth_normalized: min={depth_normalized[depth_normalized>0].min():.4f}, max={depth_normalized.max():.4f}"
     )
-    print(
-        f"[DEBUG] simi_depth: min={simi_depth[simi_depth>0].min():.4f}, max={simi_depth.max():.4f}"
+    logger.debug(
+        f"simi_depth: min={simi_depth[simi_depth>0].min():.4f}, max={simi_depth.max():.4f}"
     )
 
     # Run inference
@@ -184,14 +195,14 @@
     else:
         pred = model.infer_image(rgb_image, simi_depth, input_size=input_size)
 
-    # DEBUG: Print prediction stats before reconversion
-    print(f"[DEBUG] pred (inverse depth): min={pred[pred>0].min():.4f}, max={pred.max():.4f}")
+    # Log prediction statistics
+    logger.debug(f"pred (inverse depth): min={pred[pred>0].min():.4f}, max={pred.max():.4f}")
 
     # Convert from inverse depth to depth
     pred = np.where(pred > 1e-8, 1.0 / pred, 0.0)
 
-    # DEBUG: Print final depth stats
-    print(f"[DEBUG] pred (depth): min={pred[pred>0].min():.4f}, max={pred.max():.4f}")
+    # Log final depth statistics
+    logger.debug(f"pred (depth): min={pred[pred>0].min():.4f}, max={pred.max():.4f}")
 
     # Colorize for visualization
     try:
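The conversion kept in the hunk above maps the model's inverse-depth output back to depth while zeroing invalid pixels. A small worked example with made-up values; the `np.errstate` guard is added here only because `np.where` evaluates both branches and would otherwise warn about the division by zero:

```python
import numpy as np

pred_inv = np.array([0.0, 0.5, 2.0, 1e-12])  # example inverse-depth values
with np.errstate(divide="ignore"):
    pred = np.where(pred_inv > 1e-8, 1.0 / pred_inv, 0.0)
print(pred)  # [0.  2.  0.5 0. ] -- empty pixels stay 0 instead of becoming inf
```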