google/tipsv2-b14-dpt

Depth Estimation

feature-extraction

surface-normals

semantic-segmentation

dense-prediction

gberton committed 7 days ago

Commit 8301d59 · verified · 1 Parent(s): c8f35b6

Upload README.md with huggingface_hub

Files changed (1)

README.md +55 -0

README.md ADDED

	@@ -0,0 +1,55 @@

+---
+license: apache-2.0
+tags:
+- vision
+- depth-estimation
+- surface-normals
+- semantic-segmentation
+- dense-prediction
+library_name: transformers
+pipeline_tag: depth-estimation
+---
+# TIPSv2 — B/14 DPT Heads
+DPT (Dense Prediction Transformer) heads for depth estimation, surface normal prediction, and semantic segmentation (ADE20K, 150 classes) on top of the [TIPSv2 B/14](https://huggingface.co/google/tipsv2-b14) backbone. The backbone is loaded automatically.
+## Usage
+```bash
+pip install transformers torch torchvision sentencepiece
+```
+```python
+from transformers import AutoModel
+from torchvision import transforms
+from PIL import Image
+model = AutoModel.from_pretrained("google/tipsv2-b14-dpt", trust_remote_code=True)
+model.eval().cuda()
+transform = transforms.Compose([transforms.Resize((448, 448)), transforms.ToTensor()])
+pixel_values = transform(Image.open("photo.jpg").convert("RGB")).unsqueeze(0).cuda()
+# All tasks at once
+outputs = model(pixel_values)
+outputs.depth          # (B, 1, H, W)
+outputs.normals        # (B, 3, H, W)
+outputs.segmentation   # (B, 150, H, W)
+# Or individual tasks (only runs the requested head)
+depth = model.predict_depth(pixel_values)
+normals = model.predict_normals(pixel_values)
+seg = model.predict_segmentation(pixel_values)
+```
+## Model details
+- **Backbone**: [TIPSv2 B/14](https://huggingface.co/google/tipsv2-b14) (loaded automatically)
+- **Heads**: ~72M total params (depth + normals + segmentation)
+- **Segmentation**: ADE20K, 150 classes
+- **Input**: images in `[0, 1]` range, any resolution (multiples of 14 recommended)
+## License
+Apache 2.0
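
Note on the input recommendation in the README ("multiples of 14 recommended"): a minimal sketch of how a caller might snap an arbitrary image size to the nearest multiple of 14 before resizing. The patch size of 14 is an assumption inferred from the "B/14" model name and the README's note. The helper names `snap_to_patch_multiple` and `snapped_resolution` are hypothetical and not part of the model's API.

```python
def snap_to_patch_multiple(size: int, patch: int = 14) -> int:
    """Round `size` to the nearest positive multiple of `patch` (ties round up).

    `patch=14` is an assumption based on the "B/14" model name.
    """
    if size <= 0 or patch <= 0:
        raise ValueError("size and patch must be positive")
    # Integer arithmetic avoids the round-half-to-even behaviour of round().
    snapped = (size + patch // 2) // patch * patch
    # Never return 0: the smallest usable size is a single patch.
    return max(patch, snapped)


def snapped_resolution(width: int, height: int, patch: int = 14) -> tuple[int, int]:
    """Return (width, height) with each side snapped to a multiple of `patch`."""
    return (
        snap_to_patch_multiple(width, patch),
        snap_to_patch_multiple(height, patch),
    )


if __name__ == "__main__":
    # A 640x480 photo maps to the closest patch-aligned size.
    print(snapped_resolution(640, 480))
```

The result could then be passed to `transforms.Resize((height, width))` in place of the fixed `(448, 448)` used in the README example. Note that `Resize` takes its size as `(height, width)`, and that snapping each side independently changes the aspect ratio slightly.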