Add model card

This PR adds a model card for UniScene3D, including links to the paper, project page, and GitHub repository. It also adds the `image-feature-extraction` pipeline tag to improve discoverability.
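
To check the metadata locally before merging, the front matter can be read with a few lines of Python. This is a minimal sketch, not part of the PR: `read_front_matter` is a hypothetical helper, and it assumes the front matter holds only flat `key: value` pairs (as it does here) rather than full YAML.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a leading `---` block.

    Hypothetical helper for illustration; it does not handle nested
    YAML, lists, or quoted values.
    """
    lines = text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as no front matter


# The front matter this PR adds, followed by a shortened card body.
card = """---
pipeline_tag: image-feature-extraction
---
# UniScene3D
"""

meta = read_front_matter(card)
print(meta["pipeline_tag"])
```

A README without a leading `---` block yields an empty dict, so a missing tag is easy to detect.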

Files changed (1)

README.md +27 -0

	@@ -0,0 +1,27 @@

+---
+pipeline_tag: image-feature-extraction
+---
+# UniScene3D: Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding
+UniScene3D is a transformer-based encoder that learns unified scene representations from multi-view colored pointmaps, jointly modeling image appearance and geometry. It extends pretrained CLIP models to learn representations that effectively combine complementary information from images and pointmaps, generalizing across diverse 3D scene understanding tasks.
+- **Project Page:** [https://yebulabula.github.io/UniScene3D/](https://yebulabula.github.io/UniScene3D/)
+- **GitHub Repository:** [https://github.com/Yebulabula/UniScene3D](https://github.com/Yebulabula/UniScene3D)
+- **Paper:** [https://huggingface.co/papers/2604.02546](https://huggingface.co/papers/2604.02546)
+## Key Features
+- **Unified Representation:** Jointly encodes geometry and appearance from multi-view colored pointmaps within a single ViT encoder.
+- **Novel Training Objectives:** Introduces cross-view geometric alignment and grounded view alignment to enforce geometric and semantic consistency.
+- **Versatile Performance:** Demonstrates state-of-the-art performance in zero-shot, few-shot, and task-specific fine-tuning settings for tasks like viewpoint grounding, scene retrieval, and 3D VQA.
+## Citation
+If you find this work useful, please cite:
+```bibtex
+@inproceedings{mao2026uniscene3d,
+  title     = {Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding},
+  author    = {Mao, Ye and Luo, Weixun and Huang, Ranran and Jing, Junpeng and Mikolajczyk, Krystian},
+  booktitle = {arxiv},
+  year      = {2026}
+}
+```