---
license: mit
---
# TAC RGB encoder

<!-- Provide a quick summary of what the model is/does. -->

This model encodes an RGB image into a dense feature vector.

**Caution:** the model does not contain the last FC layer, so its output features are not aligned with the depth features.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

The model is pre-trained with an RGB-D contrastive objective named TAC (Time-Aware Contrastive pre-training).
Unlike InfoNCE-based loss functions, TAC leverages the similarity between video frames and estimates a similarity matrix to serve as soft labels.
The backbone of this version is ViT-B/32.
The pre-training is conducted on a new unified RGB-D database, UniRGBD.
The main purpose of this work is depth representation, so the RGB encoder is only a side product.

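The soft-label idea can be sketched in a few lines of numpy. Everything below is an illustrative assumption (the function name, the Gaussian temporal kernel, and all parameter values), not the paper's actual formulation: temporal proximity between frames is turned into a row-stochastic soft-label matrix, and the cross-modal RGB-depth similarities are trained against it with a soft cross-entropy.

```python
import numpy as np

def _softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tac_soft_label_loss(rgb_feats, depth_feats, frame_times, tau=0.07, sigma=1.0):
    """Soft-label contrastive loss sketch (hypothetical, not the authors' code).

    Instead of hard one-hot InfoNCE targets, temporally close frames are
    treated as partial positives via a Gaussian kernel over frame times.
    """
    # L2-normalize both modalities
    rgb = rgb_feats / np.linalg.norm(rgb_feats, axis=1, keepdims=True)
    dep = depth_feats / np.linalg.norm(depth_feats, axis=1, keepdims=True)
    logits = rgb @ dep.T / tau  # (B, B) cross-modal similarity matrix

    # Soft labels from temporal proximity (assumed Gaussian kernel)
    dt = frame_times[:, None] - frame_times[None, :]
    soft = _softmax(-(dt ** 2) / (2 * sigma ** 2), axis=1)

    # Stable row-wise log-softmax, then soft cross-entropy
    m = logits.max(axis=1, keepdims=True)
    logp = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return float(-(soft * logp).sum(axis=1).mean())
```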
### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [TAC](https://github.com/RavenKiller/TAC)
- **Paper:** [Learning Depth Representation from RGB-D Videos by Time-Aware Contrastive Pre-training](https://ieeexplore.ieee.org/document/10288539)

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

```bibtex
@ARTICLE{10288539,
  author={He, Zongtao and Wang, Liuyi and Dang, Ronghao and Li, Shu and Yan, Qingqing and Liu, Chengju and Chen, Qijun},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  title={Learning Depth Representation from RGB-D Videos by Time-Aware Contrastive Pre-training},
  year={2023},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TCSVT.2023.3326373}}
```