TAC RGB encoder
This model encodes an RGB image into a dense feature.
Caution: the model does not include the final FC layer, so its output features are not aligned with the depth features.
Model Details
Model Description
The model is pre-trained with an RGB-D contrastive objective named TAC (Time-Aware Contrastive pre-training). Unlike InfoNCE-based loss functions, TAC leverages the similarity between video frames and estimates a similarity matrix to use as soft labels. The backbone of this version is ViT-B/32. Pre-training is conducted on a new unified RGB-D database, UniRGBD. The main purpose of this work is depth representation, so the RGB encoder is only a side model.
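To illustrate the difference between one-hot (InfoNCE-style) targets and soft labels, here is a minimal sketch in plain Python. It is not the TAC implementation: the function names, the exponential decay over frame time gaps, and the `decay` parameter are all assumptions made for illustration, and the paper's actual similarity estimate may differ. The sketch only shows that a contrastive cross-entropy can take a row-normalized similarity matrix as its target instead of an identity matrix.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of similarity logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot_labels(n):
    # InfoNCE-style targets: only the matching RGB-depth pair is positive.
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def temporal_soft_labels(timestamps, decay=1.0):
    # Hypothetical soft labels (an assumption, not the paper's formula):
    # frames closer in time get a larger weight; each row sums to 1.
    labels = []
    for ti in timestamps:
        weights = [math.exp(-abs(ti - tj) / decay) for tj in timestamps]
        total = sum(weights)
        labels.append([w / total for w in weights])
    return labels

def soft_contrastive_loss(logits, labels):
    # Mean cross-entropy between row-wise softmax(logits) and the label rows.
    loss = 0.0
    for logit_row, label_row in zip(logits, labels):
        probs = softmax(logit_row)
        loss -= sum(q * math.log(p) for q, p in zip(label_row, probs) if q > 0)
    return loss / len(logits)

# Toy RGB-to-depth similarity logits for three frames of one video.
logits = [[5.0, 4.0, 0.0],
          [4.0, 5.0, 1.0],
          [0.0, 1.0, 5.0]]

hard_loss = soft_contrastive_loss(logits, one_hot_labels(3))
soft_loss = soft_contrastive_loss(logits, temporal_soft_labels([0.0, 1.0, 10.0]))
print(hard_loss, soft_loss)
```

With one-hot labels the function reduces to the usual InfoNCE-style cross-entropy, so the only thing that changes between the two calls is the target matrix.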
Model Sources
- Repository: TAC
- Paper: Learning Depth Representation from RGB-D Videos by Time-Aware Contrastive Pre-training
Citation
@ARTICLE{10288539,
author={He, Zongtao and Wang, Liuyi and Dang, Ronghao and Li, Shu and Yan, Qingqing and Liu, Chengju and Chen, Qijun},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Learning Depth Representation From RGB-D Videos by Time-Aware Contrastive Pre-Training},
year={2024},
volume={34},
number={6},
pages={4143-4158},
doi={10.1109/TCSVT.2023.3326373}}