.. raw:: html

    <h1 style="text-align: center">
    VirTex: Learning Visual Representations from Textual Annotations
    </h1>
    <h4 style="text-align: center">
    Karan Desai and Justin Johnson
    <br>
    <span style="font-size: 14pt; color: #555555">
    University of Michigan
    </span>
    </h4>
    <hr>

    <h4 style="text-align: center">
    Abstract
    </h4>
    <p style="text-align: justify">
    The de-facto approach to many vision tasks is to start from pretrained
    visual representations, typically learned via supervised training on
    ImageNet. Recent methods have explored unsupervised pretraining to scale to
    vast quantities of unlabeled images. In contrast, we aim to learn
    high-quality visual representations from fewer images. To this end we
    revisit supervised pretraining, and seek data-efficient alternatives to
    classification-based pretraining. We propose VirTex -- a pretraining
    approach using semantically dense captions to learn visual representations.
    We train convolutional networks from scratch on COCO Captions, and transfer
    them to downstream recognition tasks including image classification, object
    detection, and instance segmentation. On all tasks, VirTex yields features
    that match or exceed those learned on ImageNet -- supervised or unsupervised
    -- despite using up to ten times fewer images.
    </p>

**CVPR 2021. Paper available at:** `arxiv.org/abs/2006.06666 <https://arxiv.org/abs/2006.06666>`_.

**Code available at:** `github.com/kdexd/virtex <https://github.com/kdexd/virtex>`_.

.. image:: _static/system_figure.jpg
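
As described in the abstract and figure above, VirTex trains a convolutional
backbone from scratch by captioning COCO images, then transfers that backbone
to downstream recognition tasks. The snippet below is a deliberately
simplified, hypothetical sketch of this idea, **not** the actual VirTex
implementation (which uses Transformer-based textual heads); the GRU caption
head, vocabulary size, and hidden size are illustrative assumptions only.

.. code-block:: python

    import torch
    import torch.nn as nn
    import torchvision

    class CaptioningPretrainingSketch(nn.Module):
        """Toy stand-in for caption-based pretraining (not the VirTex code)."""

        def __init__(self, vocab_size: int = 10000, hidden_size: int = 512):
            super().__init__()
            # Visual backbone, randomly initialized and trained from scratch.
            self.backbone = torchvision.models.resnet50()
            self.backbone.fc = nn.Identity()       # expose 2048-d pooled features
            self.visual_proj = nn.Linear(2048, hidden_size)
            # Toy caption head; VirTex itself uses Transformer decoders.
            self.embed = nn.Embedding(vocab_size, hidden_size)
            self.decoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
            self.classifier = nn.Linear(hidden_size, vocab_size)

        def forward(self, images, caption_tokens):
            # Condition the caption decoder on pooled image features.
            visual = self.visual_proj(self.backbone(images))       # (B, H)
            tokens = self.embed(caption_tokens)                     # (B, T, H)
            hidden, _ = self.decoder(tokens, visual.unsqueeze(0))   # h0 from image
            return self.classifier(hidden)                          # (B, T, vocab)

    model = CaptioningPretrainingSketch()
    images = torch.randn(2, 3, 224, 224)        # dummy image batch
    tokens = torch.randint(0, 10000, (2, 12))   # dummy caption token ids
    logits = model(images, tokens[:, :-1])      # predict the next token
    loss = nn.functional.cross_entropy(logits.flatten(0, 1), tokens[:, 1:].flatten())
    # After pretraining, only `model.backbone` is transferred to downstream tasks.
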

Get the pretrained ResNet-50 visual backbone from our best-performing VirTex
model in one line *without any installation*!

.. code-block:: python

    import torch

    # That's it, this one line only requires PyTorch.
    model = torch.hub.load("kdexd/virtex", "resnet50", pretrained=True)
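
To sanity-check the backbone you just loaded, a minimal feature-extraction
sketch follows. It assumes the returned module follows the standard
torchvision ResNet-50 interface (so its ``fc`` head can be swapped for an
identity), and the random tensor stands in for properly resized and
normalized images.

.. code-block:: python

    import torch
    import torch.nn as nn

    # Load the VirTex-pretrained backbone exactly as above.
    model = torch.hub.load("kdexd/virtex", "resnet50", pretrained=True)
    model.eval()

    # Assumption: the module exposes the usual torchvision ResNet-50 attributes,
    # so replacing the classification head yields 2048-d pooled features.
    model.fc = nn.Identity()

    images = torch.randn(4, 3, 224, 224)   # stand-in for preprocessed images
    with torch.no_grad():
        features = model(images)
    print(features.shape)                  # expected: torch.Size([4, 2048])
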

More details in :doc:`virtex/usage/model_zoo`. Next, dive deeper into our
code with the User Guide and API Reference!

User Guide
----------

.. toctree::
    :maxdepth: 2

    virtex/usage/setup_dependencies
    virtex/usage/model_zoo
    virtex/usage/pretrain
    virtex/usage/downstream

API Reference
-------------

.. toctree::
    :maxdepth: 2

    virtex/config
    virtex/factories
    virtex/data
    virtex/models
    virtex/modules
    virtex/optim
    virtex/utils
    virtex/model_zoo

Citation
--------

If you find this code useful, please consider citing:

.. code-block:: text

    @inproceedings{desai2021virtex,
        title={{VirTex: Learning Visual Representations from Textual Annotations}},
        author={Karan Desai and Justin Johnson},
        booktitle={CVPR},
        year={2021}
    }

Acknowledgments
---------------

We thank Harsh Agrawal, Mohamed El Banani, Richard Higgins, Nilesh Kulkarni
and Chris Rockwell for helpful discussions and feedback on the paper. We thank
Ishan Misra for discussions regarding the PIRL evaluation protocol; Saining Xie
for discussions about replicating the iNaturalist evaluation protocol used in
MoCo; Ross Girshick and Yuxin Wu for help with the Detectron2 model zoo;
Georgia Gkioxari for suggesting the instance segmentation pretraining task
ablation; and Stefan Lee for suggestions on figure aesthetics. We thank Jia
Deng for access to extra GPUs during project development, and the UMich ARC-TS
team for support with GPU cluster management. Finally, we thank all the
Starbucks outlets in Ann Arbor for many hours of free WiFi. This work was
partially supported by the Toyota Research Institute (TRI); however, this
article solely reflects the opinions and conclusions of its authors and not
those of TRI or any other Toyota entity.

Indices and Tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`