AM-RADIO: Reduce All Domains Into One
Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov
Pretrained Models
HuggingFace Hub
Pull the E-RADIO model from a Python script:
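from transformers import AutoModel
# trust_remote_code=True lets transformers run the custom model code shipped in the Hub repository.
model = AutoModel.from_pretrained("nvidia/E-RADIO", trust_remote_code=True)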
Usage
E-RADIO returns a tuple of two tensors, the summary and the spatial_features.
The summary is similar to the cls_token in ViT and is meant to represent the general concept of the entire image.
It has shape $(B,C)$ with $B$ being the batch dimension, and $C$ being some number of channels.
The spatial_features represent more localized content, which should be suitable for dense tasks such as semantic segmentation, or for integration into an LLM.
Spatial features have shape $(B,H,W,D)$, with $H$ being the height and $W$ being the width of the spatial features, and $D$ being the feature dimension at each spatial location.
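As a rough illustration of the shapes described above, the sketch below flattens spatial features of shape $(B,H,W,D)$ into a $(B, H \cdot W, D)$ token sequence, which is one common way to feed dense features to a downstream model such as an LLM. It uses NumPy stand-in arrays rather than real model outputs so that it runs without downloading weights. The helper name `spatial_to_tokens` and the sizes chosen for `B`, `C`, `H`, `W`, `D` are hypothetical and are not part of the E-RADIO API.

```python
import numpy as np

def spatial_to_tokens(spatial_features):
    """Flatten (B, H, W, D) spatial features into a (B, H*W, D) token sequence.

    Hypothetical helper, not part of E-RADIO. It relies only on `.shape` and
    `.reshape`, which NumPy arrays and PyTorch tensors both provide.
    """
    b, h, w, d = spatial_features.shape
    return spatial_features.reshape(b, h * w, d)

# Stand-in arrays with the documented layouts. The sizes are arbitrary
# placeholders, not the real E-RADIO output dimensions. With the real model,
# the two tensors would come from the returned tuple (assumed here to be
# unpacked as `summary, spatial_features`).
B, C, H, W, D = 2, 8, 4, 4, 16
summary = np.zeros((B, C), dtype=np.float32)
spatial_features = np.arange(B * H * W * D, dtype=np.float32).reshape(B, H, W, D)

tokens = spatial_to_tokens(spatial_features)
print(summary.shape, tokens.shape)
```

Flattening in row-major order keeps each token tied to its grid position: token index `i * W + j` corresponds to row `i`, column `j` of the spatial feature map.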
Training
Coming Soon
License
RADIO code and weights are released under the NSCLv1 License.
Citing RADIO
If you find this repository useful, please consider giving it a star and citing it:
@misc{ranzinger2023amradio,
title={AM-RADIO: Agglomerative Model -- Reduce All Domains Into One},
author={Mike Ranzinger and Greg Heinrich and Jan Kautz and Pavlo Molchanov},
year={2023},
eprint={2312.06709},
archivePrefix={arXiv},
primaryClass={cs.CV}
}