AM-RADIO: Reduce All Domains Into One

Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov

Pretrained Models

HuggingFace Hub

Pull the E-RADIO model from a Python script:

from transformers import AutoModel
model = AutoModel.from_pretrained("nvidia/E-RADIO", trust_remote_code=True)

Usage

E-RADIO returns a tuple of two tensors, the summary and the spatial_features. The summary is similar to the cls_token in ViT and is meant to represent the general concept of the entire image. It has shape $(B,C)$, where $B$ is the batch dimension and $C$ is the number of channels. The spatial_features represent more localized content, which should be suitable for dense tasks such as semantic segmentation, or for integration into an LLM. They have shape $(B,H,W,D)$, where $H$ and $W$ are the height and width of the spatial feature map and $D$ is its channel dimension.
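The shapes described above can be illustrated with a minimal sketch. It does not load E-RADIO: random NumPy arrays stand in for the two outputs, and the sizes B, C, H, W, D are hypothetical values chosen only for illustration. It shows two common ways a downstream consumer might rearrange spatial features of shape $(B,H,W,D)$: flattening them into a token sequence, and moving channels first for a convolutional head.

```python
import numpy as np

# Hypothetical sizes for illustration only; the real values depend on
# the model and the input resolution.
B, C, H, W, D = 2, 8, 4, 6, 16

# Stand-ins for the two outputs described above (not real model output).
rng = np.random.default_rng(0)
summary = rng.standard_normal((B, C))                 # (B, C)
spatial_features = rng.standard_normal((B, H, W, D))  # (B, H, W, D)

# Flatten the spatial grid into a token sequence, e.g. for an LLM: (B, H*W, D).
# Row-major order means token index h * W + w corresponds to position (h, w).
tokens = spatial_features.reshape(B, H * W, D)

# Move channels first, e.g. for a convolutional segmentation head: (B, D, H, W).
feature_map = spatial_features.transpose(0, 3, 1, 2)

print(summary.shape, tokens.shape, feature_map.shape)
```

With actual PyTorch tensors, the equivalent operations would be `flatten(1, 2)` and `permute(0, 3, 1, 2)`.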

Training

Coming Soon

License

RADIO code and weights are released under the NSCLv1 License.

Citing RADIO

If you find this repository useful, please consider giving it a star and citing the paper:

@misc{ranzinger2023amradio,
      title={AM-RADIO: Agglomerative Model -- Reduce All Domains Into One},
      author={Mike Ranzinger and Greg Heinrich and Jan Kautz and Pavlo Molchanov},
      year={2023},
      eprint={2312.06709},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}