MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Model Overview
We introduce a novel mixer block that adds a symmetric path without SSM to enhance the modeling of global context. MambaVision has a hierarchical architecture that employs both self-attention blocks and these mixer blocks; a sketch of the mixer idea follows.
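To make the mixer idea concrete, here is a minimal PyTorch sketch. It is illustrative only: the class name `MambaVisionMixerSketch`, the projection sizes, and the convolution settings are assumptions rather than the official implementation, and the SSM branch takes a placeholder module instead of a full selective-scan layer.

```python
import torch
import torch.nn as nn

class MambaVisionMixerSketch(nn.Module):
    """Illustrative hybrid mixer: an SSM branch plus a symmetric branch
    without SSM, concatenated and projected back to the input width.
    Dimensions and layer choices here are assumptions, not the official code."""

    def __init__(self, dim: int, ssm: nn.Module):
        super().__init__()
        half = dim // 2
        self.in_proj_ssm = nn.Linear(dim, half)
        self.in_proj_sym = nn.Linear(dim, half)
        # Each branch applies a 1D conv and SiLU over the token sequence.
        self.conv_ssm = nn.Conv1d(half, half, kernel_size=3, padding=1)
        self.conv_sym = nn.Conv1d(half, half, kernel_size=3, padding=1)
        self.act = nn.SiLU()
        self.ssm = ssm  # placeholder for a selective state-space layer
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        def branch(proj, conv, t):
            t = proj(t).transpose(1, 2)              # (B, half, L) for conv
            return self.act(conv(t)).transpose(1, 2)  # back to (B, L, half)

        h_ssm = self.ssm(branch(self.in_proj_ssm, self.conv_ssm, x))
        h_sym = branch(self.in_proj_sym, self.conv_sym, x)  # symmetric path, no SSM
        return self.out_proj(torch.cat([h_ssm, h_sym], dim=-1))
```

Passing `ssm=nn.Identity()` reduces the block to a pure convolutional mixer, which is convenient for shape-checking the sketch.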
Model Performance
MambaVision demonstrates strong performance, achieving a new SOTA Pareto front in terms of Top-1 accuracy and throughput.
Model Usage
You must first log in to Hugging Face to pull the model:

```bash
huggingface-cli login
```
The model can then be loaded as follows:

```python
from transformers import AutoModel

access_token = "<YOUR ACCESS TOKEN>"
model = AutoModel.from_pretrained("nvidia/MambaVision-L-1K", trust_remote_code=True, token=access_token)
```
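For inference on an image, the following is a minimal sketch continuing from the snippet above. It assumes the remote-code model accepts a batched image tensor and returns classification logits, and it uses standard ImageNet preprocessing; the exact input resolution and normalization statistics should be taken from the model's configuration. `example.jpg` is a placeholder path.

```python
import torch
from PIL import Image
from torchvision import transforms

model.eval()  # `model` as loaded above

# Standard ImageNet preprocessing (assumed here; check the model config
# for the exact input resolution and normalization statistics).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
inputs = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    outputs = model(inputs)  # assumed to return ImageNet-1K class logits
```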