---
title: NAF Zero-Shot Feature Upsampling
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎯 NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

This Space demonstrates **NAF** (Neighborhood Attention Filtering), a method for upsampling features from Vision Foundation Models to any resolution without model-specific training.
## 🚀 Features

- **Universal Upsampling**: Works with any Vision Foundation Model (DINOv2, DINOv3, RADIO, DINO, SigLIP, etc.)
- **Arbitrary Resolutions**: Upsample features to any target resolution while maintaining aspect ratio
- **Zero-Shot**: No model-specific training or fine-tuning required
- **Interactive Demo**: Upload your own images or try sample images from various domains
## 🎨 How to Use

1. **Upload an Image**: Click "Upload Your Image" or select from the sample images
2. **Choose a Model**: Select a Vision Foundation Model from the dropdown
3. **Set Resolution**: Choose the target resolution for the upsampled features (64-512)
4. **Click "Upsample Features"**: Compare the low- and high-resolution features side by side
## 📊 Visualization

The output shows three panels:

- **Left**: Your input image
- **Center**: Low-resolution features from the backbone (PCA visualization)
- **Right**: High-resolution features upsampled by NAF
Features are visualized via PCA, mapping the first 3 principal components to the RGB channels.
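The PCA visualization above can be sketched as follows. This is an illustrative, self-contained example (the function name and feature shapes are ours, not taken from `app.py`), assuming a channel-last feature map:

```python
# Minimal sketch of PCA-to-RGB feature visualization.
# Shapes and names are illustrative, not from the actual demo code.
import numpy as np

def features_to_rgb(feats: np.ndarray) -> np.ndarray:
    """Project an (H, W, C) feature map onto its first 3 principal
    components and rescale each channel to [0, 1] for display."""
    h, w, c = feats.shape
    flat = feats.reshape(-1, c).astype(np.float64)
    flat -= flat.mean(axis=0)                  # center the features
    # Principal directions = top right-singular vectors of the data matrix.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T                     # (H*W, 3) projection
    proj -= proj.min(axis=0)
    proj /= proj.max(axis=0) + 1e-8            # per-channel rescale to [0, 1]
    return proj.reshape(h, w, 3)

rgb = features_to_rgb(np.random.rand(16, 16, 384))
print(rgb.shape)  # (16, 16, 3), ready to display as an RGB image
```

The same projection is applied to both the low- and high-resolution feature maps so their colors are comparable only in structure, not in exact hue.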
## 🔬 Supported Models

- **DINOv3**: Latest self-supervised vision models
- **RADIO v2.5**: High-performance vision backbones
- **DINOv2**: Self-supervised learning with registers
- **DINO**: Original self-supervised ViT
- **SigLIP**: Contrastive vision-language models
## 📖 Learn More

- **Paper**: NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering
- **Code**: GitHub Repository
- **Organization**: Valeo.ai
## 💡 Use Cases

NAF enables better feature representations for:

- Dense prediction tasks (segmentation, depth estimation)
- High-resolution visual understanding
- Feature matching and correspondence
- Vision-language alignment
## ⚙️ Technical Details

- **Input**: Images up to 512px (maintains aspect ratio)
- **Processing**: Backbone feature extraction → NAF upsampling
- **Output**: High-resolution features at the target resolution
- **Device**: Runs on CPU (free tier) or GPU (faster inference)
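The aspect-ratio-preserving input cap mentioned above can be sketched as a small helper. The 512px limit comes from this README; the function name is ours:

```python
# Illustrative sketch of aspect-ratio-preserving resizing to a 512px cap.
# The helper name is hypothetical, not from the actual demo code.
def fit_within(width: int, height: int, max_side: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side,
    preserving aspect ratio. Smaller images are left untouched."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)

print(fit_within(1024, 768))  # -> (512, 384)
print(fit_within(300, 200))   # -> (300, 200)
```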
## 🤗 Citation

If you use NAF in your research, please cite:
```bibtex
@article{chambon2025naf,
  title={NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering},
  author={Chambon, Lucas and others},
  journal={arXiv preprint arXiv:2501.01535},
  year={2025}
}
```
## 📄 License
This demo is released under the Apache 2.0 license.