---
title: NAF Zero-Shot Feature Upsampling
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎯 NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

This Space demonstrates NAF (Neighborhood Attention Filtering), a method for upsampling features from Vision Foundation Models to any resolution without model-specific training.

## 🚀 Features

- **Universal Upsampling**: Works with any Vision Foundation Model (DINOv2, DINOv3, RADIO, DINO, SigLIP, etc.)
- **Arbitrary Resolutions**: Upsample features to any target resolution while maintaining aspect ratio
- **Zero-Shot**: No model-specific training or fine-tuning required
- **Interactive Demo**: Upload your own images or try sample images from various domains
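
The exact resizing rule is not documented in this Space, but "any target resolution while maintaining aspect ratio" typically means scaling so the longer side matches the target. A minimal sketch of that convention (the helper name is hypothetical):

```python
def target_size(width: int, height: int, target: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `target`,
    preserving aspect ratio. Illustrative only; the demo's actual
    resizing rule may differ."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `target_size(1024, 768, 512)` returns `(512, 384)`: the 1024-px side is scaled to 512 and the 4:3 aspect ratio is preserved.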

## 🎨 How to Use

1. **Upload an Image**: Click "Upload Your Image" or select from the sample images
2. **Choose a Model**: Select a Vision Foundation Model from the dropdown
3. **Set Resolution**: Choose the target resolution for the upsampled features (64-512)
4. **Click "Upsample Features"**: See the comparison between low- and high-resolution features

## 📊 Visualization

The output shows three panels:

- **Left**: Your input image
- **Center**: Low-resolution features from the backbone (PCA visualization)
- **Right**: High-resolution features upsampled by NAF

Features are visualized with PCA, mapping the first three principal components to the RGB channels.
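
This PCA-to-RGB mapping can be sketched as follows. The function name and the min-max normalization choice are assumptions for illustration, not necessarily what this Space's `app.py` does:

```python
import numpy as np

def pca_rgb(features: np.ndarray) -> np.ndarray:
    """Project (H, W, C) features onto their first 3 principal
    components and map them to RGB values in [0, 1]."""
    h, w, c = features.shape
    flat = features.reshape(-1, c).astype(np.float64)
    flat -= flat.mean(axis=0)          # center before PCA
    # Principal directions from the SVD of the centered feature matrix.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T             # (H*W, 3) projections
    # Min-max normalize each channel to [0, 1] for display.
    proj -= proj.min(axis=0)
    proj /= proj.max(axis=0) + 1e-8
    return proj.reshape(h, w, 3)
```

Applying the same three principal directions to both the low- and high-resolution feature maps keeps their colors comparable side by side.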

## 🔬 Supported Models

- **DINOv3**: Latest self-supervised vision models
- **RADIO v2.5**: High-performance vision backbones
- **DINOv2**: Self-supervised learning with registers
- **DINO**: Original self-supervised ViT
- **SigLIP**: Contrastive vision-language models

## 📖 Learn More

## 💡 Use Cases

NAF enables better feature representations for:

- Dense prediction tasks (segmentation, depth estimation)
- High-resolution visual understanding
- Feature matching and correspondence
- Vision-language alignment

βš™οΈ Technical Details

  • Input: Images up to 512px (maintains aspect ratio)
  • Processing: Backbone feature extraction β†’ NAF upsampling
  • Output: High-resolution features at target resolution
  • Device: Runs on CPU (free tier) or GPU (faster inference)
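
The two-stage flow above can be sketched with a placeholder upsampler. Here nearest-neighbor interpolation stands in for NAF (which instead predicts content-adaptive neighborhood attention weights), and `extract_features` is a hypothetical backbone call, not an API from this repo:

```python
import numpy as np

def upsample_nearest(feats: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor upsampling of (H, W, C) features to (out_h, out_w, C).
    A placeholder for the NAF upsampling stage, for illustration only."""
    h, w, _ = feats.shape
    ys = np.arange(out_h) * h // out_h   # source row for each output row
    xs = np.arange(out_w) * w // out_w   # source column for each output column
    return feats[ys][:, xs]

def demo_pipeline(image, extract_features, target: int = 512) -> np.ndarray:
    """Stage 1: backbone feature extraction (user-supplied callable).
    Stage 2: upsampling to the target resolution."""
    low_res = extract_features(image)    # e.g. (32, 32, C) patch features
    return upsample_nearest(low_res, target, target)
```

In the real demo, stage 2 is NAF itself, so the upsampled features follow image edges rather than the blocky patch grid a nearest-neighbor resize produces.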

## 🤝 Citation

If you use NAF in your research, please cite:

```bibtex
@article{chambon2025naf,
  title={NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering},
  author={Chambon, Lucas and others},
  journal={arXiv preprint arXiv:2501.01535},
  year={2025}
}
```

## 📜 License

This demo is released under the Apache 2.0 license.