---
title: README
emoji: 📉
colorFrom: gray
colorTo: blue
sdk: static
pinned: false
---

# All Things ViTs: Understanding and Interpreting Attention in Vision (CVPR'23 tutorial)

*By: [Hila Chefer](https://hila-chefer.github.io) and [Sayak Paul](https://sayak.dev)*

*Website: [atv.github.io](https://atv.github.io)*

*Abstract: In this tutorial, we explore different ways to leverage attention in vision: (i) attention can be used to explain a model's predictions (e.g., CLIP for an image-text pair); (ii) by manipulating attention-based explainability maps, one can enforce that predictions are made for the right reasons (e.g., foreground vs. background); (iii) the cross-attention maps of multi-modal models can be used to guide generative models (e.g., mitigating neglect in Stable Diffusion).*

This organization hosts the interactive demos presented at the tutorial. Below, you can find some of them.
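As a minimal sketch of the first idea above (using attention to inspect what a model looks at), the snippet below pulls raw self-attention maps from a pre-trained ViT with 🤗 Transformers. The checkpoint name and image path are assumptions for illustration; this is not the tutorial's own explainability method, just a starting point for exploring attention maps.

```python
# Minimal illustration (not the tutorial's method): extract raw self-attention
# maps from a pre-trained ViT with Hugging Face Transformers.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

model_name = "google/vit-base-patch16-224"  # assumed checkpoint for illustration
processor = ViTImageProcessor.from_pretrained(model_name)
model = ViTModel.from_pretrained(model_name, output_attentions=True)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # any RGB image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# `attentions` is a tuple with one tensor per layer,
# each of shape (batch, num_heads, num_tokens, num_tokens).
last_layer = outputs.attentions[-1]

# Attention from the [CLS] token to the image patches, averaged over heads,
# gives a coarse map of which patches the model attends to.
cls_to_patches = last_layer[0, :, 0, 1:].mean(dim=0)  # (num_patches,)
side = int(cls_to_patches.numel() ** 0.5)
attention_map = cls_to_patches.reshape(side, side)
print(attention_map.shape)  # torch.Size([14, 14]) for a 224x224 image with 16x16 patches
```

More faithful explanations (e.g., attention rollout or relevance-based methods) aggregate information across layers rather than reading a single layer's attention; the demos linked below cover those.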