|
---
license: apache-2.0
tags:
- vision
- simmim
datasets:
- imagenet-1k
inference: false
---
|
|
|
# Swin Transformer (base-sized model) |
|
|
|
Swin Transformer model pre-trained on ImageNet-1k using the SimMIM objective at resolution 192x192. It was introduced in the paper [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886) by Xie et al. and first released in [this repository](https://github.com/microsoft/Swin-Transformer). |
|
|
|
# Intended use cases |
|
|
|
This model is pre-trained only; it is meant to be fine-tuned on a downstream dataset.
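As a minimal sketch of what fine-tuning looks like, the snippet below attaches a classification head on top of a Swin backbone built from a config. The base-model hyperparameters and the `num_labels` value are assumptions for illustration; in practice you would load the pre-trained weights with `from_pretrained` instead of initializing from a config.

```python
from transformers import SwinConfig, SwinForImageClassification

# Base-sized Swin hyperparameters (assumed here; from_pretrained would set them for you)
config = SwinConfig(
    image_size=192,
    embed_dim=128,
    depths=[2, 2, 18, 2],
    num_heads=[4, 8, 16, 32],
    window_size=6,
    num_labels=10,  # number of classes in a hypothetical downstream dataset
)

# Randomly initialized here; for real fine-tuning, replace with
# SwinForImageClassification.from_pretrained(<checkpoint>, num_labels=10)
model = SwinForImageClassification(config)
```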
|
|
|
# Usage |
|
|
|
For code examples, refer to the [documentation](https://huggingface.co/docs/transformers/model_doc/swin#transformers.SwinForMaskedImageModeling.forward.example).
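As a rough sketch of masked image modeling with this checkpoint (assuming it is published on the Hub as `microsoft/swin-base-simmim-window6-192`; substitute the actual repository name if it differs):

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, SwinForMaskedImageModeling

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# checkpoint name is an assumption; replace with the actual Hub repository if needed
processor = AutoImageProcessor.from_pretrained("microsoft/swin-base-simmim-window6-192")
model = SwinForMaskedImageModeling.from_pretrained("microsoft/swin-base-simmim-window6-192")

num_patches = (model.config.image_size // model.config.patch_size) ** 2
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# create a random boolean mask of shape (batch_size, num_patches)
bool_masked_pos = torch.randint(low=0, high=2, size=(1, num_patches)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss, reconstructed_pixel_values = outputs.loss, outputs.reconstruction
```

The reconstruction has the same shape as the input pixel values, and the loss is computed only over the masked patches.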