An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper: arXiv 2010.11929
This is a bit-identical mirror of the canonical artifact from Google Research.
The mirror exists only as a resilience fallback for the xaitalk library; the upstream remains authoritative. All credit and licensing for the model belong to the original authors.
| Field | Value |
|---|---|
| Original authors | Google Research |
| Upstream (authoritative) | https://huggingface.co/google/vit-base-patch16-224 |
| Source repo | https://github.com/google-research/vision_transformer |
| Paper | https://arxiv.org/abs/2010.11929 (Dosovitskiy et al. 2020) |
| License | apache-2.0 (inherited from upstream; please respect upstream's terms) |
| Mirror file | pytorch_model.bin |
| SHA-256 | 5f17067668129d23b52524f90a805e7d9914c276d90a59a13ebe81a09e40ceca |
| Size | 346,351,599 bytes (330.3 MB) |
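A downloaded copy can be checked against the digest and size in the table above. This is a minimal sketch using only the standard library; `verify_file` is an illustrative helper, not part of xaitalk's API:

```python
import hashlib

# Expected values from the table above.
EXPECTED_SHA256 = "5f17067668129d23b52524f90a805e7d9914c276d90a59a13ebe81a09e40ceca"
EXPECTED_SIZE = 346_351_599  # bytes

def verify_file(path, expected_sha256=EXPECTED_SHA256, expected_size=EXPECTED_SIZE):
    """Stream the file in chunks and compare its SHA-256 digest and byte size."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest() == expected_sha256 and size == expected_size
```

Streaming in chunks keeps memory flat even for multi-hundred-MB weight files.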
```python
from xaitalk.hub import ensure_model

weights_path = ensure_model("vit-base-patch16-224")
# Tries the canonical upstream first; falls back to this xaitalk mirror
# automatically if upstream is unreachable.
```
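The try-upstream-then-mirror behavior can be sketched with the standard library alone. The URLs and the `fetch_with_fallback` helper below are hypothetical illustrations; `ensure_model`'s actual resolution logic may differ:

```python
import urllib.request
import urllib.error

# Hypothetical endpoints for illustration only.
UPSTREAM = "https://huggingface.co/google/vit-base-patch16-224/resolve/main/pytorch_model.bin"
MIRROR = "https://huggingface.co/xaitalk/vit-base-patch16-224-mirror/resolve/main/pytorch_model.bin"

def fetch_with_fallback(urls, dest):
    """Try each URL in order; return the local path of the first success."""
    for url in urls:
        try:
            urllib.request.urlretrieve(url, dest)
            return dest
        except urllib.error.URLError:
            continue  # source unreachable; try the next one
    raise RuntimeError("all sources unreachable")

# weights = fetch_with_fallback([UPSTREAM, MIRROR], "pytorch_model.bin")
```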
xaitalk's research-grade reproducibility claim relies on every weight file
being recoverable years from now. We mirror artifacts ≤ 2.5 GB under
`xaitalk/*-mirror` so the pipeline survives upstream URL changes, repo
renames, or deletions. Bit-level parity with the canonical upstream is asserted in
CI via `python -m xaitalk.hub verify-mirrors`.
If you use this model, please cite the original paper (not the mirror):
https://arxiv.org/abs/2010.11929 (Dosovitskiy et al. 2020)