---
license: mit
tags:
- vit
- image-classification
- ggml
---

# Vision Transformer (ViT) models for image classification converted to ggml format

[Available models](https://github.com/staghado/vit.cpp)

| Model     | Disk   | Mem     | SHA                                        |
| ---       | ---    | ---     | ---                                        |
| tiny      | 12 MB  | ~20 MB  | `25ce65ff60e08a1a5b486685b533d79718e74c0f` |
| small     | 45 MB  | ~52 MB  | `7a9f85340bd1a3dcd4275f46d5ee1db66649700e` |
| base      | 174 MB | ~179 MB | `a10d29628977fe27691edf55b7238f899b8c02eb` |
| large     | 610 MB | ~597 MB | `5f27087930f21987050188f9dc9eea75ac607214` |

The models are pre-trained on ImageNet-21k and then fine-tuned on ImageNet-1k,
with a patch size of 16 and an image size of 224.
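
As a quick illustration of what these two numbers imply, the following sketch computes the patch grid and sequence length. It assumes square images, non-overlapping patches, and one extra class token as in the standard ViT design; the function name `patch_grid` is hypothetical.

```python
def patch_grid(image_size: int = 224, patch_size: int = 16) -> tuple[int, int, int]:
    """Return (patches per side, total patches, sequence length with class token)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    # Assumption: one class token is prepended to the patch sequence.
    return per_side, num_patches, num_patches + 1


print(patch_grid())  # (14, 196, 197)
```

Each 224x224 input is therefore split into a 14x14 grid of 16x16 patches.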

For more information, visit:

https://github.com/staghado/vit.cpp