---
library_name: transformers
tags:
- vision
- anime
- image-feature-extraction
---

# ViTMAE (base-sized model) pre-trained on Pixiv

ViTMAE model pre-trained on Pixiv artworks from id 20 to 100649536. The architecture is the same as [facebook/vit-mae-base](https://huggingface.co/facebook/vit-mae-base), but with a smaller patch size (14) and a larger image size (266).

All training was done on TPUs sponsored by [TPU Research Cloud](https://sites.research.google/trc/about/).

## Usage

```
from transformers import AutoImageProcessor, ViTMAEForPreTraining, ViTModel

# resizes images to 266 pixels and normalizes them to [-1, 1]
processor = AutoImageProcessor.from_pretrained("zapparias/pixiv-vit-mae-base")

# load encoder + decoder
model = ViTMAEForPreTraining.from_pretrained("zapparias/pixiv-vit-mae-base")

# you can also load the encoder into a standard ViT model for feature extraction
model = ViTModel.from_pretrained("zapparias/pixiv-vit-mae-base", add_pooling_layer=False)
```
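For reference, a minimal feature-extraction sketch built on the snippet above. The image path is a placeholder, and mean-pooling the patch tokens is just one reasonable way to reduce the encoder output to a single embedding:

```
from PIL import Image
import torch
from transformers import AutoImageProcessor, ViTModel

processor = AutoImageProcessor.from_pretrained("zapparias/pixiv-vit-mae-base")
model = ViTModel.from_pretrained("zapparias/pixiv-vit-mae-base", add_pooling_layer=False)

# "artwork.jpg" is a placeholder path to any RGB image
image = Image.open("artwork.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, 1 + num_patches, hidden_size);
# mean-pool the patch tokens (skipping the [CLS] token) for one vector per image
features = outputs.last_hidden_state[:, 1:].mean(dim=1)
```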