microsoft/beit-large-patch16-224-pt22k

@@ -7,9 +7,9 @@ datasets:
 - imagenet-21k
 ---
-# BEiT (large-sized model, fine-tuned on ImageNet-22k)
-BEiT model pre-trained in a self-supervised fashion on ImageNet-22k - also called ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224, and fine-tuned on the same dataset at resolution 224x224. It was introduced in the paper [BEIT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong and Furu Wei and first released in [this repository](https://github.com/microsoft/unilm/tree/master/beit).
 Disclaimer: The team releasing BEiT did not write a model card for this model so this model card has been written by the Hugging Face team.
@@ -32,26 +32,26 @@ fine-tuned versions on a task that interests you.
 Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes:
 ```python
-from transformers import BEiTFeatureExtractor, BEiTForImageClassification
 from PIL import Image
 import requests
 url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
 image = Image.open(requests.get(url, stream=True).raw)
-feature_extractor = BEiTFeatureExtractor.from_pretrained('microsoft/beit-large-patch16-224-pt22k')
-model = BEiTForImageClassification.from_pretrained('microsoft/beit-large-patch16-224-pt22k')
 inputs = feature_extractor(images=image, return_tensors="pt")
 outputs = model(**inputs)
 logits = outputs.logits
-# model predicts one of the 21,841 ImageNet-22k classes
-predicted_class_idx = logits.argmax(-1).item()
-print("Predicted class:", model.config.id2label[predicted_class_idx])
 ```
 Currently, both the feature extractor and model support PyTorch.
 ## Training data
-The BEiT model was pretrained on [ImageNet-21k](http://www.image-net.org/), a dataset consisting of 14 million images and 21k classes, and fine-tuned on the same dataset.
 ## Training procedure

 - imagenet-21k
 ---
+# BEiT (large-sized model, pre-trained only)
+BEiT model pre-trained in a self-supervised fashion on ImageNet-22k - also called ImageNet-21k (14 million images, 21,841 classes) at resolution 224x224. It was introduced in the paper [BEIT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong and Furu Wei and first released in [this repository](https://github.com/microsoft/unilm/tree/master/beit).
 Disclaimer: The team releasing BEiT did not write a model card for this model so this model card has been written by the Hugging Face team.
 Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes:
 ```python
+from transformers import BeitFeatureExtractor, BeitForMaskedImageModeling
 from PIL import Image
 import requests
 url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
 image = Image.open(requests.get(url, stream=True).raw)
+feature_extractor = BeitFeatureExtractor.from_pretrained('microsoft/beit-large-patch16-224-pt22k')
+model = BeitForMaskedImageModeling.from_pretrained('microsoft/beit-large-patch16-224-pt22k')
 inputs = feature_extractor(images=image, return_tensors="pt")
 outputs = model(**inputs)
 logits = outputs.logits
 ```
 Currently, both the feature extractor and model support PyTorch.
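
Because this checkpoint is pre-trained only, the logits above come from the masked image modeling head rather than a classifier. As a rough sketch of the input side of that objective: the model name implies 16x16 patches on a 224x224 image, and `BeitForMaskedImageModeling` accepts an optional `bool_masked_pos` argument with one boolean per patch. The helper names, the 0.4 mask ratio, and the uniform random sampling below are illustrative assumptions, not the authors' masking procedure (the paper describes a blockwise strategy).

```python
import random

IMAGE_SIZE = 224  # taken from the model name (...-224)
PATCH_SIZE = 16   # taken from the model name (patch16)


def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches_per_side, total_patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> list[bool]:
    """Hypothetical helper: mark a random subset of patches as masked.

    Uniform sampling is a simplification chosen for illustration.
    """
    rng = random.Random(seed)
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


per_side, num_patches = patch_grid(IMAGE_SIZE, PATCH_SIZE)
# mask_ratio=0.4 is an assumed value for this sketch
mask = random_patch_mask(num_patches, mask_ratio=0.4)
print(per_side, num_patches, sum(mask))
```

To pass such a mask to the model, it would need to be converted to a boolean tensor with a leading batch dimension, for example `torch.tensor([mask])`, and supplied as `bool_masked_pos` alongside `**inputs`.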
 ## Training data
+The BEiT model was pretrained on [ImageNet-21k](http://www.image-net.org/), a dataset consisting of 14 million images and 21k classes.
 ## Training procedure