---
language: es
tags:
- sagemaker
- vit
- ImageClassification
- generated_from_trainer
license: apache-2.0
datasets:
- cifar10
metrics:
- accuracy
model-index:
- name: vit_base-224-in21k-ft-cifar10
  results:
  - task:
      name: Image Classification
      type: image-classification
    dataset:
      name: "Cifar10"
      type: cifar10
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.97
---

# Model vit_base-224-in21k-ft-cifar10

## **A fine-tuned model for image classification**

This model was trained using Amazon SageMaker and the Hugging Face Deep Learning container.
The base model is the **Vision Transformer (base-sized model)**, a transformer encoder model (BERT-like) pretrained in a supervised fashion on a large collection of images, namely ImageNet-21k, at a resolution of 224x224 pixels. [Link to base model](https://huggingface.co/google/vit-base-patch16-224-in21k)
34
+
35
+ ## Base model citation
36
+ ### BibTeX entry and citation info
37
+
38
+ ```bibtex
39
+ @misc{wu2020visual,
40
+ title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
41
+ author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
42
+ year={2020},
43
+ eprint={2006.03677},
44
+ archivePrefix={arXiv},
45
+ primaryClass={cs.CV}
46
+ }
47
+ ```

## Dataset
[Link to dataset description](http://www.cs.toronto.edu/~kriz/cifar.html)

The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

The dataset is divided into five training batches and one test batch, each with 10,000 images. The test batch contains exactly 1,000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5,000 images from each class.

Sizes of datasets:
- Train dataset: 50,000
- Test dataset: 10,000
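
For reference, the 10 classes are the ones listed in the dataset description linked above; a small sketch (the commented `datasets` call is one common way to load CIFAR-10, not taken from this card):

```python
# The 10 CIFAR-10 classes, as listed in the dataset description
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]
assert len(CIFAR10_CLASSES) == 10

# One common way to load the dataset (downloads data; shown as a comment only):
# from datasets import load_dataset
# train_ds = load_dataset("cifar10", split="train")  # 50,000 examples
# test_ds = load_dataset("cifar10", split="test")    # 10,000 examples
```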

## Intended uses & limitations

This model is intended for image classification.

## Hyperparameters

```json
{
    "epochs": "5",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "1e-05"
}
```
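
These hyperparameters would typically be passed to the SageMaker Hugging Face estimator when launching the training job. A hedged sketch, where the estimator arguments (entry-point script name, instance type, framework versions) are assumptions and not taken from this card:

```python
# Hyperparameters as reported above, with native Python types
hyperparameters = {
    "epochs": 5,
    "train_batch_size": 32,
    "eval_batch_size": 8,
    "fp16": True,
    "learning_rate": 1e-05,
}

# Sketch of a SageMaker launch (requires AWS credentials; the names below
# are hypothetical and only illustrate the estimator API):
# from sagemaker.huggingface import HuggingFace
# huggingface_estimator = HuggingFace(
#     entry_point="train.py",           # hypothetical training script
#     instance_type="ml.p3.2xlarge",    # hypothetical GPU instance
#     instance_count=1,
#     role=role,
#     transformers_version="4.12",
#     pytorch_version="1.9",
#     py_version="py38",
#     hyperparameters=hyperparameters,
# )
# huggingface_estimator.fit({"train": train_input_path, "test": test_input_path})
```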

## Test results

- Accuracy = 0.97

## Model in action

### Usage for Image Classification

```python
from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTForImageClassification.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar10')
inputs = feature_extractor(images=image, return_tensors="pt")

# Forward pass: logits over the 10 CIFAR-10 classes
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```
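
Note that CIFAR-10 images are 32x32 pixels while the base ViT expects 224x224 inputs; the feature extractor resizes (and normalizes) images before they reach the model. A minimal sketch of the equivalent resize step, using a dummy image rather than a real CIFAR-10 sample:

```python
from PIL import Image

# Dummy 32x32 RGB image standing in for a CIFAR-10 sample (illustration only)
img = Image.new("RGB", (32, 32), color=(128, 64, 32))

# The feature extractor performs an equivalent resize to the ViT input resolution
resized = img.resize((224, 224))
print(resized.size)  # (224, 224)
```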

Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)