

	---
	language: es
	tags:
	- sagemaker
	- vit
	- ImageClassification
	- generated_from_trainer
	license: apache-2.0
	datasets:
	- cifar100
	metrics:
	- accuracy
	model-index:
	- name: vit_base-224-in21k-ft-cifar100
	  results:
	  - task:
	      name: Image Classification
	      type: image-classification
	    dataset:
	      name: "Cifar100"
	      type: cifar100
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.9148
	---

	# Model vit_base-224-in21k-ft-cifar100

	## A fine-tuned model for image classification on CIFAR-100

	This model was trained using Amazon SageMaker and the Hugging Face Deep Learning container.
	The base model is the Vision Transformer (base-sized model), a transformer encoder model (BERT-like) pretrained in a supervised fashion on ImageNet-21k, a large collection of images, at a resolution of 224x224 pixels. [Link to base model](https://huggingface.co/google/vit-base-patch16-224-in21k)
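To make the input resolution concrete, the following sketch works out how many tokens the encoder processes per image. The patch size of 16 is an assumption read from the base checkpoint name (`patch16`); the extra token is the `[CLS]` token that ViT prepends for classification.

```python
# Sketch: token count for a ViT with 16x16 patches at 224x224 resolution.
image_size = 224  # input resolution stated in this model card
patch_size = 16   # assumption: taken from the base checkpoint name (patch16)

patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2
sequence_length = num_patches + 1  # one [CLS] token is prepended

print(patches_per_side, num_patches, sequence_length)
```

Because CIFAR-100 images are 32x32, they have to be resized to 224x224 before they reach the encoder; the image processor of the base model handles that step.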

	## Base model citation
	### BibTeX entry and citation info

	```bibtex
	@misc{wu2020visual,
	title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
	author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
	year={2020},
	eprint={2006.03677},
	archivePrefix={arXiv},
	primaryClass={cs.CV}
	}
	```

	## Dataset
	[Link to dataset description](http://www.cs.toronto.edu/~kriz/cifar.html)

	The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 Million Tiny Images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

	The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
	CIFAR-100, the dataset used for this model, is just like CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 test images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).

	Sizes of datasets:
	- Train dataset: 50,000
	- Test dataset: 10,000
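The split sizes above follow directly from the per-class counts. A minimal check, using only the numbers quoted in this section (the variable names are illustrative):

```python
# Per-class counts quoted in the dataset description
num_classes = 100
train_per_class = 500
test_per_class = 100
num_superclasses = 20

train_size = num_classes * train_per_class
test_size = num_classes * test_per_class
classes_per_superclass = num_classes // num_superclasses

print(train_size, test_size, classes_per_superclass)
```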


	## Intended uses & limitations

	This model is intended for image classification.


	## Hyperparameters
	{
	  "epochs": "5",
	  "train_batch_size": "32",
	  "eval_batch_size": "8",
	  "fp16": "true",
	  "learning_rate": "1e-05"
	}
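The hyperparameters are listed as strings, so they need to be converted before use. The sketch below parses them and estimates how many optimizer steps the run implies. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these are stated in the model card.

```python
import json
import math

# Hyperparameters as listed above (all values are strings)
raw = """
{
  "epochs": "5",
  "train_batch_size": "32",
  "eval_batch_size": "8",
  "fp16": "true",
  "learning_rate": "1e-05"
}
"""
hp = json.loads(raw)

# Convert the string values to typed values
epochs = int(hp["epochs"])
train_batch_size = int(hp["train_batch_size"])
eval_batch_size = int(hp["eval_batch_size"])
fp16 = hp["fp16"] == "true"
learning_rate = float(hp["learning_rate"])

train_size, test_size = 50_000, 10_000  # split sizes from the Dataset section

# Assumption: one device, no gradient accumulation, last partial batch kept
steps_per_epoch = math.ceil(train_size / train_batch_size)
total_steps = steps_per_epoch * epochs
eval_batches = math.ceil(test_size / eval_batch_size)

print(steps_per_epoch, total_steps, eval_batches)
```

With a different device count or gradient accumulation setting, the step counts scale accordingly.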

	## Test results

	- Accuracy = 0.9148
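For reference, accuracy is the fraction of images whose predicted class matches the true label. The helper below is an illustrative sketch, not the evaluation code used for this model; the last lines assume the score was measured on the full 10,000-image test set.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the corresponding label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy example: 3 of 4 predictions are correct
toy_accuracy = accuracy([3, 7, 7, 1], [3, 7, 2, 1])
print(toy_accuracy)

# Assumption: the reported score covers all 10,000 test images
test_size = 10_000
implied_correct = round(0.9148 * test_size)
print(implied_correct)
```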


	## Model in action

	### Usage for Image Classification

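	```python
	from transformers import ViTImageProcessor, ViTForImageClassification
	from PIL import Image
	import requests
	import torch

	# Example image from the COCO validation set
	url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
	image = Image.open(requests.get(url, stream=True).raw)

	# Preprocessing comes from the base model; the fine-tuned weights come from this repository
	processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')
	# Assumes the checkpoint includes the fine-tuned classification head
	model = ViTForImageClassification.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar100')
	inputs = processor(images=image, return_tensors="pt")

	# Inference only, so gradients are not needed
	with torch.no_grad():
	    outputs = model(**inputs)

	# Pick the highest-scoring class and map it to its label
	predicted_class_idx = outputs.logits.argmax(-1).item()
	print("Predicted class:", model.config.id2label[predicted_class_idx])
	```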

	Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)