## Model description

This model is a fine-tuned version of [apple/mobilevitv2-1.0-imagenet1k-256](https://huggingface.co/apple/mobilevitv2-1.0-imagenet1k-256), trained for sketch image recognition on the [Xenova/quickdraw-small](https://huggingface.co/datasets/Xenova/quickdraw-small) dataset.

## How to use

```python
from transformers import MobileViTImageProcessor, MobileViTV2ForImageClassification
from PIL import Image
import requests
import torch
import numpy as np

url = "https://static.thenounproject.com/png/2024184-200.png"
response = requests.get(url, stream=True)
response.raise_for_status()  # Stop early if the download failed

# Convert to grayscale so the input has a single channel
image = Image.open(response.raw).convert("L")

processor = MobileViTImageProcessor.from_pretrained("laszlokiss27/doodle-dash2")
model = MobileViTV2ForImageClassification.from_pretrained("laszlokiss27/doodle-dash2")

# Convert the PIL image to a float tensor and add a channel dimension: (1, H, W)
image_tensor = torch.tensor(np.array(image)).unsqueeze(0).float()
image_tensor = image_tensor.unsqueeze(0)  # Add batch dimension: (1, 1, H, W)

# Preprocess the image tensor with the model's processor
inputs = processor(images=image_tensor, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Get prediction
predicted_class_idx = logits.argmax(-1).item()
predicted_class = model.config.id2label[predicted_class_idx]
print("Predicted class:", predicted_class)
```
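
The example above prints only the single most likely class. For sketches, it can be useful to see a few runner-up guesses as well. The following is a minimal sketch of that idea in plain Python; the helper name `top_k_predictions` and the toy logits and labels are hypothetical and exist only for illustration, they are not part of this model or of `transformers`.

```python
import math


def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs.

    `logits` is a flat list of floats and `id2label` maps class
    indices to label strings. Hypothetical helper for illustration.
    """
    # Numerically stable softmax: subtract the largest logit first
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices from most to least probable
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy values for illustration only, not real model output
example_logits = [0.5, 2.0, -1.0, 1.0]
example_labels = {0: "cat", 1: "bicycle", 2: "tree", 3: "house"}

for label, prob in top_k_predictions(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the usage example, the same helper could be called as `top_k_predictions(logits[0].tolist(), model.config.id2label, k=3)`, assuming `logits` has shape `(1, num_classes)`.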