---
language: en
datasets:
- abdulmananraja/real-life-violence-situations
tags:
- image-classification
- vision
- violence-detection
license: apache-2.0
---
# ViT Base Violence Detection
## Model Description
This is a Vision Transformer (ViT) model fine-tuned for violence detection. The model is based on [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) and has been trained on the [Real Life Violence Situations](https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset) dataset from Kaggle to classify images into violent or non-violent categories.
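The class labels are stored in the model config. The exact label names depend on how the checkpoint was exported, so it is worth checking them before wiring predictions into downstream logic; a quick sketch:

```python
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')

# Inspect the label mapping the checkpoint was exported with;
# the exact names (e.g. 'Violence' / 'NonViolence') may differ per checkpoint
print(model.config.id2label)
```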
## Intended Use
The model is intended for use in applications where detecting violent content in images is necessary. This can include:
- Content moderation
- Surveillance
- Parental control software
## Model Accuracy
- Test accuracy (ViT Base): 98.80%
- Loss: 0.2004
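
If you want to sanity-check accuracy on your own labeled images, a minimal evaluation loop might look like the following. The `test/<label_name>/*.jpg` folder layout, with folder names matching the labels in `model.config.id2label`, is an assumption for illustration:

```python
import torch
from pathlib import Path
from PIL import Image
from transformers import ViTForImageClassification, ViTFeatureExtractor

model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
feature_extractor = ViTFeatureExtractor.from_pretrained('jaranohaal/vit-base-violence-detection')
model.eval()

# Assumed layout: test/<label_name>/*.jpg, folder names matching model labels
label2id = {name: idx for idx, name in model.config.id2label.items()}
correct = total = 0
for path in Path('test').glob('*/*.jpg'):
    image = Image.open(path).convert('RGB')
    inputs = feature_extractor(images=image, return_tensors='pt')
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(-1).item()
    correct += int(pred == label2id[path.parent.name])
    total += 1

print(f"Accuracy on {total} images: {correct / total:.2%}")
```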
## How to Use
Here is an example of how to use this model for image classification:
```python
import torch
from transformers import ViTForImageClassification, ViTFeatureExtractor
from PIL import Image

# Load the model and feature extractor
model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
feature_extractor = ViTFeatureExtractor.from_pretrained('jaranohaal/vit-base-violence-detection')

# Load an image (convert to RGB so grayscale or RGBA inputs also work)
image = Image.open('image.jpg').convert('RGB')

# Preprocess the image
inputs = feature_extractor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()

# Print the predicted class
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
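
For the moderation-style deployments listed under Intended Use, it is usually better to gate on a class probability than on the raw argmax. Continuing from the snippet above, a minimal sketch; the 0.90 threshold is an illustrative assumption and should be tuned on held-out data for your own false-positive/false-negative trade-off:

```python
import torch.nn.functional as F

# Turn logits into class probabilities
probs = F.softmax(logits, dim=-1)
confidence, predicted = probs.max(dim=-1)

THRESHOLD = 0.90  # assumption: tune on a validation set for your application
label = model.config.id2label[predicted.item()]
if confidence.item() >= THRESHOLD:
    print(f"{label} ({confidence.item():.1%} confidence)")
else:
    print(f"Low confidence ({confidence.item():.1%}); consider human review")
```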