{
"model_id": "AdamCodd/vit-base-nsfw-detector",
"downloads": 1035027,
"tags": [
"transformers.js",
"onnx",
"safetensors",
"vit",
"image-classification",
"transformers",
"nlp",
"base_model:google/vit-base-patch16-384",
"base_model:quantized:google/vit-base-patch16-384",
"license:apache-2.0",
"model-index",
"region:us"
],
"description": "--- metrics: - accuracy pipeline_tag: image-classification base_model: google/vit-base-patch16-384 model-index: - name: AdamCodd/vit-base-nsfw-detector results: - task: type: image-classification name: Image Classification metrics: - type: accuracy value: 0.9654 name: Accuracy - type: AUC value: 0.9948 - type: loss value: 0.0937 name: Loss license: apache-2.0 tags: - transformers.js - transformers - nlp --- # vit-base-nsfw-detector This model is a fine-tuned version of vit-base-patch16-384 on around 25_000 images (drawings, photos...). It achieves the following results on the evaluation set: - Loss: 0.0937 - Accuracy: 0.9654 **<u>New [07/30]</u>**: I created a new ViT model specifically to detect NSFW/SFW images for stable diffusion usage (read the disclaimer below for the reason): **AdamCodd/vit-nsfw-stable-diffusion**. **Disclaimer**: This model wasn't made with generative images in mind! There is no generated image in the dataset used here, and it performs significantly worse on generative images, which will require another ViT model specifically trained on generative images. Here are the model's actual scores for generative images to give you an idea: - Loss: 0.3682 (↑ 292.95%) - Accuracy: 0.8600 (↓ 10.91%) - F1: 0.8654 - AUC: 0.9376 (↓ 5.75%) - Precision: 0.8350 - Recall: 0.8980 ## Model description The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384. ## Intended uses & limitations There are two classes: SFW and NSFW. The model has been trained to be restrictive and therefore classify \"sexy\" images as NSFW. That is, if the image shows cleavage or too much skin, it will be classified as NSFW. This is normal. Usage for a local image: Usage for a distant image: Usage with Transformers.js (Vanilla JS): The model has been trained on a variety of images (realistic, 3D, drawings), yet it is not perfect and some images may be wrongly classified as NSFW when they are not. Additionally, please note that using the quantized ONNX model within the transformers.js pipeline will slightly reduce the model's accuracy. You can find a toy implementation of this model with Transformers.js here. ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 3e-05 - train_batch_size: 32 - eval_batch_size: 32 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - num_epochs: 1 ### Training results - Validation Loss: 0.0937 - Accuracy: 0.9654, - AUC: 0.9948 Confusion matrix (eval): [1076 37] [ 60 1627] ### Framework versions - Transformers 4.36.2 - Evaluate 0.4.1 If you want to support me, you can here.",
"model_explanation_gemini": "Classifies images as SFW or NSFW with high accuracy but performs worse on AI-generated images. \n\n**Features:** \n- Fine-tuned ViT model (google/vit-base-patch16-384) \n- 96.54% accuracy on evaluation set \n- Two classes: SFW and NSFW (includes \"sexy\" content) \n- Trained on ~25k diverse images (drawings, photos) \n- Not optimized for generative/AI-created images",
"release_year": null,
"parameter_count": null,
"is_fine_tuned": true,
"category": "Vision",
"api_enhanced": true
}
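A minimal sketch of the card's \"Usage for a local image\" with the transformers image-classification pipeline. Only the model id comes from the card; the file path is a hypothetical placeholder and the label names follow the card's two classes.

```python
# Minimal sketch: classify a local image with the transformers pipeline.
# The model id is from the card; "local_image.jpg" is a hypothetical path.
from transformers import pipeline
from PIL import Image

classifier = pipeline("image-classification", model="AdamCodd/vit-base-nsfw-detector")

image = Image.open("local_image.jpg")
result = classifier(image)
print(result)  # two labels per the card: SFW and NSFW (exact casing may differ)
```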
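For the card's \"Usage for a distant image\", the same pipeline accepts a URL string directly, which transformers fetches and decodes for image tasks; the URL here is hypothetical.

```python
# Minimal sketch: classify a remote image by URL.
from transformers import pipeline

classifier = pipeline("image-classification", model="AdamCodd/vit-base-nsfw-detector")
result = classifier("https://example.com/some_image.jpg")  # hypothetical URL
print(result)
```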
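For \"Usage with Transformers.js (Vanilla JS)\", a sketch against the Transformers.js pipeline API. The package name and the `dtype` option reflect the library's v3 API (an assumption here, not something the card specifies), and the URL is hypothetical; as the card warns, the quantized ONNX weights trade a little accuracy for size.

```js
// Minimal sketch: Transformers.js image classification in the browser or Node.
import { pipeline } from '@huggingface/transformers';

// The quantized ONNX model slightly reduces accuracy (per the card);
// request full precision via dtype if that matters (v3 API assumption).
const classifier = await pipeline(
  'image-classification',
  'AdamCodd/vit-base-nsfw-detector',
  { dtype: 'fp32' },
);

const output = await classifier('https://example.com/some_image.jpg'); // hypothetical URL
console.log(output); // e.g. [{ label: 'sfw', score: ... }, { label: 'nsfw', score: ... }]
```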
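The card's training hyperparameters map directly onto transformers `TrainingArguments`, as sketched below. This shows the configuration only, not the author's training script; `output_dir` and everything omitted are assumptions or library defaults.

```python
# Sketch: the card's hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vit-base-nsfw-detector",  # hypothetical
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    seed=42,
    adam_beta1=0.9,     # Adam betas=(0.9, 0.999) per the card
    adam_beta2=0.999,
    adam_epsilon=1e-8,  # epsilon=1e-08 per the card
)
```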
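As a consistency check, the reported eval accuracy can be recomputed from the card's confusion matrix: the diagonal entries are the correctly classified images.

```python
# Check: accuracy from the card's eval confusion matrix.
cm = [[1076, 37],
      [60, 1627]]
correct = cm[0][0] + cm[1][1]        # 2703 images on the diagonal
total = sum(sum(row) for row in cm)  # 2800 eval images
print(round(correct / total, 4))     # 0.9654, matching the reported accuracy
```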