
siglip-tagger-test-2

This model is a fine-tuned version of google/siglip-base-patch16-512 on a 5,000-image Danbooru dataset (see Training and evaluation data below). It achieves the following results on the evaluation set:

  • Loss: 364.7850
  • Accuracy: 0.2539
  • F1: 0.9967

Model description

This is an experimental model that predicts Danbooru tags for images.

Example

from PIL import Image

import torch
from transformers import (
    AutoModelForImageClassification,
    AutoImageProcessor,
)
import numpy as np

MODEL_NAME = "p1atdev/siglip-tagger-test-2"

model = AutoModelForImageClassification.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model.eval()
processor = AutoImageProcessor.from_pretrained(MODEL_NAME)

image = Image.open("sample.jpg") # load your image
inputs = processor(image, return_tensors="pt").to(model.device, model.dtype)

logits = model(**inputs).logits.detach().cpu().float()[0]  # per-tag scores for the single image
logits = np.clip(logits, 0.0, 1.0)  # clamp the raw scores to [0, 1]

results = {
    model.config.id2label[i]: logit for i, logit in enumerate(logits) if logit > 0
}  # keep only tags with a positive score
results = sorted(results.items(), key=lambda x: x[1], reverse=True)  # highest score first

for tag, score in results:
    print(f"{tag}: {score*100:.2f}%")
# 1girl: 100.00%
# outdoors: 100.00%
# sky: 100.00%
# solo: 100.00%
# school uniform: 96.88%
# skirt: 92.97%
# day: 89.06%
# ...
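Because the printed scores are clipped logits rather than calibrated probabilities, you may want a stricter cutoff than 0. A minimal follow-up sketch; the 0.3 threshold is an arbitrary choice, not from this model card:

THRESHOLD = 0.3  # arbitrary cutoff; tune it for your images and tag set

for tag, score in results:
    if score > THRESHOLD:
        print(f"{tag}: {score * 100:.2f}%")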

Intended uses & limitations

This model is for research use only and is not recommended for production.

For production use, please use the wd-v1-4-tagger series of models by SmilingWolf instead.

Training and evaluation data

5,000 high-quality images from Danbooru were used. They were shuffled and split into train/eval sets at a 4,500:500 ratio.
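The split itself is not published; below is a minimal sketch of an equivalent shuffle-and-split, assuming the images are available locally as an imagefolder dataset (the path and the split seed are hypothetical, not from this model card):

from datasets import load_dataset

# Hypothetical local folder; the actual 5,000-image Danbooru dataset is not published.
ds = load_dataset("imagefolder", data_dir="./danbooru_images", split="train")
split = ds.shuffle(seed=42).train_test_split(test_size=500, shuffle=False)
train_ds, eval_ds = split["train"], split["test"]  # 4,500 / 500 images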

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 20
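A minimal sketch of how these values map onto transformers.TrainingArguments; the actual training script is not published, and the per-epoch evaluation/logging strategy is assumed from the results table below:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="siglip-tagger-test-2",
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=20,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the TrainingArguments defaults.
    evaluation_strategy="epoch",  # assumed: the results table reports one eval per epoch
    logging_strategy="epoch",
)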

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 1496.9876     | 1.0   | 141  | 691.3267        | 0.1242   | 0.9957 |
| 860.0218      | 2.0   | 282  | 433.5286        | 0.1626   | 0.9965 |
| 775.4277      | 3.0   | 423  | 409.0374        | 0.1827   | 0.9966 |
| 697.2465      | 4.0   | 564  | 396.5604        | 0.2025   | 0.9966 |
| 582.6023      | 5.0   | 705  | 388.3294        | 0.2065   | 0.9966 |
| 617.5087      | 6.0   | 846  | 382.2605        | 0.2213   | 0.9966 |
| 627.533       | 7.0   | 987  | 377.6726        | 0.2269   | 0.9967 |
| 595.4033      | 8.0   | 1128 | 374.3268        | 0.2327   | 0.9967 |
| 593.3854      | 9.0   | 1269 | 371.4181        | 0.2409   | 0.9967 |
| 537.9777      | 10.0  | 1410 | 369.5010        | 0.2421   | 0.9967 |
| 552.3083      | 11.0  | 1551 | 368.0743        | 0.2468   | 0.9967 |
| 570.5438      | 12.0  | 1692 | 366.8302        | 0.2498   | 0.9967 |
| 507.5343      | 13.0  | 1833 | 366.1787        | 0.2499   | 0.9967 |
| 515.5528      | 14.0  | 1974 | 365.5653        | 0.2525   | 0.9967 |
| 458.5096      | 15.0  | 2115 | 365.1838        | 0.2528   | 0.9967 |
| 515.6953      | 16.0  | 2256 | 364.9844        | 0.2535   | 0.9967 |
| 533.7929      | 17.0  | 2397 | 364.8577        | 0.2538   | 0.9967 |
| 520.3728      | 18.0  | 2538 | 364.8066        | 0.2537   | 0.9967 |
| 525.1097      | 19.0  | 2679 | 364.7850        | 0.2539   | 0.9967 |
| 482.0612      | 20.0  | 2820 | 364.7876        | 0.2539   | 0.9967 |

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.1.2+cu118
  • Datasets 2.16.1
  • Tokenizers 0.15.0