Visor - Natural-Language Anime Tagging

Visor is an image tagging model that describes images in natural language. It is built on the BLIP model architecture.
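The card states only that Visor is based on BLIP, so the following is a minimal sketch that assumes the `shadowlilac/visor` checkpoint loads with the standard BLIP captioning classes in Hugging Face `transformers` (`BlipProcessor` and `BlipForConditionalGeneration`). The card does not confirm this, and the `caption_image` helper and its `max_new_tokens` default are hypothetical illustrations, not part of the model.

```python
def caption_image(image_path: str, model_id: str = "shadowlilac/visor", max_new_tokens: int = 75) -> str:
    """Generate a natural-language caption for one image (hypothetical helper)."""
    # Heavy dependencies are imported inside the function so this module can be
    # imported without torch, Pillow, or transformers installed.
    import torch
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Assumption: the repository ships a BLIP processor config and weights that
    # load with the standard BLIP captioning class.
    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)

    # BLIP expects RGB input; convert to drop alpha channels or palettes.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Unconditional captioning: only pixel_values are passed to generate().
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

For batch captioning you would load the processor and model once and reuse them, rather than reloading them on every call as this single-image sketch does.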

A potential use case is captioning anime images to build training data for diffusion models.
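One common way to package such captions is a `metadata.jsonl` file next to the images, with a `file_name` and a caption column, which the Hugging Face `datasets` ImageFolder loader understands. The sketch below assumes that layout and uses `text` as the caption column, which to my knowledge is the default `--caption_column` of the diffusers `train_text_to_image.py` example; your training script may expect something else. The `write_caption_metadata` function and the `captioner` callback are hypothetical: `captioner` can be any function that maps an image path to a caption string, such as a wrapper around Visor.

```python
import json
from pathlib import Path
from typing import Callable

# Image extensions to caption; adjust for your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def write_caption_metadata(image_dir: str, captioner: Callable[[Path], str]) -> Path:
    """Caption every image in image_dir and write metadata.jsonl alongside them."""
    root = Path(image_dir)
    # Collect images first (sorted for a deterministic order) before creating the output file.
    images = sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    out_path = root / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for path in images:
            # One JSON object per line: relative file name plus its caption.
            record = {"file_name": path.name, "text": captioner(path)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path
```

Non-image files in the folder are skipped, so existing notes or configs do not end up in the dataset.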

Format: Safetensors
Model size: 470M params
Tensor type: BF16
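As a rough check on what the size and tensor type imply, the weights alone take about 0.94 GB in BF16 (2 bytes per parameter). This takes the rounded 470M figure at face value and ignores activations and runtime overhead; loading the weights in float32 would roughly double it.

```latex
470 \times 10^{6}\ \text{params} \times 2\ \tfrac{\text{bytes}}{\text{param}}
  = 9.4 \times 10^{8}\ \text{bytes}
  \approx 0.94\ \text{GB} \approx 0.88\ \text{GiB}
```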


Repository: shadowlilac/visor