Visor - Natural Language Anime Tagging

Visor is a natural-language image tagging model built on the BLIP architecture.

A potential use case is captioning anime images to produce training data for diffusion models.
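The captioning use case could be sketched roughly as follows. This is a minimal sketch under several assumptions, not the author's documented usage: it assumes the checkpoint at `shadowlilac/visor` loads with the standard `transformers` BLIP classes (`BlipProcessor`, `BlipForConditionalGeneration`), and it assumes a training pipeline that reads captions from a `.txt` file next to each image. The helper names `load_captioner` and `write_caption_sidecars` are hypothetical.

```python
from pathlib import Path
from typing import Callable, List

# Assumption: which file extensions count as training images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def load_captioner(model_id: str = "shadowlilac/visor") -> Callable[[Path], str]:
    """Return a function that captions one image file.

    Assumption: the checkpoint is compatible with the stock BLIP
    captioning classes in transformers. Heavy imports are kept local
    so the file-writing helper below can be used without them.
    """
    import torch
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    def caption(path: Path) -> str:
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            # max_new_tokens=64 is an arbitrary illustrative limit.
            output_ids = model.generate(
                pixel_values=inputs["pixel_values"], max_new_tokens=64
            )
        return processor.decode(output_ids[0], skip_special_tokens=True).strip()

    return caption


def write_caption_sidecars(
    image_dir: str, caption_fn: Callable[[Path], str]
) -> List[Path]:
    """Write one `<image name>.txt` caption file beside each image."""
    written = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            out_path = path.with_suffix(".txt")
            out_path.write_text(caption_fn(path) + "\n", encoding="utf-8")
            written.append(out_path)
    return written


# Example usage (downloads the model on first call):
# write_caption_sidecars("dataset/", load_captioner())
```

Passing the captioner in as a plain function keeps the file-writing step independent of the model, so a different captioner or a post-processing step can be swapped in without changing the dataset code.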

Model size: 470M params (Safetensors). Tensor type: BF16.

Model repository: shadowlilac/visor