Visor - Natural Language Anime Tagging

Visor is an image tagging model that describes images in natural language rather than as a list of discrete tags. It is built on the BLIP model architecture.

One potential use case is captioning anime images to build training data for diffusion models.
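The captioning use case could be sketched roughly as follows. This is a minimal sketch, not a documented usage recipe: it assumes the checkpoint loads with the stock `BlipProcessor` and `BlipForConditionalGeneration` classes from `transformers` (the card only says the model is based on BLIP), and it assumes a training setup that reads each caption from a `.txt` file next to its image. The helper names `load_captioner`, `caption_image`, and `write_caption_sidecars` are hypothetical.

```python
from pathlib import Path

MODEL_ID = "shadowlilac/visor"  # repo id as shown on the model page
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def load_captioner(model_id=MODEL_ID):
    """Load processor and model.

    Assumption: the checkpoint is compatible with the standard BLIP
    classes in transformers; the model card does not confirm this.
    """
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    model.eval()
    return processor, model


def caption_image(path, processor, model, max_new_tokens=64):
    """Generate one natural-language caption for the image at `path`."""
    from PIL import Image

    with Image.open(path) as img:
        image = img.convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    # Match the model's weight dtype so the forward pass does not mix dtypes.
    pixel_values = inputs["pixel_values"].to(model.dtype)
    output_ids = model.generate(
        pixel_values=pixel_values, max_new_tokens=max_new_tokens
    )
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


def write_caption_sidecars(image_dir, caption_fn, extensions=IMAGE_EXTENSIONS):
    """Write `<image name>.txt` next to every image in `image_dir`.

    `caption_fn` maps an image path to a caption string. The sidecar
    layout is an assumption about the downstream training tool.
    """
    written = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in extensions:
            continue
        sidecar = path.with_suffix(".txt")
        sidecar.write_text(caption_fn(path), encoding="utf-8")
        written.append(sidecar)
    return written


if __name__ == "__main__":
    import sys

    processor, model = load_captioner()
    sidecars = write_caption_sidecars(
        sys.argv[1], lambda p: caption_image(p, processor, model)
    )
    for sidecar in sidecars:
        print(sidecar)
```

Passing the captioner in as a plain callable keeps the file handling independent of the model, so the directory walk can be checked with a stub before downloading any weights.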

Format: Safetensors
Model size: 470M params
Tensor type: BF16

Model repository: shadowlilac/visor