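This repository (deepghs/wd14_tagger_embedding_denormalize) contains the models from the experiments in https://github.com/deepghs/tagger_embedding_aligner. They restore WD14 tagger embeddings that have been L2-normalized, so that tag predictions can be computed from them again.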

import numpy as np

from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction, denormalize_wd14_emb

embedding, (r, g, c) = get_wd14_tags(
    '/my/image.png',
    fmt=('embedding', ('rating', 'general', 'character')),
)
# reference tag results, computed directly from the image
print('Expected result:')
print(r)
print(g)
print(c)

# L2-normalize the embedding
embedding = embedding / np.linalg.norm(embedding)
# tag results computed from the normalized embedding are wrong
br, bg, bc = convert_wd14_emb_to_prediction(embedding)
print('Bad results due to the embedding normalization:')
print(br)
print(bg)
print(bc)

# denormalize this embedding
output = denormalize_wd14_emb(embedding)
print(output.shape)

# should closely match r, g, c (error of approximately 1e-3)
rating, general, character = convert_wd14_emb_to_prediction(output)
print('De-normalized result:')
print(rating)
print(general)
print(character)
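
Why does normalization break the predictions? The minimal NumPy sketch below assumes that the tagger's prediction head is a dense layer followed by a sigmoid applied to the embedding; the random `weights` and `bias` are hypothetical stand-ins, not the real WD14 parameters. Because the sigmoid is applied to `emb @ weights + bias`, rescaling the embedding changes every logit and therefore every probability. L2 normalization discards exactly this scale. If the original norm were known, multiplying it back would restore the predictions exactly; the denormalization models have to estimate what was lost from the normalized vector alone, which is why the recovered results above are only approximate.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic function, mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))

def predict(emb: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical prediction head: dense layer + sigmoid (assumption, not WD14's real weights)."""
    return sigmoid(emb @ weights + bias)

rng = np.random.default_rng(0)
width, n_tags = 768, 16                      # toy sizes; the ViT taggers use width 768
weights = rng.normal(scale=0.05, size=(width, n_tags))
bias = rng.normal(scale=0.1, size=n_tags)

embedding = rng.normal(size=width)           # stands in for a raw tagger embedding
pred_orig = predict(embedding, weights, bias)

norm = np.linalg.norm(embedding)
normalized = embedding / norm                # unit length: the scale information is lost
pred_norm = predict(normalized, weights, bias)

restored = normalized * norm                 # exact inverse, but only if the norm is known
pred_restored = predict(restored, weights, bias)

print('max error after normalization: ', np.abs(pred_norm - pred_orig).max())
print('max error after restoring norm:', np.abs(pred_restored - pred_orig).max())
```

The first printed error is large, while the second is at floating-point rounding level.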

| Name | Tagger | Embedding Width | Tags Count | FLOPS | Params | EMB Cosine | EMB Norm | Pred Loss | Pred MSE |
|---|---|---|---|---|---|---|---|---|---|
| ViT_v3_mnum2_all | ViT_v3 | 768 | 10861 | 0.000398G | 0.40M | 1 | 0.1712 | 0.004306 | 2.116e-08 |
| ViT_v3_mnum1_all | ViT_v3 | 768 | 10861 | 0.000709G | 0.71M | 1 | 0.2246 | 0.004306 | 3.991e-08 |
| ConvNext_v3_mnum2_all | ConvNext_v3 | 1024 | 10861 | 0.000708G | 0.71M | 1 | 0.1126 | 0.004531 | 2.061e-08 |
| ConvNext_v3_mnum1_all | ConvNext_v3 | 1024 | 10861 | 0.001260G | 1.26M | 1 | 0.1473 | 0.004531 | 3.539e-08 |
| ViT_mnum2_all | ViT | 768 | 9083 | 0.000398G | 0.40M | 1 | 0.08641 | 0.005199 | 3.797e-09 |
| ViT_mnum1_all | ViT | 768 | 9083 | 0.000709G | 0.71M | 1 | 0.1724 | 0.005199 | 1.896e-08 |
| ConvNext_mnum2_all | ConvNext | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.05776 | 0.005213 | 7.207e-09 |
| ConvNext_mnum1_all | ConvNext | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.07134 | 0.005214 | 1.292e-08 |
| ViT_Large_mnum2_all | ViT_Large | 1024 | 10861 | 0.000708G | 0.71M | 1 | 1.403 | 0.003966 | 1.617e-07 |
| ViT_Large_mnum1_all | ViT_Large | 1024 | 10861 | 0.001260G | 1.26M | 1 | 1.643 | 0.003966 | 2.24e-07 |
| SwinV2_mnum2_all | SwinV2 | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.1257 | 0.004726 | 3.797e-08 |
| SwinV2_mnum1_all | SwinV2 | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.1497 | 0.004727 | 5.487e-08 |
| EVA02_Large_mnum2_all | EVA02_Large | 1024 | 10861 | 0.000708G | 0.71M | 1 | 1.268 | 0.005948 | 5.466e-08 |
| EVA02_Large_mnum1_all | EVA02_Large | 1024 | 10861 | 0.001260G | 1.26M | 1 | 1.713 | 0.005948 | 9.518e-08 |
| ConvNextV2_mnum2_all | ConvNextV2 | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.09014 | 0.004596 | 1.43e-08 |
| ConvNextV2_mnum1_all | ConvNextV2 | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.1216 | 0.004596 | 2.76e-08 |
| SwinV2_v3_mnum2_all | SwinV2_v3 | 1024 | 10861 | 0.000708G | 0.71M | 1 | 0.2129 | 0.004128 | 4.035e-08 |
| SwinV2_v3_mnum1_all | SwinV2_v3 | 1024 | 10861 | 0.001260G | 1.26M | 1 | 0.2784 | 0.004129 | 6.893e-08 |
| MOAT_mnum2_all | MOAT | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.4662 | 0.004998 | 1.855e-08 |
| MOAT_mnum1_all | MOAT | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.7849 | 0.004998 | 5.549e-08 |
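
The metric columns are not defined on this card. One plausible reading, which is an assumption rather than something the card states, is that EMB Cosine is the cosine similarity between the denormalized and the original embedding, and Pred MSE is the mean squared error between the tag probabilities computed from each. The sketch below computes those two quantities; `emb_cosine` and `pred_mse` are hypothetical helper names. Under this reading, a cosine of exactly 1 is expected for any denormalizer that only rescales its input, since positive rescaling preserves direction.

```python
import numpy as np

def emb_cosine(original: np.ndarray, restored: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (hypothetical metric helper)."""
    return float(np.dot(original, restored)
                 / (np.linalg.norm(original) * np.linalg.norm(restored)))

def pred_mse(pred_original: np.ndarray, pred_restored: np.ndarray) -> float:
    """Mean squared error between two tag-probability vectors (hypothetical metric helper)."""
    return float(np.mean((pred_original - pred_restored) ** 2))

# Toy check: a positively rescaled copy has the same direction as the original.
rng = np.random.default_rng(42)
original = rng.normal(size=1024)
rescaled = 0.5 * original
print(emb_cosine(original, rescaled))  # 1.0 up to floating-point rounding
print(pred_mse(np.array([0.2, 0.9]), np.array([0.25, 0.85])))
```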