---
license: other
license_name: aplux-model-farm-license
license_link: https://aiot.aidlux.com/api/v1/files/license/model_farm_license_en.pdf
pipeline_tag: image-classification
tags:
  - AIoT
  - QNN
---

# SigLIP-base: Image Captioning

SigLIP-base is a medium-sized multimodal model developed by Google, built on the SoViT (Shape-optimized Vision Transformer) architecture and trained with a sigmoid loss instead of the softmax-based contrastive loss used in CLIP. The sigmoid objective treats each image-text pair as an independent binary classification, which improves performance at small batch sizes and avoids the batch-wide normalization over negative samples that contrastive training requires. SigLIP-base achieves strong results on tasks such as image-text retrieval and zero-shot image classification. With solid inference efficiency and scalability, it is well suited for multilingual and multitask vision-language applications.
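The sigmoid objective described above can be sketched in a few lines of numpy. This is an illustrative simplification, not the model's training code: in the real model the temperature and bias are learnable scalars, and the values below are placeholders.

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss in the style of SigLIP (simplified sketch).

    img_emb, txt_emb: L2-normalized [batch, dim] embeddings where row i
    of each matrix forms a matched image-text pair. temperature and bias
    are learnable scalars in the real model; fixed here for illustration.
    """
    # All pairwise similarities in the batch: [batch, batch]
    logits = img_emb @ txt_emb.T * temperature + bias
    # +1 on the diagonal (matched pairs), -1 elsewhere (negatives)
    labels = 2.0 * np.eye(len(img_emb)) - 1.0
    # Each pair is an independent binary classification, so no
    # batch-wide softmax over negatives is needed (unlike CLIP).
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-labels * logits))))
```

Because every pair contributes an independent sigmoid term, the loss degrades gracefully when the batch (and hence the pool of in-batch negatives) is small.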

## Source model

- Input shape: [1x3x384x384] (image), [1x64] (text token IDs)
- Number of parameters: 88.86M, 105.16M
- Model size: 359.10M, 424.01M
- Output shape: [1x768] (image embedding), [1x768] (text embedding)

The source model can be found here.
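Since both towers emit a [1x768] embedding, zero-shot classification reduces to comparing the image embedding against one text embedding per candidate label. A minimal sketch of that scoring step, using random stand-in embeddings (the temperature and bias values are illustrative placeholders, not the model's trained values):

```python
import numpy as np

def zero_shot_scores(img_emb, label_embs, temperature=10.0, bias=-10.0):
    """Score one [1, 768] image embedding against [num_labels, 768] text
    embeddings. With sigmoid-trained embeddings, each score is an
    independent match probability rather than a softmax distribution."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    # Cosine similarity, scaled and shifted as during training
    logits = normalize(img_emb) @ normalize(label_embs).T * temperature + bias
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid, shape [1, num_labels]
```

In practice the [1, 768] inputs would come from the image and text encoders; the label with the highest score is the zero-shot prediction.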

## Performance Reference

Please search for the model by name in Model Farm.

## Inference & Model Conversion

Please search for the model by name in Model Farm.

## License