---
license: other
license_name: aplux-model-farm-license
license_link: https://aiot.aidlux.com/api/v1/files/license/model_farm_license_en.pdf
pipeline_tag: image-classification
tags:
  - AIoT
  - QNN
---

# SigLIP-base: Image Captioning

SigLIP-base is a medium-sized multimodal model developed by Google, built on the SoViT (Shape-optimized Vision Transformer) architecture and trained with a sigmoid loss instead of the softmax-based contrastive loss used in CLIP. The sigmoid objective treats each image-text pair as an independent binary classification, which improves performance at small batch sizes and avoids the batch-wide normalization over negative samples that contrastive training requires. SigLIP-base achieves strong results on tasks such as image-text retrieval and zero-shot image classification. With solid inference efficiency and scalability, it is well suited for multilingual and multitask vision-language applications.
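The sigmoid objective described above can be sketched in a few lines of numpy. This is an illustrative simplification, not the model's training code: in the real model the temperature and bias are learnable scalars, and the values below are placeholders.

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss in the style of SigLIP (simplified sketch).

    img_emb, txt_emb: L2-normalized [batch, dim] embeddings where row i
    of each matrix forms a matched image-text pair. temperature and bias
    are learnable scalars in the real model; fixed here for illustration.
    """
    # All pairwise similarities in the batch: [batch, batch]
    logits = img_emb @ txt_emb.T * temperature + bias
    # +1 on the diagonal (matched pairs), -1 elsewhere (negatives)
    labels = 2.0 * np.eye(len(img_emb)) - 1.0
    # Each pair is an independent binary classification, so no
    # batch-wide softmax over negatives is needed (unlike CLIP).
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-labels * logits))))
```

Because every pair contributes an independent sigmoid term, the loss degrades gracefully when the batch (and hence the pool of in-batch negatives) is small.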

## Source model

- Input shape: [1x3x384x384] (image), [1x64] (text token IDs)
- Number of parameters: 88.86M, 105.16M
- Model size: 359.10M, 424.01M
- Output shape: [1x768] (image embedding), [1x768] (text embedding)

The source model can be found here.
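Since both towers emit a [1x768] embedding, zero-shot classification reduces to comparing the image embedding against one text embedding per candidate label. A minimal sketch of that scoring step, using random stand-in embeddings (the temperature and bias values are illustrative placeholders, not the model's trained values):

```python
import numpy as np

def zero_shot_scores(img_emb, label_embs, temperature=10.0, bias=-10.0):
    """Score one [1, 768] image embedding against [num_labels, 768] text
    embeddings. With sigmoid-trained embeddings, each score is an
    independent match probability rather than a softmax distribution."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    # Cosine similarity, scaled and shifted as during training
    logits = normalize(img_emb) @ normalize(label_embs).T * temperature + bias
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid, shape [1, num_labels]
```

In practice the [1, 768] inputs would come from the image and text encoders; the label with the highest score is the zero-shot prediction.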

## Performance Reference

Please search for the model by name in Model Farm.

## Inference & Model Conversion

Please search for the model by name in Model Farm.

## License