ashvardanian committed
Commit a01951e
1 Parent(s): e2c6da8

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -18,10 +18,10 @@ In Python, JavaScript, and Swift<br/>
 ---
 
 The `uform3-image-text-english-small` UForm model is a tiny vision and English language encoder, mapping them into a shared vector space.
-This model is made of:
+This model produces up to __256-dimensional embeddings__ and is made of:
 
-* Text encoder: 4-layer BERT.
-* Visual encoder: ViT-S/16 for images of 224x224 resolution.
+* Text encoder: 4-layer BERT for up to 64 input tokens.
+* Visual encoder: ViT-S/16 for images of 224 x 224 resolution.
 
 Unlike most CLIP-like multimodal models, this model shares 2 layers between the text and visual encoders to allow for more data- and parameter-efficient training.
 Also unlike most models, UForm provides checkpoints compatible with PyTorch, ONNX, and CoreML, covering the absolute majority of AI-capable devices, with pre-quantized weights and inference code.
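As a quick illustration of the shared vector space described above, here is a minimal Python sketch. It assumes the `uform` package's `get_model`, `preprocess_*`, and `encode_*` helpers; the exact loading API can differ between UForm releases, so treat this as a sketch rather than canonical usage.

```python
# Minimal sketch: embed an image and a caption into the shared space.
# Assumes the `uform` package exposes get_model / preprocess_* / encode_*
# helpers; exact names and signatures may vary across UForm versions.
import uform
import torch.nn.functional as F
from PIL import Image

model = uform.get_model('unum-cloud/uform3-image-text-english-small')

image = model.preprocess_image(Image.open('photo.jpg'))
text = model.preprocess_text('a photo of a dog')

image_embedding = model.encode_image(image)  # shape (1, d), d up to 256
text_embedding = model.encode_text(text)     # same shared space

# Cosine similarity scores how well the caption matches the image.
similarity = F.cosine_similarity(image_embedding, text_embedding)
print(similarity.item())
```

Because both encoders land in the same space, the same embeddings can back text-to-image search, image-to-text search, or deduplication without re-running either model.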