kimihailv committed on
Commit
f884fc9
1 Parent(s): c77c978

Update README.md

  1. README.md +6 -29
README.md CHANGED
@@ -51,7 +51,7 @@ To load the model:
 ```python
 import uform
 
-model = uform.get_model_onnx('unum-cloud/uform-vl-english-small', device='gpu', dtype='fp16')
+model, processor = uform.get_model_onnx('unum-cloud/uform-vl-english-small', device='gpu', dtype='fp16')
 ```
 
 To encode data:
@@ -62,11 +62,11 @@ from PIL import Image
 text = 'a small red panda in a zoo'
 image = Image.open('red_panda.jpg')
 
-image_data = model.preprocess_image(image)
-text_data = model.preprocess_text(text)
+image_data = processor.preprocess_image(image)
+text_data = processor.preprocess_text(text)
 
-image_embedding = model.encode_image(image_data)
-text_embedding = model.encode_text(text_data)
+image_features, image_embedding = model.encode_image(image_data, return_features=True)
+text_features, text_embedding = model.encode_text(text_data, return_features=True)
 score, joint_embedding = model.encode_multimodal(
 image_features=image_features,
 text_features=text_features,
@@ -75,33 +75,10 @@ score, joint_embedding = model.encode_multimodal(
 )
 ```
 
-To get features:
-
-```python
-image_features, image_embedding = model.encode_image(image_data, return_features=True)
-text_features, text_embedding = model.encode_text(text_data, return_features=True)
-```
-
-These features can later be used to produce joint multimodal encodings faster, as the first layers of the transformer can be skipped:
-
-```python
-joint_embedding = model.encode_multimodal(
-image_features=image_features,
-text_features=text_features,
-attention_mask=text_data['attention_mask']
-)
-```
-
-There are two options to calculate semantic compatibility between an image and a text: [Cosine Similarity](#cosine-similarity) and [Matching Score](#matching-score).
+There are two options to calculate semantic compatibility between an image and a text: cosine similarity and [Matching Score](#matching-score).
 
 ### Cosine Similarity
 
-```python
-import torch.nn.functional as F
-
-similarity = F.cosine_similarity(image_embedding, text_embedding)
-```
-
 The `similarity` will belong to the `[-1, 1]` range, `1` meaning the absolute match.
 
 __Pros__:
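
Note on the last hunk: the commit drops the `F.cosine_similarity` snippet but keeps the sentence describing `similarity`. For readers of the diff, here is a minimal sketch of that computation in plain Python. It assumes the embeddings have already been converted to flat lists of floats; the `cosine_similarity` helper and the sample vectors are illustrative and not part of `uform`.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for image_embedding and text_embedding.
image_embedding = [0.2, 0.1, 0.7]
text_embedding = [0.4, 0.2, 1.4]  # same direction, twice the length

similarity = cosine_similarity(image_embedding, text_embedding)
```

Because the result depends only on direction, the two sample vectors score close to `1`, the top of the `[-1, 1]` range mentioned in the README.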