visheratin committed on
Commit
d0454de
1 Parent(s): 06b6855

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ widget:
 
 ## Model details
 
-The core idea behind multi-crop LLaVA is that instead of N visual token embeddings per image, I generate one token embedding per N parts of the image.
+The core idea behind multi-crop LLaVA (MC-LLaVA) is that instead of N visual token embeddings per image, I generate one token embedding per N parts of the image.
 Having high-quality embeddings for smaller parts of the image helps to extract more details and understand the scene better.
 
 For every crop of the image, I generate an embedding from the full SigLIP encoder (size [1, 1152]) and then push all N embeddings through the LLaVA adapter, which
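
The flow described in the README excerpt (one embedding per crop, then all N embeddings through an adapter) can be sketched roughly as below. This is an illustration only, not the author's implementation: `split_into_crops`, `encode_crop`, `multi_crop_tokens`, the grid-based cropping, the toy `HIDDEN_DIM`, and the random weight matrices are all assumptions. The SigLIP encoder and the LLaVA adapter are replaced by stand-in linear maps so the example runs with NumPy alone; only the per-crop embedding size of 1152 comes from the README.

```python
import numpy as np

EMBED_DIM = 1152   # per-crop embedding size quoted in the README
HIDDEN_DIM = 64    # assumption: toy hidden size for the language model

_rng = np.random.default_rng(0)
# Assumption: fixed random matrices stand in for trained weights.
_ENCODER_W = _rng.standard_normal((3, EMBED_DIM)).astype(np.float32)
_ADAPTER_W = _rng.standard_normal((EMBED_DIM, HIDDEN_DIM)).astype(np.float32)


def split_into_crops(image, rows, cols):
    """Split an (H, W, C) image into rows * cols non-overlapping crops."""
    h = image.shape[0] // rows
    w = image.shape[1] // cols
    return [
        image[r * h:(r + 1) * h, c * w:(c + 1) * w]
        for r in range(rows)
        for c in range(cols)
    ]


def encode_crop(crop):
    """Stand-in for the SigLIP encoder: one [1, EMBED_DIM] vector per crop."""
    # Average each colour channel, then project to EMBED_DIM.
    channel_means = crop.reshape(-1, crop.shape[-1]).mean(axis=0, keepdims=True)
    return channel_means.astype(np.float32) @ _ENCODER_W


def multi_crop_tokens(image, rows, cols):
    """Return one adapter output token per crop, shape (N, HIDDEN_DIM)."""
    crops = split_into_crops(image, rows, cols)
    # Stack the N per-crop embeddings into an (N, EMBED_DIM) matrix.
    embeddings = np.concatenate([encode_crop(c) for c in crops], axis=0)
    # Stand-in for the LLaVA adapter: a single linear projection.
    return embeddings @ _ADAPTER_W


if __name__ == "__main__":
    image = np.random.default_rng(1).random((96, 96, 3), dtype=np.float32)
    tokens = multi_crop_tokens(image, rows=3, cols=3)
    print(tokens.shape)
```

With a 3 by 3 grid, the image contributes nine tokens to the language model, one per crop, rather than a long sequence of patch tokens for the whole image.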