visheratin committed on
Commit: 260330d
1 Parent(s): 7c5d988

Update README.md

Files changed (1)
  1. README.md +9 -3
README.md CHANGED
@@ -21,8 +21,9 @@ LLaVA-3b is a model fine-tuned from [Dolphin 2.6 Phi](https://huggingface.co/cog
 [SigLIP 400M](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384). There are a couple of things different from the original LLaVA architecture:
 
 1. Multiple image tokens. The multimodal projector generates embeddings of shape [5, 2560] instead of [1, 2560] for images. The idea is that using more tokens
-allows to get more info from the image into the language model.
-2. The model uses the output from the latest layer of the vision encoder instead of intermediate one.
+allows us to get more info from the image into the language model.
+2. The model uses the output from the latest layer of the vision encoder instead of the intermediate one.
+3. The context length during training was 1200 tokens, as the L4 GPUs I used didn't allow me to get more.
 
 As Dolphin 2.6 Phi, LLaVA-3b uses ChatML prompt format:
 
@@ -111,7 +112,12 @@ output = model.generate(**inputs, max_new_tokens=200, do_sample=True, top_p=0.5,
 ```
 
 ## License
-This model is based on Phi-2 and is governed by Microsoft's microsoft-research-license which prohibits commercial use.
+
+This model is based on Phi-2 and is governed by Microsoft's research license, which prohibits commercial use.
+
+## Acknowledgments
+
+Thanks to [ML Collective](https://mlcollective.org/) for providing credits for computing resources.
 
 **Where to send questions or comments about the model:**
 
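To make point 1 in the diff above concrete, here is a minimal PyTorch sketch of a projector that maps one pooled image embedding to five language-model tokens of width 2560. The class name, layer layout, and the vision feature width (1152 for SigLIP SO400M) are assumptions for illustration, not the model's actual implementation.

```python
# Illustrative sketch only: a multi-token multimodal projector.
# Output shape follows the description in the diff ([5, 2560] per image);
# the vision width and layer layout are assumptions, not the model's code.
import torch
import torch.nn as nn


class MultiTokenProjector(nn.Module):
    def __init__(self, vision_dim: int = 1152, lm_dim: int = 2560, num_tokens: int = 5):
        super().__init__()
        self.num_tokens = num_tokens
        self.lm_dim = lm_dim
        # One projection that produces all image tokens at once.
        self.proj = nn.Linear(vision_dim, num_tokens * lm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: [batch, vision_dim] pooled vision-encoder embedding
        tokens = self.proj(image_features)
        # Reshape into a short sequence of image tokens: [batch, 5, 2560]
        return tokens.view(-1, self.num_tokens, self.lm_dim)


projector = MultiTokenProjector()
print(projector(torch.randn(1, 1152)).shape)  # torch.Size([1, 5, 2560])
```

The five resulting tokens would then occupy five slots in the language model's input sequence instead of one, which is the extra capacity for image information that the diff describes.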
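The ChatML prompt format referenced in the first hunk wraps each conversation turn in <|im_start|> and <|im_end|> markers. A generic example is shown below; the system text is illustrative, and how the image tokens are injected is model-specific and not shown here (see the full README for the exact template).

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is shown in this image?<|im_end|>
<|im_start|>assistant
```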