visheratin committed on
Commit: 260330d
1 Parent(s): 7c5d988

Update README.md

Files changed (1)
  1. README.md +9 -3
README.md CHANGED
@@ -21,8 +21,9 @@ LLaVA-3b is a model fine-tuned from [Dolphin 2.6 Phi](https://huggingface.co/cog
 [SigLIP 400M](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384). There are a couple of things different from the original LLaVA architecture:
 
 1. Multiple image tokens. The multimodal projector generates embeddings of shape [5, 2560] instead of [1, 2560] for images. The idea is that using more tokens
-allows to get more info from the image into the language model.
-2. The model uses the output from the latest layer of the vision encoder instead of intermediate one.
+allows us to get more info from the image into the language model.
+2. The model uses the output from the latest layer of the vision encoder instead of the intermediate one.
+3. The context length during training was 1200 tokens, as the L4 GPUs I used didn't allow me to get more.
 
 As Dolphin 2.6 Phi, LLaVA-3b uses ChatML prompt format:
 
@@ -111,7 +112,12 @@ output = model.generate(**inputs, max_new_tokens=200, do_sample=True, top_p=0.5,
 ```
 
 ## License
-This model is based on Phi-2 and is governed by Microsoft's microsoft-research-license which prohibits commercial use.
+
+This model is based on Phi-2 and is governed by Microsoft's research license, which prohibits commercial use.
+
+## Acknowledgments
+
+Thanks to [ML Collective](https://mlcollective.org/) for providing credits for computing resources.
 
 **Where to send questions or comments about the model:**
 
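To make point 1 in the diff above concrete, here is a minimal PyTorch sketch of a projector that maps one pooled image embedding to five language-model tokens of width 2560. The class name, layer layout, and the vision feature width (1152 for SigLIP SO400M) are assumptions for illustration, not the model's actual implementation.

```python
# Illustrative sketch only: a multi-token multimodal projector.
# Output shape follows the description in the diff ([5, 2560] per image);
# the vision width and layer layout are assumptions, not the model's code.
import torch
import torch.nn as nn


class MultiTokenProjector(nn.Module):
    def __init__(self, vision_dim: int = 1152, lm_dim: int = 2560, num_tokens: int = 5):
        super().__init__()
        self.num_tokens = num_tokens
        self.lm_dim = lm_dim
        # One projection that produces all image tokens at once.
        self.proj = nn.Linear(vision_dim, num_tokens * lm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: [batch, vision_dim] pooled vision-encoder embedding
        tokens = self.proj(image_features)
        # Reshape into a short sequence of image tokens: [batch, 5, 2560]
        return tokens.view(-1, self.num_tokens, self.lm_dim)


projector = MultiTokenProjector()
print(projector(torch.randn(1, 1152)).shape)  # torch.Size([1, 5, 2560])
```

The five resulting tokens would then occupy five slots in the language model's input sequence instead of one, which is the extra capacity for image information that the diff describes.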
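The ChatML prompt format referenced in the first hunk wraps each conversation turn in <|im_start|> and <|im_end|> markers. A generic example is shown below; the system text is illustrative, and how the image tokens are injected is model-specific and not shown here (see the full README for the exact template).

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is shown in this image?<|im_end|>
<|im_start|>assistant
```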