---
language:
- en
license: mit
library_name: transformers
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
---

# Model Card for LLaVa-Phi-2-3B-GGUF

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

Quantized version of [llava-phi-2-3b](https://huggingface.co/marianna13/llava-phi-2-3b). Quantization was done using [llama.cpp](https://github.com/ggerganov/llama.cpp/tree/master/examples/llava).

- **Developed by:** [LAION](https://laion.ai/), [SkunkworksAI](https://huggingface.co/SkunkworksAI) & [Ontocord](https://www.ontocord.ai/)
- **Model type:** LLaVA is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
- **Finetuned from model:** [Phi-2](https://huggingface.co/microsoft/phi-2)
- **License:** MIT

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [BakLLaVA](https://github.com/SkunkworksAI/BakLLaVA)
- **llama.cpp:** [GitHub](https://github.com/ggerganov/llama.cpp)

## Usage

```sh
make && ./llava-cli -m ../ggml-model-f16.gguf --mmproj ../mmproj-model-f16.gguf --image /path/to/image.jpg
```
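
For use from Python, the same GGUF files can be loaded with the community `llama-cpp-python` bindings. This is a hedged sketch, not part of this repo: the package, its LLaVA chat handler, and the context size are assumptions, and the file paths simply reuse the names from the command above.

```python
# Hypothetical sketch using llama-cpp-python (pip install llama-cpp-python).
# Model/projector file names are taken from the Usage command above;
# adjust paths to wherever you downloaded the GGUF files.


def build_messages(image_url: str, question: str) -> list:
    """Build the OpenAI-style multimodal message list that
    llama-cpp-python's LLaVA chat handlers expect."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]


def describe_image(image_url: str, question: str = "Describe this image.") -> str:
    """Run one chat completion against the quantized model.
    Imported lazily so build_messages stays usable without the package."""
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
    llm = Llama(model_path="ggml-model-f16.gguf", chat_handler=handler, n_ctx=2048)
    out = llm.create_chat_completion(messages=build_messages(image_url, question))
    return out["choices"][0]["message"]["content"]
```

A call would then look like `describe_image("file:///path/to/image.jpg")`; local images are passed as `file://` URLs.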

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Benchmarks

| Model | Parameters | SQA | GQA | TextVQA | POPE |
| --- | --- | --- | --- | --- | --- |
| [LLaVA-1.5](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7.3B | 68.0 | **62.0** | **58.3** | 85.3 |
| [MC-LLaVA-3B](https://huggingface.co/visheratin/MC-LLaVA-3b) | 3B | - | 49.6 | 38.59 | - |
| [LLaVA-Phi](https://arxiv.org/pdf/2401.02330.pdf) | 3B | 68.4 | - | 48.6 | 85.0 |
| [moondream1](https://huggingface.co/vikhyatk/moondream1) | 1.6B | - | 56.3 | 39.8 | - |
| **llava-phi-2-3b** | 3B | **69.0** | 51.2 | 47.0 | **86.0** |

### Image Captioning (MS COCO)

| Model | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE_L | CIDEr | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llava-1.5-7b | 75.8 | 59.8 | 45 | 33.3 | 29.4 | 57.7 | 108.8 | 23.5 |
| **llava-phi-2-3b** | 67.7 | 50.5 | 35.7 | 24.2 | 27.0 | 52.4 | 85.0 | 20.7 |