	---
	license: apache-2.0
	pipeline_tag: image-text-to-text
	---

	### TinyLLaVA

	We trained a 3.1B-parameter TinyLLaVA model using the same training settings as [TinyLLaVA](https://github.com/DLCV-BUAA/TinyLLaVABench). The language model is [Phi-2](https://huggingface.co/microsoft/phi-2) and the vision encoder is [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384). The connector between them is a 2-layer MLP. The model was trained on the [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md) dataset.
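The role of the connector can be illustrated with a minimal, dependency-free sketch: each patch feature from the vision encoder is passed through two linear layers and comes out with the width the language model expects. The toy dimensions, the random weights, and the GELU activation between the two layers are assumptions made for illustration (the card only states "2-layer MLP"). The patch count is derived from the `patch14-384` part of the encoder name and assumes one token per non-overlapping patch.

```python
import math
import random


def gelu(x):
    # Exact GELU; the activation choice is an assumption, not stated in the card.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def make_linear(in_dim, out_dim, rng):
    # Random weights stand in for trained parameters.
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(vec, weight, bias):
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def connector(patch_features, layer1, layer2):
    # Apply Linear -> GELU -> Linear to every patch feature independently.
    projected = []
    for vec in patch_features:
        hidden = [gelu(h) for h in linear(vec, *layer1)]
        projected.append(linear(hidden, *layer2))
    return projected


def num_patches(image_size, patch_size):
    # Non-overlapping square patches on a square image.
    return (image_size // patch_size) ** 2


rng = random.Random(0)
VISION_DIM, LM_DIM = 8, 16  # toy sizes, not the real hidden sizes
layer1 = make_linear(VISION_DIM, LM_DIM, rng)
layer2 = make_linear(LM_DIM, LM_DIM, rng)

patches = [[rng.uniform(-1, 1) for _ in range(VISION_DIM)] for _ in range(4)]
tokens = connector(patches, layer1, layer2)

print(len(tokens), len(tokens[0]))  # one projected vector per patch
print(num_patches(384, 14))         # patches for a 384x384 image with 14x14 patches
```

The number of vectors is unchanged by the connector; only their width changes, which is what lets the projected image features sit alongside text embeddings in the language model's input.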

	### Usage

	1. Download the generation script `generate_model.py`.
	2. Run the following command:
	```bash
	python generate_model.py --model tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B --prompt 'you want to ask' --image '/path/to/related/image'
	```
	Alternatively, run the following test code:
	```python
	import shutil

	from transformers import AutoTokenizer, AutoModelForCausalLM
	from generate_model import generate  # helper from the downloaded generate_model.py

	hf_path = 'tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B'
	# trust_remote_code=True allows transformers to load the custom model code from the repository
	model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
	config = model.config
	tokenizer = AutoTokenizer.from_pretrained(
	    hf_path,
	    use_fast=False,
	    model_max_length=config.tokenizer_model_max_length,
	    padding_side=config.tokenizer_padding_side,
	)
	prompt = "you want to ask"
	image = "/path/to/related/image"
	output_text, generation_time = generate(prompt=prompt, image=image, model=model, tokenizer=tokenizer)

	# shutil.get_terminal_size() falls back to 80 columns when no terminal is attached
	width = shutil.get_terminal_size().columns
	print_txt = (
	    f'\r\n{"=" * width}\r\n'
	    '\033[1m Prompt + Generated Output\033[0m\r\n'
	    f'{"-" * width}\r\n'
	    f'{output_text}\r\n'
	    f'{"-" * width}\r\n'
	    '\r\nGeneration took'
	    f'\033[1m\033[92m {round(generation_time, 2)} \033[0m'
	    'seconds.\r\n'
	)
	print(print_txt)
	```
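
To ask several questions about the same image, the call above can be wrapped in a small loop. The sketch below is illustrative only: `ask_many` and `fake_generate` are hypothetical names, and it assumes that `generate` from `generate_model.py` keeps the keyword arguments used above and returns a `(text, seconds)` pair. The stand-in function lets the snippet run without downloading the model.

```python
from statistics import mean


def ask_many(generate_fn, prompts, image, model, tokenizer):
    """Run each prompt against the same image and collect the text and timing."""
    results = []
    for prompt in prompts:
        output_text, seconds = generate_fn(
            prompt=prompt, image=image, model=model, tokenizer=tokenizer
        )
        results.append({"prompt": prompt, "output": output_text, "seconds": seconds})
    return results


def fake_generate(prompt, image, model, tokenizer):
    # Hypothetical stand-in; pass the real `generate` from generate_model.py instead.
    return f"{prompt} -> (answer about {image})", 0.5


results = ask_many(
    fake_generate,
    ["What is shown in the image?", "What color is it?"],
    "/path/to/related/image",
    model=None,
    tokenizer=None,
)
for item in results:
    print(f'{item["seconds"]:.2f}s  {item["output"]}')
print(f'mean generation time: {mean(r["seconds"] for r in results):.2f}s')
```

With the real model, pass `generate`, `model`, and `tokenizer` from the test code in place of the stand-in and the `None` placeholders; the model is then loaded once and reused for every prompt.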

	### Results

	| Model | VQAv2 | GQA | SQA | TextVQA | MM-Vet | POPE | MME | MMMU |
	| :---: | ----- | --- | --- | ------- | ------ | ---- | --- | ---- |
	| [bczhou/TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B) | 79.9 | 62.0 | 69.1 | 59.1 | 32.0 | 86.4 | 1464.9 | - |
	| [tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B](https://huggingface.co/tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B) | 80.1 | 62.1 | 73.0 | 60.3 | 37.5 | 87.2 | 1466.4 | 38.4 |
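
As a reading aid, the per-benchmark difference between the two rows can be computed directly from the table. The snippet below only restates the numbers above; the dictionary layout and variable names are illustrative, and MMMU is skipped because the baseline row has no score for it.

```python
# Scores copied from the results table; None marks a missing entry.
baseline = {  # bczhou/TinyLLaVA-3.1B
    "VQAv2": 79.9, "GQA": 62.0, "SQA": 69.1, "TextVQA": 59.1,
    "MM-Vet": 32.0, "POPE": 86.4, "MME": 1464.9, "MMMU": None,
}
this_model = {  # tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B
    "VQAv2": 80.1, "GQA": 62.1, "SQA": 73.0, "TextVQA": 60.3,
    "MM-Vet": 37.5, "POPE": 87.2, "MME": 1466.4, "MMMU": 38.4,
}

# Difference per benchmark, rounded to one decimal to match the table precision.
deltas = {
    name: round(this_model[name] - score, 1)
    for name, score in baseline.items()
    if score is not None
}

for name, delta in deltas.items():
    print(f"{name:8s} {delta:+.1f}")
```

Note that MME is reported on a different scale from the other benchmarks, so its difference is not directly comparable with the rest.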