---
inference: false
license: apache-2.0
---
# LLaVA Model Card
## SGLang
This repository contains the files needed to run LLaVA-1.6 34B on SGLang. You can launch the server with the following command:

```bash
python -m sglang.launch_server --model-path dillonlaird/hf-llava-v1.6-34b --port 30000
```

There seem to be issues with the chat formatting when using the SGLang interface, so I recommend querying the server directly and formatting the prompt string yourself:
```python
import requests
from transformers import AutoTokenizer


def generate(image_path: str, prompt: str, tokenizer) -> str:
    # Build the conversation and render it with the model's chat template.
    chat = [
        {"role": "system", "content": "Answer the question."},
        {"role": "user", "content": "<image>\n" + prompt},
    ]
    chat_str = tokenizer.apply_chat_template(chat, tokenize=False)
    # Append the ChatML generation prompt so the model replies as the assistant.
    chat_str += "<|im_start|>assistant\n"
    sampling_params = {"temperature": 0.2, "max_new_tokens": 1536}
    # Query the SGLang server's /generate endpoint directly.
    res = requests.post(
        "http://localhost:30000/generate",
        json={
            "text": chat_str,
            "image_data": image_path,
            "sampling_params": sampling_params,
        },
    )
    return res.json()["text"]


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("liuhaotian/llava-v1.6-34b")
    image_path = "path/to/image.jpg"
    prompt = "What is the name of the mountain?"
    desc = generate(image_path, prompt, tokenizer)
    print(desc)
```
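For reference, the fully formatted prompt sent to the server should look roughly like the ChatML string below. This is a sketch assuming the base model's ChatML chat template; the exact system-prompt handling and whitespace depend on the tokenizer's template:

```
<|im_start|>system
Answer the question.<|im_end|>
<|im_start|>user
<image>
What is the name of the mountain?<|im_end|>
<|im_start|>assistant
```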
## Model details
**Model type:**
LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data.
It is an auto-regressive language model based on the transformer architecture.
Base LLM: [NousResearch/Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B)
**Model date:**
LLaVA-v1.6-34B was trained in December 2023.
**Paper or resources for more information:**
https://llava-vl.github.io/
## License
This model follows the license of its base LLM, [NousResearch/Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B) (Apache 2.0).
**Where to send questions or comments about the model:**
https://github.com/haotian-liu/LLaVA/issues
## Intended use
**Primary intended uses:**
The primary use of LLaVA is research on large multimodal models and chatbots.
**Primary intended users:**
The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
## Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 158K GPT-generated multimodal instruction-following data.
- 500K academic-task-oriented VQA data mixture.
- 50K GPT-4V data mixture.
- 40K ShareGPT data.
## Evaluation dataset
A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.