llava-hf/llava-v1.6-vicuna-13b-hf


RaushanTurganbay (HF staff) committed on Jul 19

Commit 1d94039 • 1 Parent(s): 81cbb4d

Update README.md

Files changed (1)

README.md +19 -1

@@ -2,6 +2,10 @@
 tags:
 - vision
 - image-text-to-text
+license: llama2
+language:
+- en
+pipeline_tag: image-text-to-text
 ---
 # LLaVa-Next, leveraging [liuhaotian/llava-v1.6-vicuna-13b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-13b) as LLM
@@ -29,6 +33,7 @@ Here's the prompt template for this model:
 ```
 "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image>\nWhat is shown in this image? ASSISTANT:"
 ```
 You can load and use the model as follows:
 ```python
 from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
@@ -44,7 +49,20 @@ model.to("cuda:0")
 # prepare image and text prompt, using the appropriate prompt template
 url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
 image = Image.open(requests.get(url, stream=True).raw)
-prompt = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image>\nWhat is shown in this image? ASSISTANT:"
+# Define a chat history and use `apply_chat_template` to get a correctly formatted prompt
+# Each value in "content" has to be a list of dicts with types ("text", "image")
+conversation = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "What is shown in this image?"},
+            {"type": "image"},
+        ],
+    },
+]
+prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
 inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
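The change above replaces a hard-coded prompt string with a structured `conversation` that `apply_chat_template` turns into a prompt. To make that mapping concrete, here is a minimal pure-Python sketch that renders such a conversation into the prompt format quoted in the README. It is not the library's implementation: `render_vicuna_prompt` and `SYSTEM_PROMPT` are hypothetical names introduced for illustration, and placing the `<image>` placeholder before the text is an assumption made so that the result matches the removed prompt string. The actual template ships with the processor and may differ in details such as whitespace.

```python
# Hypothetical illustration only: the real template is stored with the processor
# and is applied by processor.apply_chat_template.
SYSTEM_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)


def render_vicuna_prompt(conversation, add_generation_prompt=True):
    """Render a list of {"role", "content"} messages into a Vicuna-style prompt."""
    parts = [SYSTEM_PROMPT]
    for message in conversation:
        role = message["role"].upper()
        # Assumption: image placeholders come first, one "<image>\n" per image entry.
        images = "".join(
            "<image>\n" for item in message["content"] if item["type"] == "image"
        )
        # Text entries of the same message are joined with single spaces.
        text = " ".join(
            item["text"] for item in message["content"] if item["type"] == "text"
        )
        parts.append(f"{role}: {images}{text}")
    prompt = " ".join(parts)
    if add_generation_prompt:
        # Cue the model to start its answer.
        prompt += " ASSISTANT:"
    return prompt


conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image"},
        ],
    },
]

prompt = render_vicuna_prompt(conversation)
print(prompt)
```

With the single-turn conversation from the diff, this sketch reproduces the prompt string that the commit removed. In real use, prefer `processor.apply_chat_template`, since it stays in sync with the template shipped alongside the checkpoint.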