---
{}
---
This repo is the Hugging Face version of liuhaotian/llava-v1.6-34b.
# Issue
Even after the answer is complete, the model keeps generating '\n' until the length limit is reached, so set 'max_length' carefully.

```python
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Load the model in half precision and move it to GPU 0.
model_id = "PerRing/llava-v1.6-34b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)
processor = AutoProcessor.from_pretrained(model_id)

# Build the prompt; <image> marks where the image features are inserted.
Q = 'explain about this image'
prompt = f"""<|im_start|>system
Answer the questions.<|im_end|><|im_start|>user
<image>
{Q}<|im_end|><|im_start|>assistant
"""
image_file = "https://images.pexels.com/photos/757889/pexels-photo-757889.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2"

# Download the image and prepare the model inputs on GPU 0.
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(prompt, raw_image, return_tensors='pt').to(0, torch.float16)

# max_length counts the prompt tokens as well as the generated tokens.
output = model.generate(**inputs, max_length=256, temperature=0.4, do_sample=True)
print(processor.decode(output[0], skip_special_tokens=True))
```
## Result
```output
<|im_start|> system
Answer the questions.<|im_start|> user
 
explain about this image<|im_start|> assistant
The image shows a bouquet of purple flowers arranged in a clear glass vase. The vase is placed on a balcony railing. The balcony railing is made of metal and has a black color. The flowers are purple in color. The bouquet of flowers is placed in the clear glass vase. The vase is made of clear glass. The clear glass vase is placed on the balcony railing. The balcony railing is made of metal and has a black color. The bouquet of purple flowers is placed in the clear glass vase. The vase is made of clear glass.


```
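The decoded text above still contains the prompt and ends with the repeated newlines described under Issue. One way to clean it up after generation is sketched below. This is a minimal sketch and not part of the original repo: the helper name `extract_answer` and the marker string `"<|im_start|> assistant"` are assumptions taken from the shape of the sample output, so adjust the marker if your decoded text looks different.

```python
def extract_answer(decoded: str, marker: str = "<|im_start|> assistant") -> str:
    """Return the text after the last assistant marker, without surrounding whitespace.

    `marker` is an assumption based on the sample output in this README.
    If the marker is absent, the whole string is returned (stripped).
    """
    # rpartition returns ('', '', decoded) when the marker is not found,
    # so the last element is always the part we want to keep.
    tail = decoded.rpartition(marker)[2]
    # strip() removes the newline after the marker and the trailing '\n' run.
    return tail.strip()


if __name__ == "__main__":
    sample = (
        "<|im_start|> system\nAnswer the questions.<|im_start|> user\n \n"
        "explain about this image<|im_start|> assistant\n"
        "The image shows a bouquet of purple flowers.\n\n\n"
    )
    print(extract_answer(sample))
```

This only trims the text after the fact and does not stop the model from spending tokens on newlines. `generate` also accepts `max_new_tokens`, which limits only the newly generated tokens rather than prompt plus output, and may be easier to tune than `max_length`.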



# Original (liuhaotian/llava-v1.6-34b) README.md


<br>
<br>

# LLaVA Model Card

## Model details

**Model type:**
LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data.
It is an auto-regressive language model, based on the transformer architecture.
Base LLM: [NousResearch/Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B)

**Model date:**
LLaVA-v1.6-34B was trained in December 2023.

**Paper or resources for more information:**
https://llava-vl.github.io/

## License
[NousResearch/Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B) license.

**Where to send questions or comments about the model:**
https://github.com/haotian-liu/LLaVA/issues

## Intended use
**Primary intended uses:**
The primary use of LLaVA is research on large multimodal models and chatbots.

**Primary intended users:**
The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

## Training dataset
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 158K GPT-generated multimodal instruction-following data.
- 500K academic-task-oriented VQA data mixture.
- 50K GPT-4V data mixture.
- 40K ShareGPT data.

## Evaluation dataset
A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.