huseinzol05 commited on
Commit
298e38a
1 Parent(s): e012254

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +106 -0
README.md CHANGED
@@ -7,3 +7,109 @@ tags: []
7
 
8
  WanDB https://wandb.ai/huseinzol05/vision-tinyllama?workspace=user-huseinzol05
9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7
 
8
  WanDB https://wandb.ai/huseinzol05/vision-tinyllama?workspace=user-huseinzol05
9
 
10
+ ## how-to
11
+
12
+ ```python
13
+ from modeling_vision import MM_LLMs, MM_LLMs_Config
14
+ from transformers import AutoTokenizer, AutoProcessor
15
+ from PIL import Image
16
+ import requests
17
+
18
+ def prepare_dataset(messages, images: List[str] = None):
19
+ if images is not None:
20
+ images = [Image.open(f).convert('RGB') for f in images]
21
+ image_output = image_processor(images=images, return_tensors='pt')['pixel_values']
22
+ else:
23
+ image_output = None
24
+
25
+ prompt = tokenizer.apply_chat_template(messages, tokenize = False)
26
+ outputs = tokenizer(
27
+ prompt,
28
+ return_tensors='pt',
29
+ return_overflowing_tokens=False,
30
+ return_length=False)
31
+
32
+ outputs['images'] = image_output
33
+ outputs['image_index'] = torch.tensor([0] * len(outputs['images']))
34
+ outputs['image_starts'] = torch.tensor([tokenizer.convert_tokens_to_ids('<image>')] * len(outputs['images']))
35
+ return outputs
36
+
37
+ model = MM_LLMs.from_pretrained(
38
+ 'mesolitica/malaysian-tinyllama-1.1b-siglip-base-384-vision',
39
+ flash_attention = True,
40
+ dtype = torch.bfloat16,
41
+ torch_dtype = torch.bfloat16
42
+ )
43
+ _ = model.cuda()
44
+
45
+ image_processor = AutoProcessor.from_pretrained('google/siglip-base-patch16-384')
46
+ tokenizer = AutoTokenizer.from_pretrained('mesolitica/malaysian-tinyllama-1.1b-siglip-base-384-vision')
47
+
48
+ with open('Persian-cat-breed.jpg', 'wb') as fopen:
49
+ fopen.write(requests.get('https://cdn.beautifulnara.net/wp-content/uploads/2017/12/10201620/Persian-cat-breed.jpg').content)
50
+
51
+ with open('nasi-goreng-1-23.jpg', 'wb') as fopen:
52
+ fopen.write(requests.get('https://www.jocooks.com/wp-content/uploads/2023/09/nasi-goreng-1-23.jpg').content)
53
+
54
+ messages = [
55
+ {'role': 'user', 'content': '<image> </image> ini gambar apa'},
56
+ ]
57
+ outputs = prepare_dataset(messages, images = ['Persian-cat-breed.jpg'])
58
+ outputs['images'] = outputs['images'].type(model.dtype)
59
+ for k in outputs.keys():
60
+ if outputs[k] is not None:
61
+ outputs[k] = outputs[k].cuda()
62
+
63
+ with torch.no_grad():
64
+ model_inputs = model.prepare_inputs_for_generation(**outputs)
65
+ r = model_inputs.pop('input_ids', None)
66
+
67
+ generate_kwargs = dict(
68
+ model_inputs,
69
+ max_new_tokens=300,
70
+ top_p=0.95,
71
+ top_k=50,
72
+ temperature=0.1,
73
+ do_sample=True,
74
+ num_beams=1,
75
+ )
76
+
77
+ r = model.llm.generate(**generate_kwargs)
78
+ print(tokenizer.decode(r[0]))
79
+ ```
80
+
81
+ ```
82
+ <s>Imej itu menunjukkan seekor kucing putih yang comel duduk di atas sofa hitam.</s>
83
+ ```
84
+
85
+ ```python
86
+ messages = [
87
+ {'role': 'user', 'content': '<image> </image> <image> </image> apa kaitan 2 gambar ni'},
88
+ ]
89
+ outputs = prepare_dataset(messages, images = ['Persian-cat-breed.jpg', 'nasi-goreng-1-23.jpg'])
90
+ outputs['images'] = outputs['images'].type(model.dtype)
91
+ for k in outputs.keys():
92
+ if outputs[k] is not None:
93
+ outputs[k] = outputs[k].cuda()
94
+
95
+ with torch.no_grad():
96
+ model_inputs = model.prepare_inputs_for_generation(**outputs)
97
+ r = model_inputs.pop('input_ids', None)
98
+
99
+ generate_kwargs = dict(
100
+ model_inputs,
101
+ max_new_tokens=300,
102
+ top_p=0.95,
103
+ top_k=50,
104
+ temperature=0.1,
105
+ do_sample=True,
106
+ num_beams=1,
107
+ )
108
+
109
+ r = model.llm.generate(**generate_kwargs)
110
+ print(tokenizer.decode(r[0]))
111
+ ```
112
+
113
+ ```
114
+ <s>Tiada hubungan yang jelas antara gambar 1 (anak kucing putih duduk di atas sofa) dan gambar 2 (foto penutup mangkuk mi telur dengan nasi dan cili). Gambar pertama ialah imej haiwan, manakala gambar kedua ialah imej makanan. Mereka tergolong dalam kategori yang berbeza dan tidak mempunyai hubungan antara satu sama lain.</s>
115
+ ```