Kendamarron committed 8a23c0b (verified) · parent: c810edd

Update README.md

Files changed (1): README.md (+63 −3)
---
license: llama3.2
---

## Model Information

Llama-3.2-11B-Vision-Instruct-Swallow-8B-Merge was created using Chat Vector to add Japanese language capability to Meta/Llama-3.2-11B-Vision-Instruct.

### Details

https://zenn.dev/kendama/articles/280a4089cb8a72

## Recipe

```
Llama-3.2-11B-Vision-Instruct + (Llama-3.1-Swallow-8B-v0.1 - Llama-3.1-8B)
```
- Vision Model: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
- Base Text Model: [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- Japanese Text Model: [tokyotech-llm/Llama-3.1-Swallow-8B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3.1-Swallow-8B-v0.1)

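The recipe above is plain weight arithmetic: the difference between the Japanese text model and its base is added onto the vision model's text tower. A minimal sketch of that step, assuming the three checkpoints expose state dicts with matching text-model parameter names; the helper name `apply_chat_vector`, the `skip_substrings` list, and the key matching are illustrative, not taken from the linked article (in practice the Mllama language-tower keys must first be mapped onto the text models' key names):

```python
import torch


def apply_chat_vector(target, donor, base,
                      skip_substrings=("embed_tokens", "lm_head")):
    """Return target weights shifted by the (donor - base) delta.

    target: state dict of the model receiving the chat vector
    donor:  state dict of the Japanese text model (Swallow)
    base:   state dict of the shared base text model (Llama-3.1-8B)
    """
    merged = {}
    for name, weight in target.items():
        if (
            name in donor
            and name in base
            and donor[name].shape == weight.shape
            and not any(s in name for s in skip_substrings)
        ):
            # Recipe: Vision-Instruct + (Swallow - Llama-3.1-8B)
            merged[name] = weight + (donor[name] - base[name])
        else:
            # Vision-specific or vocabulary-sensitive weights stay untouched.
            merged[name] = weight
    return merged
```

Vocabulary-adjacent weights (`embed_tokens`, `lm_head`) are commonly excluded from chat-vector addition because the models' tokenizers may differ.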
## License

[Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)

## How to use

```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

# This merged model's repository, rather than the base meta-llama checkpoint.
model_id = "Kendamarron/Llama-3.2-11B-Vision-Instruct-Swallow-8B-Merge"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        # "Please compose a haiku about this image."
        {"type": "text", "text": "この画像で一句詠んでください。"}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))
```