---
base_model:
- tokyotech-llm/Llama-3-Swallow-8B-v0.1
- meta-llama/Llama-3.2-11B-Vision-Instruct
- meta-llama/Meta-Llama-3-8B
license: llama3.2
tags:
- merge
---
## Model Information
This is the initial version of [Kendamarron/Llama-3.2-11B-Vision-Instruct-Swallow-8B-Merge](https://huggingface.co/Kendamarron/Llama-3.2-11B-Vision-Instruct-Swallow-8B-Merge).
It uses the Llama-3 series of models instead of the Llama-3.1 series.
In practice, its outputs are not noticeably different from those of the Llama-3.1-based model.
### Details
A write-up is available (in Japanese): https://zenn.dev/kendama/articles/280a4089cb8a72
## Recipe
```
Llama-3.2-11B-Vision-Instruct + (Llama-3-Swallow-8B-v0.1 - Meta-Llama-3-8B)
```
- Vision Model: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
- Base Text Model: [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- Japanese Text Model: [tokyotech-llm/Llama-3-Swallow-8B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1)
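
In other words, the "Japanese" task vector of Swallow (its weight delta from Meta-Llama-3-8B) is added onto the language tower of the vision model, leaving the vision encoder and cross-attention layers untouched. The sketch below illustrates this recipe; it is an assumption-laden illustration, not the exact script used to build this model: the `language_model.` key prefix matches transformers 4.45-era Mllama checkpoints, and the handling of the extra image-token embedding rows is a naive guess. See the article above for the actual procedure.
```python
import torch
from transformers import AutoModelForCausalLM, MllamaForConditionalGeneration

# Load all three ingredients in bfloat16 (this still needs ~55 GB of memory).
vision = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct", torch_dtype=torch.bfloat16
)
swallow = AutoModelForCausalLM.from_pretrained(
    "tokyotech-llm/Llama-3-Swallow-8B-v0.1", torch_dtype=torch.bfloat16
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

# The Mllama language tower interleaves 8 cross-attention layers among 32
# self-attention layers, so layer i of the 8B models maps to the i-th
# *self-attention* layer of the 40-layer tower.
text_cfg = vision.config.text_config
cross_idx = set(text_cfg.cross_attention_layers)
self_idx = [i for i in range(text_cfg.num_hidden_layers) if i not in cross_idx]

vision_sd = vision.state_dict()
base_sd = base.state_dict()

with torch.no_grad():
    for name, swallow_w in swallow.state_dict().items():
        delta = swallow_w - base_sd[name]  # the "Japanese" task vector
        if name.startswith("model.layers."):
            parts = name.split(".")
            parts[2] = str(self_idx[int(parts[2])])  # remap the layer index
            target = "language_model." + ".".join(parts)
        else:
            target = "language_model." + name  # embeddings, final norm, lm_head
        if target not in vision_sd:
            continue
        if vision_sd[target].shape == delta.shape:
            vision_sd[target] += delta
        else:
            # The vision model's embedding has extra image-token rows; apply
            # the delta only to the rows shared with the 8B vocabulary.
            n = min(vision_sd[target].shape[0], delta.shape[0])
            vision_sd[target][:n] += delta[:n]

vision.save_pretrained("Llama-3.2-11B-Vision-Instruct-Swallow-8B-Merge-v0.1")
```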
## License
[Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)
## How to use
```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "Kendamarron/Llama-3.2-11B-Vision-Instruct-Swallow-8B-Merge-v0.1"

# Load the merged model and its processor
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Fetch a sample image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Ask the model to compose a haiku about the image (prompt is in Japanese)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "この画像で一句詠んでください。"}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))
``` |