---
language:
...
- es
- th
library_name: transformers
base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
pipeline_tag: image-text-to-text
tags:
- facebook
- pytorch
- llama
- llama-3
- ctranslate2
- quantization
- int8
- float16
widget:
- example_title: rococo art
  messages:
...
extra_gated_button_content: Submit
extra_gated_eu_disallowed: true
---

## meta-llama/Llama-3.2-11B-Vision-Instruct for CTranslate2

This model is derived from the [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) multimodal model by removing the vision layer and converting it into a format supported by [CTranslate2](https://github.com/OpenNMT/CTranslate2). After conversion, it becomes a **text-only model**.

**This model is an int8_float16-quantized version of [avans06/Meta-Llama-3.2-8B-Instruct](https://huggingface.co/avans06/Meta-Llama-3.2-8B-Instruct) and can be used with CTranslate2.**

## Conversion details

The original model was converted in October 2024 with the following command (with `int8_float16`, the weights are stored as 8-bit integers while the computation runs in float16):
```
ct2-transformers-converter --model Path\To\Local\avans06\Meta-Llama-3.2-8B-Instruct \
    --quantization int8_float16 --output_dir Meta-Llama-3.2-8B-Instruct-ct2-int8_float16
```
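
The same conversion can also be run from Python through CTranslate2's converter API. The sketch below is an equivalent of the command above, not the author's exact procedure; the input path is a placeholder for a local copy of the pruned text-only checkpoint:

```python
import ctranslate2

# Sketch: programmatic equivalent of the ct2-transformers-converter command.
# The input path is a placeholder for the local text-only checkpoint.
converter = ctranslate2.converters.TransformersConverter(
    "Path/To/Local/avans06/Meta-Llama-3.2-8B-Instruct"
)
converter.convert(
    "Meta-Llama-3.2-8B-Instruct-ct2-int8_float16",  # output directory
    quantization="int8_float16",
)
```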

## Model Information

The Llama 3.2-Vision collection of multimodal large language models (LLMs) comprises pretrained and instruction-tuned image-reasoning generative models in 11B and 90B sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open-source and closed multimodal models on common industry benchmarks.

## How to use

This repository is for use with [CTranslate2](https://github.com/OpenNMT/CTranslate2).
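
The examples below assume the required Python packages are installed, for instance via pip:

```
pip install ctranslate2 transformers huggingface_hub
```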

### Use with CTranslate2

This example code is adapted from [CTranslate2_transformers](https://opennmt.net/CTranslate2/guides/transformers.html#mpt) and [tokenizer AutoTokenizer](https://huggingface.co/docs/transformers/main_classes/tokenizer).
More detailed information about the `generate_batch` method can be found at [CTranslate2_Generator.generate_batch](https://opennmt.net/CTranslate2/python/ctranslate2.Generator.html#ctranslate2.Generator.generate_batch).

```python
import ctranslate2
import transformers
from huggingface_hub import snapshot_download

model_id = "avans06/Meta-Llama-3.2-8B-Instruct-ct2-int8_float16"

# ctranslate2.Generator expects a local model directory, so fetch the
# converted files from the Hub first.
model_path = snapshot_download(model_id)
model = ctranslate2.Generator(model_path, device="auto", compute_type="int8_float16")
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt with the chat template, then convert the token ids to
# token strings, the input format CTranslate2 expects.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
)
input_tokens = tokenizer.convert_ids_to_tokens(input_ids)

results = model.generate_batch([input_tokens], include_prompt_in_result=False, max_length=256)
output = tokenizer.decode(results[0].sequences_ids[0])

print(output)
```
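
Beyond `generate_batch`, a `ctranslate2.Generator` can also stream output via `generate_tokens`. A minimal sketch, reusing `model`, `tokenizer`, and `input_tokens` from above (the sampling settings are illustrative, not the author's):

```python
# Sketch: stream the reply token by token with Generator.generate_tokens.
tokens = []
for step in model.generate_tokens(
    input_tokens,
    max_length=256,
    sampling_temperature=0.7,  # illustrative sampling settings
    sampling_topk=40,
):
    tokens.append(step.token)

# Convert the accumulated token strings back into text.
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(tokens)))
```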
## Hardware and Software