avans06 committed
Commit
bb486d4
1 Parent(s): 07d3aa6

Upload README.md

Files changed (1)
  1. README.md +38 -40
README.md CHANGED
@@ -9,6 +9,7 @@ language:
  - es
  - th
  library_name: transformers
  pipeline_tag: image-text-to-text
  tags:
  - facebook
@@ -16,6 +17,10 @@ tags:
  - pytorch
  - llama
  - llama-3
  widget:
  - example_title: rococo art
  messages:
@@ -277,6 +282,20 @@ extra_gated_button_content: Submit
  extra_gated_eu_disallowed: true
  ---

  ## Model Information

  The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text \+ images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.
@@ -321,58 +340,37 @@ The Llama 3.2 model collection also supports the ability to leverage the outputs

  ## How to use

- This repository contains two versions of Llama-3.2-11B-Vision-Instruct, for use with transformers and with the original `llama` codebase.

- ### Use with transformers

- Starting with transformers >= 4.45.0 onward, you can run inference using conversational messages that may include an image you can query about.
-
- Make sure to update your transformers installation via `pip install --upgrade transformers`.

  ```python
- import requests
- import torch
- from PIL import Image
- from transformers import MllamaForConditionalGeneration, AutoProcessor
-
- model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
-
- model = MllamaForConditionalGeneration.from_pretrained(
-     model_id,
-     torch_dtype=torch.bfloat16,
-     device_map="auto",
- )
- processor = AutoProcessor.from_pretrained(model_id)

- url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
- image = Image.open(requests.get(url, stream=True).raw)

  messages = [
-     {"role": "user", "content": [
-         {"type": "image"},
-         {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
-     ]}
  ]
- input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
- inputs = processor(
-     image,
-     input_text,
-     add_special_tokens=False,
-     return_tensors="pt"
- ).to(model.device)
-
- output = model.generate(**inputs, max_new_tokens=30)
- print(processor.decode(output[0]))
- ```

- ### Use with `llama`

- Please, follow the instructions in the [repository](https://github.com/meta-llama/llama).

- To download the original checkpoints, you can use `huggingface-cli` as follows:

- ```
- huggingface-cli download meta-llama/Llama-3.2-11B-Vision-Instruct --include "original/*" --local-dir Llama-3.2-11B-Vision-Instruct
  ```

  ## Hardware and Software

  - es
  - th
  library_name: transformers
+ base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
  pipeline_tag: image-text-to-text
  tags:
  - facebook

  - pytorch
  - llama
  - llama-3
+ - ctranslate2
+ - quantization
+ - int8
+ - float16
  widget:
  - example_title: rococo art
  messages:

  extra_gated_eu_disallowed: true
  ---

+ ## meta-llama/Llama-3.2-11B-Vision-Instruct for CTranslate2
+
+ This model is derived from the [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) multimodal model by removing the vision layers and converting it into a format supported by [CTranslate2](https://github.com/OpenNMT/CTranslate2). After conversion, it becomes a **text-only model**.
+
+ **This model is a quantized version of [avans06/Meta-Llama-3.2-8B-Instruct](https://huggingface.co/avans06/Meta-Llama-3.2-8B-Instruct), using int8_float16 quantization, and can be used with CTranslate2.**
+
+ ## Conversion details
+
+ The original model was converted in October 2024 with the following command:
+ ```
+ ct2-transformers-converter --model Path\To\Local\avans06\Meta-Llama-3.2-8B-Instruct \
+   --quantization int8_float16 --output_dir Meta-Llama-3.2-8B-Instruct-ct2-int8_float16
+ ```
+
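+ The same conversion can also be run from Python through CTranslate2's `TransformersConverter` API. The snippet below is only a minimal sketch of the equivalent call; the paths mirror the placeholders used in the command above.
+
+ ```python
+ from ctranslate2.converters import TransformersConverter
+
+ # Convert the local Transformers checkpoint to the CTranslate2 format
+ # with int8_float16 weight quantization.
+ converter = TransformersConverter("Path/To/Local/avans06/Meta-Llama-3.2-8B-Instruct")
+ converter.convert("Meta-Llama-3.2-8B-Instruct-ct2-int8_float16", quantization="int8_float16")
+ ```
+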
  ## Model Information

  The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text \+ images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.
 
  ## How to use

+ This repository is intended for use with [CTranslate2](https://github.com/OpenNMT/CTranslate2).

+ ### Use with CTranslate2

+ This example code is adapted from the [CTranslate2 Transformers guide](https://opennmt.net/CTranslate2/guides/transformers.html#mpt) and the [Transformers AutoTokenizer documentation](https://huggingface.co/docs/transformers/main_classes/tokenizer).
+ More detailed information about the `generate_batch` method can be found at [CTranslate2_Generator.generate_batch](https://opennmt.net/CTranslate2/python/ctranslate2.Generator.html#ctranslate2.Generator.generate_batch).

  ```python
+ import ctranslate2
+ import transformers
+ from huggingface_hub import snapshot_download

+ # ctranslate2.Generator needs a local copy of the converted model, so fetch the
+ # repository files first; AutoTokenizer can load directly from the repo id.
+ model_id = "avans06/Meta-Llama-3.2-8B-Instruct-ct2-int8_float16"
+ model_path = snapshot_download(model_id)

+ model = ctranslate2.Generator(model_path, device="auto", compute_type="int8_float16")
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

  messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
  ]

+ # Build the prompt with the chat template and convert it to the token strings
+ # expected by CTranslate2.
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True
+ )
+ input_tokens = tokenizer.convert_ids_to_tokens(input_ids)

+ results = model.generate_batch([input_tokens], include_prompt_in_result=False, max_length=256)
+ output = tokenizer.decode(results[0].sequences_ids[0])

+ print(output)
  ```
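+
+ By default `generate_batch` performs greedy decoding. The variant below is a small sketch of how sampling can be switched on and how generation can be stopped at Llama 3's `<|eot_id|>` end-of-turn token; the sampling values are only illustrative, and it reuses `model`, `tokenizer`, and `input_tokens` from the example above.
+
+ ```python
+ results = model.generate_batch(
+     [input_tokens],
+     include_prompt_in_result=False,
+     max_length=256,
+     sampling_temperature=0.7,  # enable random sampling instead of greedy search
+     sampling_topk=40,          # sample from the 40 most likely tokens at each step
+     end_token="<|eot_id|>",    # stop at the Llama 3 end-of-turn token
+ )
+ print(tokenizer.decode(results[0].sequences_ids[0]))
+ ```
+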
  ## Hardware and Software