avans06 committed
Commit
bb486d4
1 Parent(s): 07d3aa6

Upload README.md

Files changed (1)
  1. README.md +38 -40
README.md CHANGED
@@ -9,6 +9,7 @@ language:
  - es
  - th
  library_name: transformers
  pipeline_tag: image-text-to-text
  tags:
  - facebook
@@ -16,6 +17,10 @@ tags:
  - pytorch
  - llama
  - llama-3
  widget:
  - example_title: rococo art
  messages:
@@ -277,6 +282,20 @@ extra_gated_button_content: Submit
  extra_gated_eu_disallowed: true
  ---

  ## Model Information

  The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text \+ images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.
@@ -321,58 +340,37 @@ The Llama 3.2 model collection also supports the ability to leverage the outputs

  ## How to use

- This repository contains two versions of Llama-3.2-11B-Vision-Instruct, for use with transformers and with the original `llama` codebase.

- ### Use with transformers

- Starting with transformers >= 4.45.0 onward, you can run inference using conversational messages that may include an image you can query about.
-
- Make sure to update your transformers installation via `pip install --upgrade transformers`.

  ```python
- import requests
- import torch
- from PIL import Image
- from transformers import MllamaForConditionalGeneration, AutoProcessor
-
- model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
-
- model = MllamaForConditionalGeneration.from_pretrained(
-     model_id,
-     torch_dtype=torch.bfloat16,
-     device_map="auto",
- )
- processor = AutoProcessor.from_pretrained(model_id)

- url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
- image = Image.open(requests.get(url, stream=True).raw)

  messages = [
-     {"role": "user", "content": [
-         {"type": "image"},
-         {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
-     ]}
  ]
- input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
- inputs = processor(
-     image,
-     input_text,
-     add_special_tokens=False,
-     return_tensors="pt"
- ).to(model.device)
-
- output = model.generate(**inputs, max_new_tokens=30)
- print(processor.decode(output[0]))
- ```

- ### Use with `llama`

- Please, follow the instructions in the [repository](https://github.com/meta-llama/llama).

- To download the original checkpoints, you can use `huggingface-cli` as follows:

- ```
- huggingface-cli download meta-llama/Llama-3.2-11B-Vision-Instruct --include "original/*" --local-dir Llama-3.2-11B-Vision-Instruct
  ```

  ## Hardware and Software

  - es
  - th
  library_name: transformers
+ base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
  pipeline_tag: image-text-to-text
  tags:
  - facebook

  - pytorch
  - llama
  - llama-3
+ - ctranslate2
+ - quantization
+ - int8
+ - float16
  widget:
  - example_title: rococo art
  messages:

  extra_gated_eu_disallowed: true
  ---

+ ## meta-llama/Llama-3.2-11B-Vision-Instruct for CTranslate2
+
+ This model is derived from the [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) multimodal model by removing the vision layers and converting it into a format supported by [CTranslate2](https://github.com/OpenNMT/CTranslate2). After conversion, it becomes a **text-only model**.
+
+ **This model is a quantized version of [avans06/Meta-Llama-3.2-8B-Instruct](https://huggingface.co/avans06/Meta-Llama-3.2-8B-Instruct), using int8_float16 quantization, and can be used with CTranslate2.**
+
+ ## Conversion details
+
+ The original model was converted in October 2024 with the following command:
+ ```
+ ct2-transformers-converter --model Path\To\Local\avans06\Meta-Llama-3.2-8B-Instruct \
+   --quantization int8_float16 --output_dir Meta-Llama-3.2-8B-Instruct-ct2-int8_float16
+ ```
+
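+ The same conversion can also be run from Python through CTranslate2's `TransformersConverter` API. The snippet below is only a minimal sketch of the equivalent call; the paths mirror the placeholders used in the command above.
+
+ ```python
+ from ctranslate2.converters import TransformersConverter
+
+ # Convert the local Transformers checkpoint to the CTranslate2 format
+ # with int8_float16 weight quantization.
+ converter = TransformersConverter("Path/To/Local/avans06/Meta-Llama-3.2-8B-Instruct")
+ converter.convert("Meta-Llama-3.2-8B-Instruct-ct2-int8_float16", quantization="int8_float16")
+ ```
+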
  ## Model Information

  The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text \+ images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.
 
  ## How to use

+ This repository is intended for use with [CTranslate2](https://github.com/OpenNMT/CTranslate2).

+ ### Use with CTranslate2

+ This example code is adapted from the [CTranslate2 Transformers guide](https://opennmt.net/CTranslate2/guides/transformers.html#mpt) and the [Transformers AutoTokenizer documentation](https://huggingface.co/docs/transformers/main_classes/tokenizer).
+ More detailed information about the `generate_batch` method can be found at [CTranslate2_Generator.generate_batch](https://opennmt.net/CTranslate2/python/ctranslate2.Generator.html#ctranslate2.Generator.generate_batch).

  ```python
+ import ctranslate2
+ import transformers
+ from huggingface_hub import snapshot_download

+ # ctranslate2.Generator needs a local copy of the converted model, so fetch the
+ # repository files first; AutoTokenizer can load directly from the repo id.
+ model_id = "avans06/Meta-Llama-3.2-8B-Instruct-ct2-int8_float16"
+ model_path = snapshot_download(model_id)

+ model = ctranslate2.Generator(model_path, device="auto", compute_type="int8_float16")
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

  messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
  ]

+ # Build the prompt with the chat template and convert it to the token strings
+ # expected by CTranslate2.
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True
+ )
+ input_tokens = tokenizer.convert_ids_to_tokens(input_ids)

+ results = model.generate_batch([input_tokens], include_prompt_in_result=False, max_length=256)
+ output = tokenizer.decode(results[0].sequences_ids[0])

+ print(output)
  ```
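+
+ By default `generate_batch` performs greedy decoding. The variant below is a small sketch of how sampling can be switched on and how generation can be stopped at Llama 3's `<|eot_id|>` end-of-turn token; the sampling values are only illustrative, and it reuses `model`, `tokenizer`, and `input_tokens` from the example above.
+
+ ```python
+ results = model.generate_batch(
+     [input_tokens],
+     include_prompt_in_result=False,
+     max_length=256,
+     sampling_temperature=0.7,  # enable random sampling instead of greedy search
+     sampling_topk=40,          # sample from the 40 most likely tokens at each step
+     end_token="<|eot_id|>",    # stop at the Llama 3 end-of-turn token
+ )
+ print(tokenizer.decode(results[0].sequences_ids[0]))
+ ```
+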
  ## Hardware and Software