---
language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
  - es
  - th
library_name: transformers
pipeline_tag: image-text-to-text
tags:
  - facebook
  - meta
  - pytorch
  - llama
  - llama-3
---

This repository is a pre-release checkpoint for Llama 3.2 11B Vision.

It contains two versions of the model: one for use with `transformers`, and one for use with the original `llama3` codebase (under the `original` directory).
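If you only need one of the two formats, you can restrict the download. Below is a minimal sketch using `huggingface_hub`'s `snapshot_download`; the `allow_patterns` filter shown here is an illustration, not part of this repository's documentation:

```python
# Minimal sketch: fetch only the original/ checkpoint files for the llama3 codebase.
# Assumes you have access to the repository and are logged in (huggingface-cli login).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="nltpt/Llama-3.2-11B-Vision",
    allow_patterns="original/*",   # drop this argument to also get the transformers weights
    local_dir="Llama-3.2-11B-Vision",
)
```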

## Inference with transformers

Please install the in-progress development wheel from https://huggingface.co/nltpt/transformers/tree/main.
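For example (the wheel filename below is a placeholder; substitute the actual file listed in that repo):

```bash
# Hypothetical example: replace <wheel-file> with the real filename from the repo listing.
pip install "https://huggingface.co/nltpt/transformers/resolve/main/<wheel-file>.whl"
```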

This is an example inference snippet (API subject to change):

```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "nltpt/Llama-3.2-11B-Vision"

# Load the model in bfloat16 and let accelerate place it on the available device(s)
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

# The <|image|> token marks where the image is inserted in the prompt
prompt = "<|image|><|begin_of_text|>If I had to write a haiku for this one"
url = "https://llava-vl.github.io/static/images/view.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=raw_image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=False, max_new_tokens=25)
print(processor.decode(output[0], skip_special_tokens=True))
```

Output:

```
If I had to write a haiku for this one, it would be:.\nA dock on a lake.\nA mountain in the distance.\nA long exposure.\
```
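If you prefer to see tokens as they are produced, the standard `transformers` `TextStreamer` should also work here (assuming the pre-release wheel keeps the stock `generate` interface). Continuing from the snippet above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated, skipping the prompt text
streamer = TextStreamer(processor.tokenizer, skip_prompt=True)
output = model.generate(**inputs, do_sample=False, max_new_tokens=25, streamer=streamer)
```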

## Running the original checkpoints

Installing the `llama-models` package will provide three binaries:

1. `example_chat_completion`
2. `example_text_completion`
3. `multimodal_example_chat_completion`

You can invoke them via torchrun as follows:

```bash
CHECKPOINT_DIR=~/.llama/checkpoints/Llama-3.2-11B-Vision/

torchrun `which multimodal_example_chat_completion` "$CHECKPOINT_DIR"
```
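torchrun launches a single process by default. If you want to be explicit about the process count, you can pass the standard `--nproc_per_node` flag (that a single model-parallel rank suffices for this checkpoint is an assumption on my part, not documented here):

```bash
# Assumption: the 11B Vision checkpoint runs with a single model-parallel rank.
torchrun --nproc_per_node 1 `which multimodal_example_chat_completion` "$CHECKPOINT_DIR"
```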

You can find and study the code for these scripts by doing something like:

```bash
PACKAGE_DIR=$(pip show -f llama-models | grep Location | awk '{ print $2 }')

echo "Scripts are in the directory: $PACKAGE_DIR/llama_models/scripts/"
```