
Inference with transformers

Please install the in-progress development wheel of transformers from https://huggingface.co/nltpt/transformers/tree/main.

This is an example inference snippet (API subject to change):

import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

# device_map="auto" (requires accelerate) spreads the weights over the available devices;
# bfloat16 halves memory use compared with float32
model_id = "nltpt/Llama-3.2-11B-Vision"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

# The prompt places the <|image|> token before <|begin_of_text|>, followed by the text to complete
prompt = "<|image|><|begin_of_text|>If I had to write a haiku for this one"
url = "https://llava-vl.github.io/static/images/view.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)

# Greedy decoding (do_sample=False), capped at 25 new tokens
inputs = processor(text=prompt, images=raw_image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=False, max_new_tokens=25)
print(processor.decode(output[0], skip_special_tokens=True))

Output:

If I had to write a haiku for this one, it would be:.
A dock on a lake.
A mountain in the distance.
A long exposure.
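The decoded output begins with the prompt text because generate returns the prompt tokens followed by the newly generated ones. To print only the continuation, you can drop the first len(prompt) tokens. The sketch below shows that slicing on plain lists of token IDs so it runs without downloading the model; the helper name new_tokens_only and the token IDs are made up for illustration.

```python
from typing import List, Sequence


def new_tokens_only(output_ids: Sequence[int], prompt_ids: Sequence[int]) -> List[int]:
    """Drop the echoed prompt from a generated sequence of token IDs (hypothetical helper)."""
    # generate() returns the prompt first, so the new tokens start right after it.
    if list(output_ids[: len(prompt_ids)]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt tokens")
    return list(output_ids[len(prompt_ids):])


# Made-up token IDs: a 3-token prompt followed by 3 generated tokens.
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9]
print(new_tokens_only(output_ids, prompt_ids))  # [7, 8, 9]
```

With the tensors from the snippet above, the equivalent slice is output[0][inputs["input_ids"].shape[-1]:], which you would pass to processor.decode(..., skip_special_tokens=True).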

Running the original checkpoints

The installed llama-models package provides three binaries:

example_chat_completion
example_text_completion
multimodal_example_chat_completion

You can invoke them via torchrun as follows:

CHECKPOINT_DIR=~/.llama/checkpoints/Llama-3.2-11B-Vision/

torchrun `which multimodal_example_chat_completion` "$CHECKPOINT_DIR"
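If you prefer to launch the binaries from Python rather than the shell, the sketch below builds the same argument list: it resolves the binary on PATH (what `which` does) and expands "~" in the checkpoint directory (what the shell does for CHECKPOINT_DIR). The function name torchrun_command is hypothetical, and the sketch only prints the command; it passes no extra torchrun flags beyond what the shell example uses.

```python
import os
import shlex
import shutil
from typing import List


def torchrun_command(binary: str, checkpoint_dir: str) -> List[str]:
    """Build the argv for `torchrun <path to binary> <checkpoint dir>` (hypothetical helper)."""
    # Equivalent of the shell's `which <binary>`: resolve the entry point on PATH.
    script_path = shutil.which(binary)
    if script_path is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH; is llama-models installed?")
    # The shell expands "~" in CHECKPOINT_DIR; expanduser does the same here.
    return ["torchrun", script_path, os.path.expanduser(checkpoint_dir)]


if __name__ == "__main__":
    try:
        cmd = torchrun_command(
            "multimodal_example_chat_completion",
            "~/.llama/checkpoints/Llama-3.2-11B-Vision/",
        )
    except FileNotFoundError as err:
        print(err)
    else:
        # Print a shell-quoted version of the command that would be run.
        print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would start the run; keeping it as a list avoids shell quoting issues with paths that contain spaces.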

To study the source code of these scripts, locate the installed package:

PACKAGE_DIR=$(pip show -f llama-models | grep Location | awk '{ print $2 }')

echo "Scripts are in the directory: $PACKAGE_DIR/llama_models/scripts/"
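The same directory can be found from Python without parsing pip output. This is a minimal sketch that assumes the package's import name is llama_models (pip distribution names use hyphens, but Python import names cannot) and that the scripts live in a scripts subdirectory, as the shell example implies; the function name find_scripts_dir is hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_scripts_dir(module_name: str = "llama_models") -> Optional[Path]:
    """Return <package dir>/scripts for an installed package, or None (hypothetical helper)."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module rather than a package.
        return None
    package_dir = Path(next(iter(spec.submodule_search_locations)))
    return package_dir / "scripts"


if __name__ == "__main__":
    scripts = find_scripts_dir()
    if scripts is None:
        print("llama_models is not installed")
    else:
        print(f"Scripts are in the directory: {scripts}")
```

Unlike the grep/awk pipeline, this also works for editable installs, where the package directory is not under the reported site-packages location.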