Model Card for Pixtral-Large-Instruct-2411

Pixtral-Large-Instruct-2411 is a 124B multimodal model built on top of Mistral Large 2, i.e., Mistral-Large-Instruct-2407. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. In particular, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.

For more details about this model, please refer to the Pixtral Large blog post and the Pixtral 12B blog post.

Key features

Frontier-class multimodal performance
State-of-the-art on MathVista, DocVQA, VQAv2
Extends Mistral Large 2 without compromising text performance
123B multimodal decoder, 1B parameter vision encoder
128K context window: fits a minimum of 30 high-resolution images

System Prompt Handling

We appreciate the feedback received from our community regarding our system prompt handling.
In response, we have implemented stronger support for system prompts.
To achieve optimal results, we recommend always including a system prompt that clearly outlines the bot's purpose, even if it is minimal.
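The usage examples below download SYSTEM_PROMPT.txt from the model repository and fill its {name}, {today} and {yesterday} placeholders with str.format. As a minimal sketch, the helper below (fill_system_prompt, a hypothetical name) isolates that substitution step so it can be tried offline; the template string is an illustrative stand-in, not the contents of the real file.

```python
from datetime import date, timedelta


def fill_system_prompt(template: str, model_name: str, today: date) -> str:
    """Substitute the {name}, {today} and {yesterday} placeholders.

    Mirrors the str.format call in load_system_prompt, but takes the template
    text and the current date as arguments so it can be tested offline.
    """
    yesterday = today - timedelta(days=1)
    return template.format(
        name=model_name,
        today=today.strftime("%Y-%m-%d"),
        yesterday=yesterday.strftime("%Y-%m-%d"),
    )


# Hypothetical template text; the real SYSTEM_PROMPT.txt is longer.
template = "You are {name}. Today is {today}; yesterday was {yesterday}."
print(fill_system_prompt(template, "Pixtral-Large-Instruct-2411", date(2024, 11, 18)))
# You are Pixtral-Large-Instruct-2411. Today is 2024-11-18; yesterday was 2024-11-17.
```

Because str.format is used, literal braces in a template have to be escaped as {{ and }}, and an unknown placeholder raises KeyError.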

Basic Instruct Template (V7)

<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]

Be careful with subtle missing or trailing whitespace!

Please make sure to use mistral-common as the source of truth.
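To make the whitespace rules concrete, here is a minimal sketch that renders a text-only conversation into a string following the V7 template exactly as written above. The helper render_v7 is hypothetical and meant only for inspecting prompts; actual requests should go through mistral-common, which produces token IDs (including special and image tokens) rather than a plain string.

```python
from __future__ import annotations


def render_v7(system_prompt: str, turns: list[tuple[str, str | None]]) -> str:
    """Render a conversation following the Basic Instruct Template (V7) text.

    turns is a list of (user_message, assistant_response) pairs; the last pair
    may use assistant_response=None to leave the prompt open for the model.
    Hypothetical helper for illustration, not part of mistral-common.
    """
    out = "<s>"
    # One space after [SYSTEM_PROMPT], none before [/SYSTEM_PROMPT].
    out += f"[SYSTEM_PROMPT] {system_prompt}[/SYSTEM_PROMPT]"
    for user, assistant in turns:
        out += f"[INST] {user}[/INST]"
        if assistant is not None:
            # One space before the response, none before </s>.
            out += f" {assistant}</s>"
    return out


print(render_v7("You are a helpful assistant.", [("Hi", "Hello!"), ("What is 2+2?", None)]))
# <s>[SYSTEM_PROMPT] You are a helpful assistant.[/SYSTEM_PROMPT][INST] Hi[/INST] Hello!</s>[INST] What is 2+2?[/INST]
```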

Metrics

Model	MathVista (CoT)	MMMU (CoT)	ChartQA (CoT)	DocVQA (ANLS)	VQAv2 (VQA Match)	AI2D (BBox)	MM MT-Bench
Pixtral Large (124B)	69.4	64.0	88.1	93.3	80.9	93.8	7.4
Gemini-1.5 Pro (measured)	67.8	66.3	83.8	92.3	70.6	94.6	6.8
GPT-4o (measured)	65.4	68.6	85.2	88.5	76.4	93.2	6.7
Claude-3.5 Sonnet (measured)	67.1	68.4	89.1	88.6	69.5	76.9	7.3
Llama-3.2 90B (measured)	49.1	53.7	70.8	85.7	67.0	-	5.5

Specific model versions evaluated: Claude-3.5 Sonnet (new) [Oct 24], Gemini-1.5 Pro (002) [Sep 24], GPT-4o (2024-08-06) [Aug 24].

See mistral-evals for open-source MM MT-Bench evaluation scripts.

Usage

The model can be used with the following frameworks:

vLLM: see the vLLM section below

vLLM

We recommend using the vLLM library to implement production-ready inference pipelines with Pixtral-Large-Instruct-2411.

Installation

Make sure you install vLLM >= v0.6.4.post1:

pip install --upgrade vllm

Also make sure you have mistral_common >= 1.5.0 installed:

pip install --upgrade mistral_common
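If you are unsure which versions are installed, the sketch below compares them against the minimums stated above using only the standard library. It uses a deliberately simplified comparison (numeric release segments plus an optional .postN) rather than full PEP 440 ordering; the helper names are hypothetical, and the packaging library's Version class is the robust alternative.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Leading "v" is optional; pre-release (rc, dev) and local (+cu124) parts are ignored.
_VERSION_RE = re.compile(r"^v?(\d+(?:\.\d+)*)(?:\.post(\d+))?")


def _key(v: str) -> tuple[tuple[int, ...], int]:
    """Simplified sort key: release segments (trailing zeros dropped) and post number."""
    m = _VERSION_RE.match(v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    release = [int(p) for p in m.group(1).split(".")]
    while release and release[-1] == 0:  # so that 1.5 compares equal to 1.5.0
        release.pop()
    return tuple(release), int(m.group(2) or 0)


def at_least(installed: str, minimum: str) -> bool:
    return _key(installed) >= _key(minimum)


def check(package: str, minimum: str) -> str:
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package}: not installed (need >= {minimum})"
    status = "OK" if at_least(installed, minimum) else "too old"
    return f"{package} {installed}: {status} (need >= {minimum})"


for pkg, minimum in [("vllm", "0.6.4.post1"), ("mistral_common", "1.5.0")]:
    print(check(pkg, minimum))
```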

You can also use a ready-to-go Docker image, for example from Docker Hub.

Server (Image)

We recommend using Pixtral-Large-Instruct-2411 in a server/client setting.

Spin up a server:

vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
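Loading a 124B model across 8 GPUs can take a while, so a client may want to wait until the server is up before sending requests. The sketch below polls the /health endpoint exposed by vLLM's OpenAI-compatible server; the helper name wait_for_server and its timeout defaults are illustrative assumptions.

```python
import time
import urllib.request


def wait_for_server(base_url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll <base_url>/health until it answers HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, connection refused and socket timeouts are all OSError
            # subclasses: the server is not ready yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, call wait_for_server("http://<your-server-url>:8000") once before sending the requests below.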

Then query the server from a client:

import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Pixtral-Large-Instruct-2411"


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Which of the depicted countries has the best food? Which are the second, third and fourth? Name the country, its color on the map and one of its cities that is visible on the map, but is not the capital. Make absolutely sure to only name a city that can be seen on the map.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

data = {"model": model, "messages": messages}

response = requests.post(url, headers=headers, data=json.dumps(data))
response.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below
print(response.json()["choices"][0]["message"]["content"])
# Determining which country has the "best" food can be subjective and depends on personal preferences. However, based on popular culinary reputations, here are some countries known for their cuisine:

#1. **Italy** (Brown) - Known for its pasta, pizza, and diverse regional dishes.
#   - City: Milan

#2. **France** (Dark Brown) - Renowned for its fine dining, pastries, and wine.
#   - City: Lyon

#3. **Spain** (Yellow) - Famous for tapas, paella, and a variety of seafood dishes.
#   - City: Barcelona

#4. **Greece** (Yellow) - Known for its Mediterranean cuisine, including moussaka, souvlaki, and fresh seafood.
#   - City: Thessaloniki

#These rankings are based on general culinary reputations and can vary widely depending on individual tastes.
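The examples use a publicly reachable image URL. For a local file, a common approach with OpenAI-compatible servers such as vLLM is to inline the image as a base64 data URL. The helper image_to_data_url below is a hypothetical sketch of that, with the MIME type inferred from the file extension.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL for an image_url content part."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Drop-in replacement for the remote URL in the messages list above
# ("my_map.png" is a hypothetical local file):
# {"type": "image_url", "image_url": {"url": image_to_data_url("my_map.png")}}
```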

Server (Text-only)

You can also query the server with a text-only request. The following example shows how the system prompt can be used to make sure the model always knows the current date.

import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Pixtral-Large-Instruct-2411"


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Without browsing the web, how many days ago was Mistral founded?"
    },
]

data = {"model": model, "messages": messages}

response = requests.post(url, headers=headers, data=json.dumps(data))
response.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below
print(response.json()["choices"][0]["message"]["content"])
# Mistral AI was founded in April 2023. Since the current date is November 18, 2024, we can calculate the number of days between April 2023 and November 18, 2024.

#First, calculate the days from April 2023 to the end of 2023:
#- April: 27 days (30 - 3)
#- May: 31 days
#- June: 30 days
#- July: 31 days
#- August: 31 days
#- September: 30 days
#- October: 31 days
#- November: 30 days
#- December: 31 days

#Total days from April 2023 to December 31, 2023: 27 + 31 + 30 + 31 + 31 + 30 + 31 + 30 + 31 = 272 days

#Next, calculate the days from January 1, 2024, to November 18, 2024:
#- January: 31 days
#- February: 29 days (2024 is a leap year)
#- March: 31 days
#- April: 30 days
#- May: 31 days
#- June: 30 days
#- July: 31 days
#- August: 31 days
#- September: 30 days
#- October: 31 days
#- November: 18 days

#Total days from January 1, 2024, to November 18, 2024: 31 + 29 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 18 = 323 days

#Adding the two periods together:
#272 days (from April 2023 to December 2023) + 323 days (from January 2024 to November 18, 2024) = 595 days

#Therefore, Mistral AI was founded 595 days ago from November 18, 2024.
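Long multi-step arithmetic in model output is worth double-checking. The snippet below recomputes the answer with Python's datetime. The start date of April 3, 2023 is an assumption read off the model's own "27 days (30 - 3)" line, so this confirms the arithmetic is internally consistent, not that the founding date is correct.

```python
from datetime import date

# Start date implied by the model's "April: 27 days (30 - 3)"; not a verified founding date.
start = date(2023, 4, 3)
end = date(2024, 11, 18)  # the "current date" the model took from the system prompt
print((end - start).days)  # 595
```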

Offline Example

from vllm import LLM
from vllm.sampling_params import SamplingParams
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

model_name = "mistralai/Pixtral-Large-Instruct-2411"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, 'r') as file:
        system_prompt = file.read()
    today = datetime.today().strftime('%Y-%m-%d')
    yesterday = (datetime.today() - timedelta(days=1)).strftime('%Y-%m-%d')
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model_name, "SYSTEM_PROMPT.txt")

image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Which of the depicted countries has the best food? Which are the second, third and fourth? Name the country, its color on the map and one of its cities that is visible on the map, but is not the capital. Make absolutely sure to only name a city that can be seen on the map.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

sampling_params = SamplingParams(max_tokens=512)

# note that running this model on GPU requires over 300 GB of GPU RAM
llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8, limit_mm_per_prompt={"image": 4})

outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
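The comment above says the model needs over 300 GB of GPU memory. As a rough sketch of where that figure comes from, assuming 16-bit (bf16/fp16) weights: the weights alone take about 248 GB, and the KV cache and activations come on top, which is why the examples shard the model with tensor_parallel_size=8. The per-GPU figure assumes an even split, which tensor parallelism only approximates.

```python
# Back-of-the-envelope weight memory for Pixtral-Large-Instruct-2411.
# Assumption: 2 bytes per parameter (bf16/fp16). KV cache and activations are NOT included.
NUM_PARAMS = 124_000_000_000  # 123B decoder + 1B vision encoder
BYTES_PER_PARAM = 2           # bf16 / fp16
TENSOR_PARALLEL_SIZE = 8      # matches tensor_parallel_size=8 above

total_gb = NUM_PARAMS * BYTES_PER_PARAM / 1e9
per_gpu_gb = total_gb / TENSOR_PARALLEL_SIZE  # assumes an even split across GPUs

print(f"weights total: {total_gb:.0f} GB, per GPU: {per_gpu_gb:.0f} GB")
# weights total: 248 GB, per GPU: 31 GB
```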

The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall