---
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
base_model: mistralai/Pixtral-Large-Instruct-2411
base_model_relation: quantized
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
library_name: transformers
pipeline_tag: image-text-to-text
---

# Pixtral-Large-Instruct-2411 🧡 ExLlamaV2 2.0bpw Quant

2.0bpw quant of [Pixtral-Large-Instruct](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411).

Vision inputs work on the dev branch of [ExLlamaV2](https://github.com/turboderp/exllamav2/tree/dev).

***21 Dec 2024:** This model has been a LOT of fun to experiment and learn with. The model card below has been updated with the changes made to this repo
over the last week.*

## Architecture Differences to Pixtral 12B
Pixtral 12B has bias keys for the multi_modal_projector layers, whereas Pixtral Large does not. Rather than including them with low or zero values,
this conversion omits those bias keys, matching the keys present in the original Pixtral Large upload from Mistral. The
model's config.json file includes `"multimodal_projector_bias": false` to flag this. *n.b. If anyone in the community confirms that initializing
these keys with zero values is the better way to go, I'm happy to reupload with them included.*
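
If you want to check which variant a given checkpoint follows before loading it, the flag can be read straight from config.json. This is a minimal sketch, not part of this repo: `projector_uses_bias` is a hypothetical helper name, and it assumes the flag sits at the top level of the file and that a missing flag means bias keys are expected (as in Pixtral 12B).

```python
import json
from pathlib import Path


def projector_uses_bias(config_path):
    """Return True if config.json indicates the projector layers carry bias keys."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Assumption: the flag is a top-level key, and a missing flag means
    # bias keys are present (the Pixtral 12B layout).
    return bool(config.get("multimodal_projector_bias", True))
```

For this repo's config.json the helper should return `False`, since the file sets `"multimodal_projector_bias": false`.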

## Tokenizer
This model uses a conversion of the Mistral v7m1 tokenizer. Pixtral 12B and Pixtral Large use different tokenizers with different vocab sizes,
so make sure you use the right one.

## Prompting / Chat Template
The included chat_template.json supports all of Mistral's defined features, plus some additions of my own.

I believe this implementation gives a lot of flexibility for using the model, and it has worked well in my testing.

Example *(line breaks added for readability)*
```
<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT]
[INST] [IMG]<user message>
[AVAILABLE_TOOLS] [<tool definitions>][/AVAILABLE_TOOLS][/INST]
[IMG]<assistant response>
[TOOL_CALLS] [<tool calls>][/TOOL_CALLS]
[TOOL_RESULTS] <tool results including images>[/TOOL_RESULTS]
</s>[INST] <user message>[/INST]
```

**System Prompts**:
Messages with role "system" will be parsed as `[SYSTEM_PROMPT] <content>[/SYSTEM_PROMPT]` anywhere they appear in chat history.

This appears to work well for passing extra instructions at various depths, and it keeps instructions separate from the conversation.

**Allowing Non-Alternating Roles**:
Multiple user messages in a row can be provided, and each will be wrapped in its own `[INST]`/`[/INST]` pair. This could work well in group conversation
settings, or environments where multiple user messages can arrive before the model is invoked. Having a `[/INST]` break up each one
appeared to help the model focus on the last message rather than thinking it needs to respond to every previous one, while still retaining
knowledge of the messages that sit before it.
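
To make the system prompt and non-alternating role behaviour concrete, here is a simplified, text-only sketch of the rendering. It is not the actual chat_template.json: `render_prompt` is a hypothetical function, images and tools are left out, and the exact whitespace is an assumption copied from the example format above.

```python
def render_prompt(messages):
    """Render text-only chat messages into the prompt format shown above."""
    parts = ["<s>"]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            # System messages are rendered wherever they appear in the history.
            parts.append(f"[SYSTEM_PROMPT] {content}[/SYSTEM_PROMPT]")
        elif role == "user":
            # Every user message gets its own [INST]...[/INST] pair,
            # even when several user messages arrive in a row.
            parts.append(f"[INST] {content}[/INST]")
        elif role == "assistant":
            parts.append(f"{content}</s>")
        else:
            raise ValueError(f"unsupported role in this sketch: {role!r}")
    return "".join(parts)


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "user", "content": "Anyone there?"},
]
print(render_prompt(history))
```

Note that the two consecutive user messages stay separate in the rendered prompt instead of being merged into a single turn.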

**Image Inputs Everywhere**:
Images can now be sent in user, assistant, and tool result messages, and this seems to actually work. I ran tests such as including an image in an
assistant reply 10-15 messages back in the conversation and asking the assistant to recall what image it had previously sent, and it was able to
describe it accurately.

This flexibility could allow for interesting applications. For example, if you were to define a tool for image generation:
- the tool is invoked and calls an image generation API/model
- the image is returned inside the tool result message
- the model responds with a message that has the context of the generated image
- you can have further conversation about the generated image, or make revisions, with the model actually knowing what was created
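
The flow above could look roughly like the following message history. This is only a sketch under assumptions: the OpenAI-style message schema, the `generate_image` tool name, and the `image_generation_turn` helper are all hypothetical, and the exact shape your frontend or server expects may differ.

```python
import json


def image_generation_turn(user_request, image_url, call_id="call_0"):
    """Build a three-message history: user request, tool call, tool result with image."""
    return [
        {"role": "user", "content": [{"type": "text", "text": user_request}]},
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": call_id,
                    "type": "function",
                    "function": {
                        # Hypothetical tool name, for illustration only.
                        "name": "generate_image",
                        "arguments": json.dumps({"prompt": user_request}),
                    },
                }
            ],
        },
        {
            "role": "tool",
            "tool_call_id": call_id,
            # The generated image travels inside the tool result, so the
            # model can see what was created when it writes its reply.
            "content": [
                {"type": "text", "text": "Image generated."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]
```

The next assistant message is then generated with the image in context, and later user turns can ask for revisions to it.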

## Usage
Working in TabbyAPI with the dev branch of ExLlamaV2.
<img src="https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411/resolve/main/image-input-example.jpg">
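
For sending an image through an OpenAI-compatible chat completions endpoint such as the one TabbyAPI exposes, a request body could be built along these lines. This is a sketch only: `build_vision_request` is a hypothetical helper, the inline base64 data URL is one common way to pass images, and the host, port, and authentication all depend on your own setup.

```python
import base64


def build_vision_request(image_bytes, question, mime_type="image/jpeg", max_tokens=256):
    """Build an OpenAI-style chat completion payload with one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    # The image is embedded as a base64 data URL.
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
```

The resulting dictionary would be sent as JSON in a POST to the server's `/v1/chat/completions` route, with whatever authentication your instance requires.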

## Available Sizes
| Repo | Bits | Head Bits | Size |
| ----------- | ------ | ------ | ------ |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.0bpw) | 2.0 | 6.0 | 35.18 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.5bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-2.5bpw) | 2.5 | 6.0 | 39.34 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.0bpw) | 3.0 | 6.0 | 46.42 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.5bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-3.5bpw) | 3.5 | 6.0 | 53.50 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.0bpw) | 4.0 | 6.0 | 60.61 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.5bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-4.5bpw) | 4.5 | 6.0 | 67.68 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-5.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-5.0bpw) | 5.0 | 6.0 | 74.76 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-6.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-6.0bpw) | 6.0 | 8.0 | 88.81 GB |
| [nintwentydo/Pixtral-Large-Instruct-2411-exl2-8.0bpw](https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411-exl2-8.0bpw) | 8.0 | 8.0 | 97.51 GB |
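
As a rough aid for choosing among the sizes above, the sketch below picks the largest quant whose listed size fits a memory budget. The helper name and the default headroom are placeholders of mine, and it assumes the listed sizes are weight sizes only: the cache, the vision tower's activations, and other runtime overhead need extra memory that this does not model.

```python
# (bits per weight, size in GB), copied from the table above
QUANT_SIZES_GB = [
    (2.0, 35.18), (2.5, 39.34), (3.0, 46.42),
    (3.5, 53.50), (4.0, 60.61), (4.5, 67.68),
    (5.0, 74.76), (6.0, 88.81), (8.0, 97.51),
]


def largest_quant_that_fits(total_vram_gb, headroom_gb=4.0):
    """Return the highest bpw whose listed size fits, or None if none does.

    headroom_gb is an arbitrary placeholder for cache and runtime overhead.
    """
    budget = total_vram_gb - headroom_gb
    fitting = [bits for bits, size_gb in QUANT_SIZES_GB if size_gb <= budget]
    return max(fitting) if fitting else None
```

Treat the result as a starting point and adjust the headroom to match your context length and cache settings.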