LeroyDyer's picture
Update README.md
03f84f5 verified
---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- sft
base_model: LeroyDyer/Mixtral_AI_Vision-Instruct_X
---
# Uploaded model
- **Developed by:** LeroyDyer
- **License:** apache-2.0
- **Finetuned from model :** LeroyDyer/Mixtral_AI_Vision-Instruct_X
# Vision/multimodal capabilities:
If you want to use vision functionality:
* You must use the latest versions of [Koboldcpp](https://github.com/LostRuins/koboldcpp).
To use the multimodal capabilities of this model and use **vision** you need to load the specified **mmproj** file, this can be found inside this model repo. ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X))
* You can load the **mmproj** by using the corresponding section in the interface:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/UX6Ubss2EPNAT3SKGMLe0.png)
## Vision/multimodal capabilities:
* For loading 4-bit use 4-bit mmproj file.- mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0
* For loading 8-bit use 8 bit mmproj file - mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0
* For loading 8-bit use 8 bit mmproj file - mmproj-Mixtral_AI_Vision-Instruct_X-f16
## Extended capabilities:
```
* mistralai/Mistral-7B-Instruct-v0.1 - Prime-Base
* ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
* ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision
* rvv-karma/BASH-Coder-Mistral-7B - coding
* Locutusque/Hercules-3.1-Mistral-7B - Unhinging
* KoboldAI/Mistral-7B-Erebus-v3 - NSFW
* Locutusque/Hyperion-2.1-Mistral-7B - CHAT
* Severian/Nexus-IKM-Mistral-7B-Pytorch - Thinking
* NousResearch/Hermes-2-Pro-Mistral-7B - Generalizing
* mistralai/Mistral-7B-Instruct-v0.2 - BASE
* Nitral-AI/ProdigyXBioMistral_7B - medical
* Nitral-AI/Infinite-Mika-7b - 128k - Context Expansion enforcement
* Nous-Yarn-Mistral-7b-128k - 128k - Context Expansion
* yanismiraoui/Yarn-Mistral-7b-128k-sharded
* ChaoticNeutrals/Eris_Prime-V2-7B - Roleplay
```
# "image-text-text"
## using transformers
``` python
from transformers import AutoProcessor, LlavaForConditionalGeneration
from transformers import BitsAndBytesConfig
import torch
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16
)
model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, quantization_config=quantization_config, device_map="auto")
import requests
from PIL import Image
image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
display(image1)
display(image2)
prompts = [
"USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
"USER: <image>\nPlease describe this image\nASSISTANT:",
]
inputs = processor(prompts, images=[image1, image2], padding=True, return_tensors="pt").to("cuda")
for k,v in inputs.items():
print(k,v.shape)
```
## Using pipeline
``` python
from transformers import pipeline
from PIL import Image
import requests
model_id = LeroyDyer/Mixtral_AI_Vision-Instruct_X
pipe = pipeline("image-to-text", model=model_id)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
prompt = f"A chat between a curious human and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: <image>\n{question}###Assistant:"
outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```
## Mistral ChatTemplating
Instruction format
In order to leverage instruction fine-tuning,
your prompt should be surrounded by [INST] and [/INST] tokens.
The very first instruction should begin with a begin of sentence id. The next instructions should not.
The assistant generation will be ended by the end-of-sentence token id.
```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
chat = [
{"role": "user", "content": "Hello, how are you?"},
{"role": "assistant", "content": "I'm doing great. How can I help you today?"},
{"role": "user", "content": "I'd like to show off how chat templating works!"},
]
tokenizer.apply_chat_template(chat, tokenize=False)
```
# TextToText
``` python
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
messages = [
{"role": "user", "content": "What is your favourite condiment?"},
{"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
{"role": "user", "content": "Do you have mayonnaise recipes?"}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)