LeroyDyer committed
Commit 03f84f5
1 Parent(s): f929800

Update README.md

Files changed (1): README.md (+178 -0)

- **License:** apache-2.0
- **Finetuned from model:** LeroyDyer/Mixtral_AI_Vision-Instruct_X

# Vision/multimodal capabilities:

If you want to use the vision functionality:

* You must use the latest version of [Koboldcpp](https://github.com/LostRuins/koboldcpp).

To use the multimodal capabilities of this model and use **vision**, you need to load the specified **mmproj** file, which can be found inside this model repo ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X)).

* You can load the **mmproj** file in the corresponding section of the interface:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/UX6Ubss2EPNAT3SKGMLe0.png)

## Choosing an mmproj file:

* For 4-bit loading, use the 4-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0

* For 8-bit loading, use the 8-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0

* For full-precision (f16) loading, use the f16 mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-f16

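As a rough sketch, loading the model together with its mmproj file from the Koboldcpp command line might look like the following (the main GGUF file name here is a hypothetical placeholder; `--model` and `--mmproj` are standard Koboldcpp options):

```
python koboldcpp.py --model Mixtral_AI_Vision-Instruct_X.Q4_0.gguf --mmproj mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0.gguf
```
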
## Extended capabilities:

```
* mistralai/Mistral-7B-Instruct-v0.1 - prime base
* ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
* ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision
* rvv-karma/BASH-Coder-Mistral-7B - coding
* Locutusque/Hercules-3.1-Mistral-7B - unhinging
* KoboldAI/Mistral-7B-Erebus-v3 - NSFW
* Locutusque/Hyperion-2.1-Mistral-7B - chat
* Severian/Nexus-IKM-Mistral-7B-Pytorch - thinking
* NousResearch/Hermes-2-Pro-Mistral-7B - generalizing
* mistralai/Mistral-7B-Instruct-v0.2 - base
* Nitral-AI/ProdigyXBioMistral_7B - medical
* Nitral-AI/Infinite-Mika-7b - 128k context expansion enforcement
* Nous-Yarn-Mistral-7b-128k - 128k context expansion
* yanismiraoui/Yarn-Mistral-7b-128k-sharded
* ChaoticNeutrals/Eris_Prime-V2-7B - roleplay
```

# Image-text-to-text

## Using transformers

```python
import torch
import requests
from PIL import Image
from IPython.display import display
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

# Load the model in 4-bit to reduce VRAM usage
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)

image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
display(image1)  # shows the images when run in a notebook
display(image2)

# One prompt per image; <image> marks where the image features are inserted
prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
    "USER: <image>\nPlease describe this image\nASSISTANT:",
]

inputs = processor(prompts, images=[image1, image2], padding=True, return_tensors="pt").to("cuda")
for k, v in inputs.items():
    print(k, v.shape)

# Generate answers for both prompts in one batch and decode them
output = model.generate(**inputs, max_new_tokens=200)
for text in processor.batch_decode(output, skip_special_tokens=True):
    print(text)
```

## Using pipeline

```python
import requests
from PIL import Image
from transformers import pipeline

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
pipe = pipeline("image-to-text", model=model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)

question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```
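The pipeline returns a list of dictionaries. As a minimal sketch, assuming the standard `generated_text` key and the `###Assistant:` delimiter used in the prompt above, the answer alone can be pulled out like this:

```python
# outputs has the form [{"generated_text": "..."}]
answer = outputs[0]["generated_text"].split("###Assistant:")[-1].strip()
print(answer)
```
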
## Mistral chat templating

**Instruction format:** in order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence (BOS) token id; the following instructions should not. The assistant generation is ended by the end-of-sentence (EOS) token id.

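As a minimal illustration of this format, the prompt for a short exchange could be assembled by hand roughly as follows (a sketch only; the exact whitespace and special-token handling are defined by the model's chat template, which the tokenizer applies for you in the next snippet):

```python
# Hand-built Mistral instruct prompt (illustrative; the real template may differ in spacing)
bos, eos = "<s>", "</s>"
prompt = (
    bos
    + "[INST] Hello, how are you? [/INST]"
    + "I'm doing great. How can I help you today?"
    + eos
    + "[INST] I'd like to show off how chat templating works! [/INST]"
)
print(prompt)
```
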
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# Render the conversation into the instruct format without tokenizing
print(tokenizer.apply_chat_template(chat, tokenize=False))
```

# Text-to-text

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# Apply the chat template and tokenize in one step
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)