LeroyDyer committed
Commit 03f84f5
1 Parent(s): f929800

Update README.md

Files changed (1): README.md (+178 -0)

- **License:** apache-2.0
- **Finetuned from model:** LeroyDyer/Mixtral_AI_Vision-Instruct_X

# Vision/multimodal capabilities:

If you want to use the vision functionality:

* You must use the latest version of [Koboldcpp](https://github.com/LostRuins/koboldcpp).

To use the multimodal capabilities of this model and use **vision**, you need to load the specified **mmproj** file, which can be found inside this model repo ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X)).

* You can load the **mmproj** file in the corresponding section of the interface:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/UX6Ubss2EPNAT3SKGMLe0.png)

## Choosing an mmproj file:

* For 4-bit loading, use the 4-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0

* For 8-bit loading, use the 8-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0

* For full-precision (f16) loading, use the f16 mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-f16

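As a rough sketch, loading the model together with its mmproj file from the Koboldcpp command line might look like the following (the main GGUF file name here is a hypothetical placeholder; `--model` and `--mmproj` are standard Koboldcpp options):

```
python koboldcpp.py --model Mixtral_AI_Vision-Instruct_X.Q4_0.gguf --mmproj mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0.gguf
```
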
## Extended capabilities:

```
* mistralai/Mistral-7B-Instruct-v0.1 - prime base
* ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
* ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision
* rvv-karma/BASH-Coder-Mistral-7B - coding
* Locutusque/Hercules-3.1-Mistral-7B - unhinging
* KoboldAI/Mistral-7B-Erebus-v3 - NSFW
* Locutusque/Hyperion-2.1-Mistral-7B - chat
* Severian/Nexus-IKM-Mistral-7B-Pytorch - thinking
* NousResearch/Hermes-2-Pro-Mistral-7B - generalizing
* mistralai/Mistral-7B-Instruct-v0.2 - base
* Nitral-AI/ProdigyXBioMistral_7B - medical
* Nitral-AI/Infinite-Mika-7b - 128k context expansion enforcement
* Nous-Yarn-Mistral-7b-128k - 128k context expansion
* yanismiraoui/Yarn-Mistral-7b-128k-sharded
* ChaoticNeutrals/Eris_Prime-V2-7B - roleplay
```

# Image-text-to-text

## Using transformers

```python
import torch
import requests
from PIL import Image
from IPython.display import display
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

# Load the model in 4-bit to reduce VRAM usage
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)

image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
display(image1)  # shows the images when run in a notebook
display(image2)

# One prompt per image; <image> marks where the image features are inserted
prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
    "USER: <image>\nPlease describe this image\nASSISTANT:",
]

inputs = processor(prompts, images=[image1, image2], padding=True, return_tensors="pt").to("cuda")
for k, v in inputs.items():
    print(k, v.shape)

# Generate answers for both prompts in one batch and decode them
output = model.generate(**inputs, max_new_tokens=200)
for text in processor.batch_decode(output, skip_special_tokens=True):
    print(text)
```

## Using pipeline

```python
import requests
from PIL import Image
from transformers import pipeline

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
pipe = pipeline("image-to-text", model=model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)

question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```
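The pipeline returns a list of dictionaries. As a minimal sketch, assuming the standard `generated_text` key and the `###Assistant:` delimiter used in the prompt above, the answer alone can be pulled out like this:

```python
# outputs has the form [{"generated_text": "..."}]
answer = outputs[0]["generated_text"].split("###Assistant:")[-1].strip()
print(answer)
```
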
## Mistral chat templating

**Instruction format:** in order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence (BOS) token id; the following instructions should not. The assistant generation is ended by the end-of-sentence (EOS) token id.

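As a minimal illustration of this format, the prompt for a short exchange could be assembled by hand roughly as follows (a sketch only; the exact whitespace and special-token handling are defined by the model's chat template, which the tokenizer applies for you in the next snippet):

```python
# Hand-built Mistral instruct prompt (illustrative; the real template may differ in spacing)
bos, eos = "<s>", "</s>"
prompt = (
    bos
    + "[INST] Hello, how are you? [/INST]"
    + "I'm doing great. How can I help you today?"
    + eos
    + "[INST] I'd like to show off how chat templating works! [/INST]"
)
print(prompt)
```
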
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# Render the conversation into the instruct format without tokenizing
print(tokenizer.apply_chat_template(chat, tokenize=False))
```

# Text-to-text

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# Apply the chat template and tokenize in one step
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)