Update README.md
README.md
The model is developed to process diverse inputs, including images and text, facilitating [...]

Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-10b-beta, is based on a merged expansion of https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta and the HuggingFaceM4/idefics2-8b-chatty model. This method allows us to increase the depth of the model and to focus on learning more complex representations and associations in the deeper layers of the network.

The lamm-mit/Cephalo-Idefics-2-vision-10b-beta model was trained for two epochs, while the lamm-mit/Cephalo-Idefics-2-vision-10b-alpha version was trained for one epoch.
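To make the idea concrete, the sketch below shows one way a depth expansion of this kind can be assembled with `transformers`: the decoder stack of one parent model is extended with copied layers from the other, yielding a deeper network that is then fine-tuned. This is an illustration only; the module paths and the layer slice are assumptions, not the exact recipe behind this checkpoint.

```python
# Illustrative sketch of a depth-expansion merge (not the exact Cephalo recipe).
# Assumes the Idefics2 module layout, where model.model.text_model.layers holds
# the decoder stack of the language backbone.
import copy
import torch
from transformers import Idefics2ForConditionalGeneration

base = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-chatty", torch_dtype=torch.bfloat16)
donor = Idefics2ForConditionalGeneration.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta", torch_dtype=torch.bfloat16)

layers = base.model.text_model.layers                        # nn.ModuleList of decoder layers
extra = copy.deepcopy(donor.model.text_model.layers[16:24])  # hypothetical slice

for layer in extra:                # deepen the stack with the copied layers
    layers.append(layer)
for i, layer in enumerate(layers): # keep KV-cache layer indices consistent
    layer.self_attn.layer_idx = i  # (assumes a transformers version that tracks layer_idx)

base.config.text_config.num_hidden_layers = len(layers)
# The expanded model would then be fine-tuned so the new deeper layers learn
# useful representations.
```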
The model was trained in several stages:
[...]

The model was trained on a combination of scientific text-image data extracted from [...]

### Chat Format

The lamm-mit/Cephalo-Idefics-2-vision-10b-beta model is suitable for one or more image inputs, with prompts using the chat format as follows:

```raw
User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.

[...]
```
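This prompt string does not need to be assembled by hand. As a minimal sketch, assuming the standard Idefics2 processor and a placeholder question, the same format can be produced with `apply_chat_template`:

```python
# Minimal sketch: build the chat-format prompt with the processor rather than
# by hand. The question text here is a placeholder.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("lamm-mit/Cephalo-Idefics-2-vision-10b-beta")

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "You carefully study the image, and respond accurately, "
                                 "but succinctly. Think step-by-step.\n\nWhat is shown in this image?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
print(prompt)  # renders in the "User: ... Assistant:" format shown above
```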
[...]

```python
DEVICE='cuda:0'

import torch
from transformers import AutoProcessor, Idefics2ForConditionalGeneration
from tqdm.notebook import tqdm

model_id='lamm-mit/Cephalo-Idefics-2-vision-10b-beta'

model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
    torch_dtype=torch.bfloat16, #if your GPU allows
).to(DEVICE)
```
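Building on the loading snippet above, here is a minimal end-to-end inference sketch. The image URL and question are placeholders, and the processor is created explicitly in case it was not set up earlier:

```python
# Minimal inference sketch; assumes model_id, model, and DEVICE from the block above.
import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(model_id)

# Placeholder URL: point this at a real image.
image = Image.open(requests.get("https://example.com/microstructure.png", stream=True).raw)

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "You carefully study the image, and respond accurately, "
                                 "but succinctly. Think step-by-step.\n\nWhat is shown in this image?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(DEVICE)

generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```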
[...]

If your GPU allows, load and run inference in half precision (`torch.float16` or `torch.bfloat16`).

```diff
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
+    torch_dtype=torch.float16,
).to(DEVICE)
```
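When choosing between the two half-precision dtypes, `bfloat16` keeps the dynamic range of `float32` and is generally the safer option on GPUs that support it. A quick capability check, using the standard torch API:

```python
import torch

# Prefer bfloat16 on hardware that supports it (Ampere or newer GPUs);
# fall back to float16 otherwise.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(dtype)
```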
[...]

Make sure to install `flash-attn`. Refer to the [original repository of Flash Attention](https://github.com/Dao-AILab/flash-attention) for the package installation.

```diff
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
+    torch_dtype=torch.bfloat16,
+    _attn_implementation="flash_attention_2",
).to(DEVICE)
```
**4-bit quantization with bitsandbytes**

<details><summary>Click to expand.</summary>

It is possible to load Cephalo-Idefics-2-vision-10b-beta in 4-bit precision with `bitsandbytes`. Make sure that you have `accelerate` and `bitsandbytes` installed.

```diff
+ from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
+    torch_dtype=torch.bfloat16,
+    quantization_config=quantization_config,
).to(DEVICE)
```

</details>
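To compare the footprint of these loading options, a quick check on whichever `model` object was created above, using the standard `transformers` API:

```python
# Approximate memory held by the model weights, in GB.
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")
```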