Upload README.md
README.md
@@ -36,11 +36,21 @@ The model is developed to process diverse inputs, including images and text, fac
 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

-This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-

 ### Chat Format

-The lamm-mit/Cephalo-Idefics-2-vision-

 ```raw
 User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
@@ -76,7 +86,7 @@ DEVICE='cuda:0'
 from transformers import AutoProcessor, Idefics2ForConditionalGeneration
 from tqdm.notebook import tqdm

-model_id='lamm-mit/Cephalo-Idefics-2-vision-

 model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
     torch_dtype=torch.bfloat16, #if your GPU allows
@@ -224,7 +234,7 @@ url1 = "https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg"
 response, messages,images= ask_about_image ( model, processor, question,
     images_input=[url1,],
     temperature=0.1,
-    system= '', init_instr='You carefully study the image
     show_conversation=True,
     max_new_tokens=512, messages=[], images=[])
 ```
@@ -235,7 +245,11 @@ Sample output:
 <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>

 <pre style="white-space: pre-wrap;">
-The image
 </pre>

 ## Dataset generation
@@ -252,7 +266,7 @@ If your GPU allows, load and run inference in half precision (`torch.float16` or
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-
 +    torch_dtype=torch.float16,
 ).to(DEVICE)
 ```
@@ -273,7 +287,7 @@ Make sure to install `flash-attn`. Refer to the [original repository of Flash Att
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-
 +    torch_dtype=torch.bfloat16,
 +    _attn_implementation="flash_attention_2",
 ).to(DEVICE)
@@ -296,7 +310,7 @@ quantization_config = BitsAndBytesConfig(
     bnb_4bit_compute_dtype=torch.bfloat16
 )
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-
 +    torch_dtype=torch.bfloat16,
 +    quantization_config=quantization_config,
 ).to(DEVICE)

 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

+This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-10b-alpha, is based on a merged expansion of the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta and the HuggingFaceM4/idefics2-8b-chatty model. This method allows us to increase the depth of the model and focus on learning more complex representations and associations in deeper layers of the network.
+
+The model was trained in several stages:
+
+**Step 1**: Train https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta by fine-tuning the HuggingFaceM4/idefics2-8b-chatty model.
+
+**Step 2**: Combine the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta decoder with the last 8 layers of the HuggingFaceM4/idefics2-8b-chatty decoder.
+
+**Step 3**: Fine-tune the merged model, which now has 40 decoder layers and a total of 10b parameters.
+
+The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
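
The layer stacking described in Step 2 can be pictured with a minimal, framework-free Python sketch. It is illustrative only: it assumes the fine-tuned decoder has 32 layers (implied by the 40-layer total after adding 8), assumes the extra layers are appended at the end, and uses placeholder strings instead of real transformer layers. `merge_decoder_layers` and the layer labels are hypothetical names, not the authors' code.

```python
from typing import List, Sequence


def merge_decoder_layers(finetuned: Sequence[str],
                         base: Sequence[str],
                         n_extra: int = 8) -> List[str]:
    """Return all `finetuned` layers followed by the last `n_extra` layers of `base`."""
    if not 0 < n_extra <= len(base):
        raise ValueError("n_extra must be between 1 and len(base)")
    return list(finetuned) + list(base[-n_extra:])


# Placeholder labels standing in for decoder layers (assumption: 32 layers each).
finetuned_layers = [f"cephalo_8b_beta.layer_{i}" for i in range(32)]
base_layers = [f"idefics2_8b_chatty.layer_{i}" for i in range(32)]

merged_layers = merge_decoder_layers(finetuned_layers, base_layers)
print(len(merged_layers))  # 40
print(merged_layers[32])   # idefics2_8b_chatty.layer_24
```

In an actual model the same idea would apply to the decoder's layer modules rather than strings, and the configured layer count would have to match the new depth; those details are outside this sketch.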

 ### Chat Format

+The lamm-mit/Cephalo-Idefics-2-vision-10b-alpha model is suitable for one or more image inputs, with prompts using the chat format as follows:

 ```raw
 User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.

 from transformers import AutoProcessor, Idefics2ForConditionalGeneration
 from tqdm.notebook import tqdm

+model_id='lamm-mit/Cephalo-Idefics-2-vision-10b-alpha'

 model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
     torch_dtype=torch.bfloat16, #if your GPU allows

 response, messages,images= ask_about_image ( model, processor, question,
     images_input=[url1,],
     temperature=0.1,
+    system= '', init_instr='You carefully study the image and provide detailed answers. Think step-by-step.\n\n',
     show_conversation=True,
     max_new_tokens=512, messages=[], images=[])
 ```

 <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>

 <pre style="white-space: pre-wrap;">
+The image shows a group of ants moving in coordinated patterns on a surface. This illustrates the concept of multi-agent AI, which involves the study and simulation of complex systems involving multiple agents (in this case, ants) interacting with each other and their environment.
+
+The relevance for materials design is in understanding how these natural systems exhibit emergent behaviors such as self-organization, which can inspire the development of new materials and systems that mimic these natural processes. By studying the movement patterns of ants, researchers can gain insights into how to design materials that exhibit similar emergent properties, leading to improved performance in various applications.
+
+Multi-agent AI involves creating models that describe the interactions between individual agents and their environment, allowing for the simulation of complex systems with multiple interacting components. This approach can be applied to various fields, including materials science, where understanding emergent behaviors at the microscopic level can lead to the design of new materials with enhanced properties.
 </pre>

 ## Dataset generation

 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.float16,
 ).to(DEVICE)
 ```
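
As a rough guide to why reduced precision helps, weight memory can be estimated from the parameter count and the bytes stored per parameter. The sketch below is back-of-the-envelope arithmetic only: `estimate_weight_memory_gb` is a hypothetical helper, it counts weights alone (no activations, KV cache, quantization constants, or framework overhead), and the 8e9 parameter count is a round assumption for an 8b model.

```python
# Bytes stored per parameter for a few common precisions (weights only).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "4bit": 0.5,
}


def estimate_weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: roughly 8e9 parameters for an 8b model.
for dtype in ("float32", "float16", "4bit"):
    print(f"{dtype}: ~{estimate_weight_memory_gb(8e9, dtype):.0f} GB")
```

Actual memory use during inference will be higher than these figures, since inputs, intermediate activations, and caches also occupy GPU memory.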

 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.bfloat16,
 +    _attn_implementation="flash_attention_2",
 ).to(DEVICE)

     bnb_4bit_compute_dtype=torch.bfloat16
 )
 model = AutoModelForVision2Seq.from_pretrained(
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.bfloat16,
 +    quantization_config=quantization_config,
 ).to(DEVICE)