mjbuehler committed on
Commit
d7c7e03
1 Parent(s): 1e185c2

Upload README.md

  1. README.md +22 -8
README.md CHANGED
@@ -36,11 +36,21 @@ The model is developed to process diverse inputs, including images and text, fac
36
 
37
  Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
38
 
39
- This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-8b-alpha, is based on the HuggingFaceM4/idefics2-8b-chatty model. The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
40
 
41
  ### Chat Format
42
 
43
- The lamm-mit/Cephalo-Idefics-2-vision-8b-alpha is suiteable for one or more image inputs, wih prompts using the chat format as follows:
44
 
45
  ```raw
46
  User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
@@ -76,7 +86,7 @@ DEVICE='cuda:0'
76
  from transformers import AutoProcessor, Idefics2ForConditionalGeneration
77
  from tqdm.notebook import tqdm
78
 
79
- model_id='lamm-mit/Cephalo-Idefics-2-vision-8b-alpha'
80
 
81
  model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
82
  torch_dtype=torch.bfloat16, #if your GPU allows
@@ -224,7 +234,7 @@ url1 = "https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg"
224
  response, messages,images= ask_about_image ( model, processor, question,
225
  images_input=[url1,],
226
  temperature=0.1,
227
- system= '', init_instr='You carefully study the image, and respond accurately, but succinctly. Think step-by-step.\n\n',
228
  show_conversation=True,
229
  max_new_tokens=512, messages=[], images=[])
230
  ```
@@ -235,7 +245,11 @@ Sample output:
235
  <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>
236
 
237
  <pre style="white-space: pre-wrap;">
238
- The image depicts a group of ants moving in a coordinated manner to climb a vertical surface. This behavior is known as cooperative climbing and involves the use of multiple agents working together to achieve a common goal. The relevance for materials design lies in the potential application of multi-agent AI in developing new materials with improved properties through the collaboration of multiple agents.
239
  </pre>
240
 
241
  ## Dataset generation
@@ -252,7 +266,7 @@ If your GPU allows, load and run inference in half precision (`torch.float16` or
252
 
253
  ```diff
254
  model = AutoModelForVision2Seq.from_pretrained(
255
- "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
256
  + torch_dtype=torch.float16,
257
  ).to(DEVICE)
258
  ```
@@ -273,7 +287,7 @@ Make sure to install `flash-attn`. Refer to the [original repository of Flash Att
273
 
274
  ```diff
275
  model = AutoModelForVision2Seq.from_pretrained(
276
- "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
277
  + torch_dtype=torch.bfloat16,
278
  + _attn_implementation="flash_attention_2",
279
  ).to(DEVICE)
@@ -296,7 +310,7 @@ quantization_config = BitsAndBytesConfig(
296
  bnb_4bit_compute_dtype=torch.bfloat16
297
  )
298
  model = AutoModelForVision2Seq.from_pretrained(
299
- "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
300
  + torch_dtype=torch.bfloat16,
301
  + quantization_config=quantization_config,
302
  ).to(DEVICE)
 
36
 
37
  Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
38
 
39
+ This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-10b-alpha, is based on a merged expansion of the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta and the HuggingFaceM4/idefics2-8b-chatty model. This method allows us to increase the depth of the model and focus on learning more complex representations and associations in deeper layers of the network.
40
+
41
+ The model was trained in several stages:
42
+
43
+ **Step 1**: Train https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta by fine-tuning the HuggingFaceM4/idefics2-8b-chatty model.
44
+
45
+ **Step 2**: Combine the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta decoder with the last 8 layers of the HuggingFaceM4/idefics2-8b-chatty decoder.
46
+
47
+ **Step 3**: Fine-tune the merged model, which now has 40 decoder layers and a total of 10b parameters.
48
+
49
+ The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
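
The layer arithmetic in Step 2 and Step 3 can be illustrated with a minimal sketch. It is not the authors' merging code: it uses plain strings as stand-ins for decoder layers, the helper `merge_decoder_layers` is hypothetical, the count of 32 layers per 8b decoder is inferred from the stated total of 40 minus the 8 added layers, and appending the extra layers at the end of the stack is an assumption, since the README does not state the ordering.

```python
import copy


def merge_decoder_layers(tuned_layers, base_layers, n_extra):
    """Return the tuned layers followed by copies of the last n_extra base layers.

    Hypothetical helper for illustration only; the ordering (extra layers
    appended at the end) is an assumption, not taken from the README.
    """
    if n_extra < 0 or n_extra > len(base_layers):
        raise ValueError("n_extra must be between 0 and len(base_layers)")
    start = len(base_layers) - n_extra
    # Deep-copy so the merged stack does not share state with the base model.
    extra = [copy.deepcopy(layer) for layer in base_layers[start:]]
    return list(tuned_layers) + extra


# Stand-ins for the decoder layers of the two 8b models
# (32 layers each is an inferred assumption: 40 - 8 = 32).
tuned = [f"tuned_layer_{i}" for i in range(32)]
base = [f"base_layer_{i}" for i in range(32)]

merged = merge_decoder_layers(tuned, base, n_extra=8)
print(len(merged))  # 32 tuned layers + 8 base layers
```

With real models the same idea would apply to the decoder's layer list rather than to strings, followed by the fine-tuning described in Step 3 so that the added layers adapt to their new position in the stack.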
50
 
51
  ### Chat Format
52
 
53
+ The lamm-mit/Cephalo-Idefics-2-vision-10b-alpha model is suitable for one or more image inputs, with prompts using the chat format as follows:
54
 
55
  ```raw
56
  User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
 
86
  from transformers import AutoProcessor, Idefics2ForConditionalGeneration
87
  from tqdm.notebook import tqdm
88
 
89
+ model_id='lamm-mit/Cephalo-Idefics-2-vision-10b-alpha'
90
 
91
  model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
92
  torch_dtype=torch.bfloat16, #if your GPU allows
 
234
  response, messages,images= ask_about_image ( model, processor, question,
235
  images_input=[url1,],
236
  temperature=0.1,
237
+ system= '', init_instr='You carefully study the image and provide detailed answers. Think step-by-step.\n\n',
238
  show_conversation=True,
239
  max_new_tokens=512, messages=[], images=[])
240
  ```
 
245
  <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>
246
 
247
  <pre style="white-space: pre-wrap;">
248
+ The image shows a group of ants moving in coordinated patterns on a surface. This illustrates the concept of multi-agent AI, which involves the study and simulation of complex systems involving multiple agents (in this case, ants) interacting with each other and their environment.
249
+
250
+ The relevance for materials design is in understanding how these natural systems exhibit emergent behaviors such as self-organization, which can inspire the development of new materials and systems that mimic these natural processes. By studying the movement patterns of ants, researchers can gain insights into how to design materials that exhibit similar emergent properties, leading to improved performance in various applications.
251
+
252
+ Multi-agent AI involves creating models that describe the interactions between individual agents and their environment, allowing for the simulation of complex systems with multiple interacting components. This approach can be applied to various fields, including materials science, where understanding emergent behaviors at the microscopic level can lead to the design of new materials with enhanced properties.
253
  </pre>
254
 
255
  ## Dataset generation
 
266
 
267
  ```diff
268
  model = AutoModelForVision2Seq.from_pretrained(
269
+ "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
270
  + torch_dtype=torch.float16,
271
  ).to(DEVICE)
272
  ```
 
287
 
288
  ```diff
289
  model = AutoModelForVision2Seq.from_pretrained(
290
+ "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
291
  + torch_dtype=torch.bfloat16,
292
  + _attn_implementation="flash_attention_2",
293
  ).to(DEVICE)
 
310
  bnb_4bit_compute_dtype=torch.bfloat16
311
  )
312
  model = AutoModelForVision2Seq.from_pretrained(
313
+ "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
314
  + torch_dtype=torch.bfloat16,
315
  + quantization_config=quantization_config,
316
  ).to(DEVICE)