lamm-mit/Cephalo-Idefics-2-vision-10b-beta

@@ -36,11 +36,21 @@ The model is developed to process diverse inputs, including images and text, fac
 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
-This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-8b-alpha, is based on the HuggingFaceM4/idefics2-8b-chatty model. The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
 ### Chat Format
-The lamm-mit/Cephalo-Idefics-2-vision-8b-alpha is suiteable for one or more image inputs, wih prompts using the chat format as follows:
 ```raw
 User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
@@ -76,7 +86,7 @@ DEVICE='cuda:0'
 from transformers import AutoProcessor, Idefics2ForConditionalGeneration
 from tqdm.notebook import tqdm
-model_id='lamm-mit/Cephalo-Idefics-2-vision-8b-alpha'
 model = Idefics2ForConditionalGeneration.from_pretrained(  model_id,
                                                            torch_dtype=torch.bfloat16, #if your GPU allows
@@ -224,7 +234,7 @@ url1 = "https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg"
 response, messages,images= ask_about_image ( model, processor, question,
                                              images_input=[url1,],
                                              temperature=0.1,
-                                             system= '', init_instr='You carefully study the image, and respond accurately, but succinctly. Think step-by-step.\n\n',
                                              show_conversation=True,
                                              max_new_tokens=512, messages=[], images=[])
 ```
@@ -235,7 +245,11 @@ Sample output:
 <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>
 <pre style="white-space: pre-wrap;">
-The image depicts a group of ants moving in a coordinated manner to climb a vertical surface. This behavior is known as cooperative climbing and involves the use of multiple agents working together to achieve a common goal. The relevance for materials design lies in the potential application of multi-agent AI in developing new materials with improved properties through the collaboration of multiple agents.
 </pre>
 ## Dataset generation
@@ -252,7 +266,7 @@ If your GPU allows, load and run inference in half precision (`torch.float16` or
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
 +    torch_dtype=torch.float16,
 ).to(DEVICE)
 ```
@@ -273,7 +287,7 @@ Mke sure to install `flash-attn`. Refer to the [original repository of Flash Att
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
 +    torch_dtype=torch.bfloat16,
 +    _attn_implementation="flash_attention_2",
 ).to(DEVICE)
@@ -296,7 +310,7 @@ quantization_config = BitsAndBytesConfig(
     bnb_4bit_compute_dtype=torch.bfloat16
 )
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
 +    torch_dtype=torch.bfloat16,
 +    quantization_config=quantization_config,
 ).to(DEVICE)

 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
+This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-10b-alpha, is based on a merged expansion of the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta and HuggingFaceM4/idefics2-8b-chatty models. This method allows us to increase the depth of the model and to focus on learning more complex representations and associations in the deeper layers of the network.
+The model was trained in several stages:
+**Step 1**: Train https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta by fine-tuning the HuggingFaceM4/idefics2-8b-chatty model.
+**Step 2**: Combine the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta decoder with the last 8 layers of the HuggingFaceM4/idefics2-8b-chatty decoder.
+**Step 3**: Fine-tune the merged model, which now has 40 decoder layers and a total of 10b parameters.
+The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
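
The layer bookkeeping behind the merge in Step 2 can be illustrated with a minimal sketch. It is not the authors' merge script: `DecoderLayer`, `make_decoder`, and `expand_decoder` are hypothetical stand-ins, the 32-layer depth of each source decoder is inferred from the stated total of 40 minus the 8 added layers, and appending the extra layers at the end of the stack is an assumption about ordering.

```python
import copy
from dataclasses import dataclass


@dataclass
class DecoderLayer:
    """Hypothetical stand-in for one transformer decoder layer."""
    source: str  # checkpoint the layer weights come from
    index: int   # position of the layer in its original decoder


def make_decoder(source: str, num_layers: int) -> list:
    """Build a toy decoder stack; real layers would hold weight tensors."""
    return [DecoderLayer(source, i) for i in range(num_layers)]


def expand_decoder(finetuned_layers: list, base_layers: list, num_extra: int) -> list:
    """Return the fine-tuned stack followed by copies of the last
    `num_extra` layers of the base stack (assumed ordering)."""
    if not 0 < num_extra <= len(base_layers):
        raise ValueError("num_extra must be between 1 and len(base_layers)")
    # Deep-copy so later fine-tuning of the merged stack cannot alter the base stack.
    extra = [copy.deepcopy(layer) for layer in base_layers[-num_extra:]]
    return list(finetuned_layers) + extra


# Assumed depth: 32 layers per source decoder (40 merged layers minus 8 added).
finetuned = make_decoder("Cephalo-Idefics-2-vision-8b-beta", 32)
base = make_decoder("idefics2-8b-chatty", 32)

merged = expand_decoder(finetuned, base, num_extra=8)
print(len(merged))
```

In this sketch the first 32 entries of `merged` come from the fine-tuned decoder and the last 8 are copies of base layers 24 to 31. With real checkpoints, the model configuration's layer count would also need to be updated before the fine-tuning in Step 3.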
 ### Chat Format
+The lamm-mit/Cephalo-Idefics-2-vision-10b-alpha model is suitable for one or more image inputs, with prompts using the chat format as follows:
 ```raw
 User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
 from transformers import AutoProcessor, Idefics2ForConditionalGeneration
 from tqdm.notebook import tqdm
+model_id='lamm-mit/Cephalo-Idefics-2-vision-10b-alpha'
 model = Idefics2ForConditionalGeneration.from_pretrained(  model_id,
                                                            torch_dtype=torch.bfloat16, #if your GPU allows
 response, messages,images= ask_about_image ( model, processor, question,
                                              images_input=[url1,],
                                              temperature=0.1,
+                                             system= '', init_instr='You carefully study the image and provide detailed answers. Think step-by-step.\n\n',
                                              show_conversation=True,
                                              max_new_tokens=512, messages=[], images=[])
 ```
 <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>
 <pre style="white-space: pre-wrap;">
+The image shows a group of ants moving in coordinated patterns on a surface. This illustrates the concept of multi-agent AI, which involves the study and simulation of complex systems involving multiple agents (in this case, ants) interacting with each other and their environment.
+The relevance for materials design is in understanding how these natural systems exhibit emergent behaviors such as self-organization, which can inspire the development of new materials and systems that mimic these natural processes. By studying the movement patterns of ants, researchers can gain insights into how to design materials that exhibit similar emergent properties, leading to improved performance in various applications.
+Multi-agent AI involves creating models that describe the interactions between individual agents and their environment, allowing for the simulation of complex systems with multiple interacting components. This approach can be applied to various fields, including materials science, where understanding emergent behaviors at the microscopic level can lead to the design of new materials with enhanced properties.
 </pre>
 ## Dataset generation
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.float16,
 ).to(DEVICE)
 ```
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.bfloat16,
 +    _attn_implementation="flash_attention_2",
 ).to(DEVICE)
     bnb_4bit_compute_dtype=torch.bfloat16
 )
 model = AutoModelForVision2Seq.from_pretrained(
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.bfloat16,
 +    quantization_config=quantization_config,
 ).to(DEVICE)