Update README.md
README.md
The model is developed to process diverse inputs, including images and text, facilitating [...]

Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-10b-beta, is based on a merged expansion of https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta and the HuggingFaceM4/idefics2-8b-chatty model. This method allows us to increase the depth of the model and to focus on learning more complex representations and associations in the deeper layers of the network.

The lamm-mit/Cephalo-Idefics-2-vision-10b-beta model was trained for two epochs, while the lamm-mit/Cephalo-Idefics-2-vision-10b-alpha version was trained for one epoch.
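To make the idea concrete, the sketch below shows one way a depth expansion of this kind can be assembled with `transformers`: the decoder stack of one parent model is extended with copied layers from the other, yielding a deeper network that is then fine-tuned. This is an illustration only; the module paths and the layer slice are assumptions, not the exact recipe behind this checkpoint.

```python
# Illustrative sketch of a depth-expansion merge (not the exact Cephalo recipe).
# Assumes the Idefics2 module layout, where model.model.text_model.layers holds
# the decoder stack of the language backbone.
import copy
import torch
from transformers import Idefics2ForConditionalGeneration

base = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-chatty", torch_dtype=torch.bfloat16)
donor = Idefics2ForConditionalGeneration.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta", torch_dtype=torch.bfloat16)

layers = base.model.text_model.layers                        # nn.ModuleList of decoder layers
extra = copy.deepcopy(donor.model.text_model.layers[16:24])  # hypothetical slice

for layer in extra:                # deepen the stack with the copied layers
    layers.append(layer)
for i, layer in enumerate(layers): # keep KV-cache layer indices consistent
    layer.self_attn.layer_idx = i  # (assumes a transformers version that tracks layer_idx)

base.config.text_config.num_hidden_layers = len(layers)
# The expanded model would then be fine-tuned so the new deeper layers learn
# useful representations.
```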
The model was trained in several stages:
[...]

The model was trained on a combination of scientific text-image data extracted from [...]

### Chat Format

The lamm-mit/Cephalo-Idefics-2-vision-10b-beta model is suitable for one or more image inputs, with prompts using the chat format as follows:

```raw
User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.

[...]
```
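This prompt string does not need to be assembled by hand. As a minimal sketch, assuming the standard Idefics2 processor and a placeholder question, the same format can be produced with `apply_chat_template`:

```python
# Minimal sketch: build the chat-format prompt with the processor rather than
# by hand. The question text here is a placeholder.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("lamm-mit/Cephalo-Idefics-2-vision-10b-beta")

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "You carefully study the image, and respond accurately, "
                                 "but succinctly. Think step-by-step.\n\nWhat is shown in this image?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
print(prompt)  # renders in the "User: ... Assistant:" format shown above
```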
[...]

```python
DEVICE='cuda:0'

import torch
from transformers import AutoProcessor, Idefics2ForConditionalGeneration
from tqdm.notebook import tqdm

model_id='lamm-mit/Cephalo-Idefics-2-vision-10b-beta'

model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
    torch_dtype=torch.bfloat16, #if your GPU allows
).to(DEVICE)
```
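Building on the loading snippet above, here is a minimal end-to-end inference sketch. The image URL and question are placeholders, and the processor is created explicitly in case it was not set up earlier:

```python
# Minimal inference sketch; assumes model_id, model, and DEVICE from the block above.
import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(model_id)

# Placeholder URL: point this at a real image.
image = Image.open(requests.get("https://example.com/microstructure.png", stream=True).raw)

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "You carefully study the image, and respond accurately, "
                                 "but succinctly. Think step-by-step.\n\nWhat is shown in this image?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(DEVICE)

generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```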
[...]

If your GPU allows, load and run inference in half precision (`torch.float16` or `torch.bfloat16`).

```diff
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
+    torch_dtype=torch.float16,
).to(DEVICE)
```
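When choosing between the two half-precision dtypes, `bfloat16` keeps the dynamic range of `float32` and is generally the safer option on GPUs that support it. A quick capability check, using the standard torch API:

```python
import torch

# Prefer bfloat16 on hardware that supports it (Ampere or newer GPUs);
# fall back to float16 otherwise.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(dtype)
```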
[...]

Make sure to install `flash-attn`. Refer to the [original repository of Flash Attention](https://github.com/Dao-AILab/flash-attention) for the package installation.

```diff
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
+    torch_dtype=torch.bfloat16,
+    _attn_implementation="flash_attention_2",
).to(DEVICE)
```
**4-bit quantization with bitsandbytes**

<details><summary>Click to expand.</summary>

It is possible to load Cephalo-Idefics-2-vision-10b-beta in 4-bit precision with `bitsandbytes`. Make sure that you have `accelerate` and `bitsandbytes` installed.

```diff
+ from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
+    torch_dtype=torch.bfloat16,
+    quantization_config=quantization_config,
).to(DEVICE)
```

</details>
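To compare the footprint of these loading options, a quick check on whichever `model` object was created above, using the standard `transformers` API:

```python
# Approximate memory held by the model weights, in GB.
print(f"{model.get_memory_footprint() / 1e9:.2f} GB")
```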