mjbuehler committed
Commit 7e3c46a
1 Parent(s): d7c7e03

Update README.md

Files changed (1): README.md (+9 -7)

README.md CHANGED
````diff
@@ -36,7 +36,9 @@ The model is developed to process diverse inputs, including images and text, fac
 
 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
 
-This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-10b-alpha, is based on a merged expansion of the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta and the HuggingFaceM4/idefics2-8b-chatty model. This method allows us to increase the depth of the model and focus on learning more complex representations and associations in deeper layers of the network.
+This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-10b-beta, is based on a merged expansion of the https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta and the HuggingFaceM4/idefics2-8b-chatty model. This method allows us to increase the depth of the model and focus on learning more complex representations and associations in deeper layers of the network.
+
+The lamm-mit/Cephalo-Idefics-2-vision-10b-beta model is trained for two epochs, while the lamm-mit/Cephalo-Idefics-2-vision-10b-alpha version was trained for one epoch.
 
 The model was trained in several stages:
 
@@ -50,7 +52,7 @@ The model was trained on a combination of scientific text-image data extracted f
 
 ### Chat Format
 
-The lamm-mit/Cephalo-Idefics-2-vision-10b-alpha model is suitable for one or more image inputs, with prompts using the chat format as follows:
+The lamm-mit/Cephalo-Idefics-2-vision-10b-beta model is suitable for one or more image inputs, with prompts using the chat format as follows:
 
 ```raw
 User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
@@ -86,7 +88,7 @@ DEVICE='cuda:0'
 from transformers import AutoProcessor, Idefics2ForConditionalGeneration
 from tqdm.notebook import tqdm
 
-model_id='lamm-mit/Cephalo-Idefics-2-vision-10b-alpha'
+model_id='lamm-mit/Cephalo-Idefics-2-vision-10b-beta'
 
 model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
     torch_dtype=torch.bfloat16, #if your GPU allows
@@ -266,7 +268,7 @@ If your GPU allows, load and run inference in half precision (`torch.float16` or
 
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
+    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
 +    torch_dtype=torch.float16,
 ).to(DEVICE)
 ```
@@ -287,7 +289,7 @@ Make sure to install `flash-attn`. Refer to the [original repository of Flash Att
 
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
+    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
 +    torch_dtype=torch.bfloat16,
 +    _attn_implementation="flash_attention_2",
 ).to(DEVICE)
@@ -298,7 +300,7 @@ model = AutoModelForVision2Seq.from_pretrained(
 **4 bit quantization with bitsandbytes**
 
 <details><summary>Click to expand.</summary>
-It is possible to load Idefics2 in 4 bits with `bitsandbytes`. Make sure that you have `accelerate` and `bitsandbytes` installed.
+It is possible to load Cephalo-Idefics-2-vision-10b-beta in 4 bits with `bitsandbytes`. Make sure that you have `accelerate` and `bitsandbytes` installed.
 
 ```diff
 + from transformers import BitsAndBytesConfig
@@ -310,7 +312,7 @@ quantization_config = BitsAndBytesConfig(
     bnb_4bit_compute_dtype=torch.bfloat16
 )
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
+    "lamm-mit/Cephalo-Idefics-2-vision-10b-beta",
 +    torch_dtype=torch.bfloat16,
 +    quantization_config=quantization_config,
 ).to(DEVICE)
````
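The raw chat format referenced in the README can be assembled without loading anything; a minimal sketch, assuming the Idefics2-style `<image>` placeholder and `<end_of_utterance>` separator (the `build_prompt` helper is illustrative, not part of the model's API; in practice `AutoProcessor.apply_chat_template` builds this string for you):

```python
# Minimal sketch of a single-turn raw chat prompt in the format shown in the
# README. The <image> and <end_of_utterance> tags are assumptions based on the
# Idefics2 chat template; build_prompt is a hypothetical helper for illustration.
def build_prompt(question: str, num_images: int = 1) -> str:
    """Assemble a chat-format prompt with one <image> placeholder per image."""
    instruction = ("You carefully study the image, and respond accurately, "
                   "but succinctly. Think step-by-step.")
    image_tags = "<image>" * num_images  # one placeholder per attached image
    return (f"User: {instruction}\n"
            f"{image_tags}{question}<end_of_utterance>\n"
            f"Assistant:")

prompt = build_prompt("What is shown in this image?")
print(prompt)
```

The prompt ends with `Assistant:` so that generation continues as the model's answer; the processor pairs each `<image>` tag with one image passed alongside the text.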
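As rough motivation for the half-precision and 4-bit options in the diff: the raw weight footprint of a ~10B-parameter model drops from roughly 19 GiB in bfloat16 to under 5 GiB at 4 bits. A back-of-envelope sketch (illustrative only; runtime memory also includes activations, the KV cache, and quantization metadata):

```python
# Back-of-envelope weight memory for a ~10B-parameter model at different
# precisions. Illustrative numbers only: actual usage adds activations,
# the KV cache, and per-block quantization overhead from bitsandbytes.
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Size of the raw weights in GiB at the given precision."""
    return n_params * bits_per_param / 8 / 2**30

n = 10e9                    # ~10B parameters
bf16 = weight_gib(n, 16)    # ~18.6 GiB
nf4 = weight_gib(n, 4)      # ~4.7 GiB
```

This is why the 4-bit `bitsandbytes` path makes the 10b model loadable on a single consumer GPU, at some cost in precision.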