mjbuehler committed
Commit e2e3896 · verified · 1 Parent(s): 6e29731

Update README.md

Files changed (1)
  1. README.md +42 -24
README.md CHANGED
@@ -35,24 +35,21 @@ widget:
 
Cephalo is a series of multimodal materials science focused vision large language models (V-LLMs) designed to integrate visual and linguistic data for advanced understanding and interaction in human-AI or multi-agent AI frameworks.
 
- A novel aspect of Cephalo's development is the innovative dataset generation method. The extraction process employs advanced algorithms to accurately detect and separate images and their corresponding textual descriptions from complex PDF documents. It involves extracting images and captions from PDFs to create well-reasoned image-text pairs, utilizing large language models (LLMs) for natural language processing. These image-text pairs are then refined and validated through LLM-based NLP processing, ensuring high-quality and contextually relevant data for training.
-
- Cephalo can interpret complex visual scenes and generating contextually accurate language descriptions and answer queries.
-
The model is developed to process diverse inputs, including images and text, facilitating a broad range of applications such as image captioning, visual question answering, and multimodal content generation. The architecture combines a vision encoder model and an autoregressive transformer to process complex natural language understanding.
 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/kl5GWBP9WS0D4uwd1t3S7.png)
 
Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
 
- This version of Cephalo, lamm-mit/Cephalo-Idefics2-3x8b-beta, is a Mixture-of-Expert model based on the Idefics-2 model. The basic model architecture is as follows:
 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/b7BK8ZtDzTMsyFDi0wP3w.png)
 
 
### Download Idefics-2 MoE Model and Sample inference code
 
- ```markdown
pip install transformers -U
```
 
@@ -69,9 +66,7 @@ def count_parameters(model):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
model_name_moe = f"lamm-mit/Cephalo-Idefics2-3x8b-beta"
-
config = AutoConfig.from_pretrained(model_name_moe, trust_remote_code=True)
-
processor = AutoProcessor.from_pretrained(model_name_moe, trust_remote_code=True)
moe_model = AutoModelForCausalLM.from_pretrained(
model_name_moe,config=config,
@@ -117,7 +112,7 @@ print(generated_texts)
 
Download .py files that implement the Phi-3-V and the Mixture-of-Expert Vision model
 
- ```markdown
pip install huggingface_hub
```
 
@@ -173,7 +168,7 @@ model_1 = Idefics2ForConditionalGeneration.from_pretrained( model_id_1,
_attn_implementation="flash_attention_2", #make sure Flash Attention 2 is installed
trust_remote_code=True,
#quantization_config=quantization_config,
- )#.to (DEVICE)
processor = AutoProcessor.from_pretrained(
f"{model_id_1}",
do_image_splitting=True
@@ -186,7 +181,7 @@ processor.chat_template = IDEFICS2_CHAT_TEMPLATE
```
 
Now, load the rest of the models:
- ```
model_id_2='HuggingFaceM4/idefics2-8b-chatty'
 
model_2 = Idefics2ForConditionalGeneration.from_pretrained( model_id_2,
@@ -194,7 +189,7 @@ model_2 = Idefics2ForConditionalGeneration.from_pretrained( model_id_2,
_attn_implementation="flash_attention_2", #make sure Flash Attention 2 is installed
trust_remote_code=True,
#quantization_config=quantization_config,
- )#.to (DEVICE)
 
model_id_3='HuggingFaceM4/idefics2-8b'
 
@@ -203,38 +198,41 @@ model_3 = Idefics2ForConditionalGeneration.from_pretrained( model_id_3,
_attn_implementation="flash_attention_2", #make sure Flash Attention 2 is installed
trust_remote_code=True,
#quantization_config=quantization_config,
- )#.to (DEVICE)
```
Put on device:
- ```
model_1.to(DEVICE)
model_2.to(DEVICE)
model_3.to(DEVICE)
```
 
### Construct MoE
- ```
dtype = torch.bfloat16 # Desired dtype for new layers
base_model = copy.deepcopy(model_1) # Your base model
- expert_models = [ model_1, model_2, model_3 ] # List of expert models
 
moe_config = Idefics2ForCausalLMMoEConfig(config=config, k=1, num_expert_models=len (expert_models))
- moe_model = Idefics2ForCausalLMMoE(moe_config, base_model, expert_models, layer_dtype = dtype)#.to(device)
 
count_parameters(expert_models[0]),count_parameters(moe_model)
```
Delete models no longer needed:
- ```
del model_1
del model_2
del model_3
```
Put MoE model on device:
- ```
moe_model.to(DEVICE)
```
- Test if it works (untrained):
- ```
from transformers.image_utils import load_image
 
image = load_image("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg")
@@ -263,6 +261,9 @@ print(generated_texts)
```
 
### Now train MoE gating function
```python
image_1 = Image.open("./VALIDATION/Q15.jpg")
image_1a = Image.open("./VALIDATION/Q31.jpg")
@@ -296,9 +297,10 @@ moe_model.set_gating_layer_params(gating_layer_params)
 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/mh4eFDuFsTBOYbjc38PYz.png)
 
- Inference after MoE gating layers are trained:
 
- ```
from transformers.image_utils import load_image
 
image = load_image("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg")
@@ -328,6 +330,8 @@ print(generated_texts)
 
### Push to hub and save locally
 
```python
repo_id='...'
moe_name='Cephalo-Idefics2-3x8b-beta'
@@ -337,8 +341,22 @@ moe_model.push_to_hub (f'{repo_id}/'+merged_name, )
```
 
Save locally:
- ```
processor.save_pretrained(moe_name, )
moe_model.save_pretrained(moe_name, )
 
```
 
README.md (after changes):

Cephalo is a series of multimodal materials science focused vision large language models (V-LLMs) designed to integrate visual and linguistic data for advanced understanding and interaction in human-AI or multi-agent AI frameworks.
 
The model is developed to process diverse inputs, including images and text, facilitating a broad range of applications such as image captioning, visual question answering, and multimodal content generation. The architecture combines a vision encoder model and an autoregressive transformer to process complex natural language understanding.
 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/kl5GWBP9WS0D4uwd1t3S7.png)
 
Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
 
+ This version of Cephalo, lamm-mit/Cephalo-Idefics2-3x8b-beta, is a Mixture-of-Expert model based on variants and fine-tuned versions of the Idefics-2 model. The basic model architecture is as follows:
 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/b7BK8ZtDzTMsyFDi0wP3w.png)
 
+ The model has 20b parameters (3 experts, 8b each; 8b active parameters during inference).
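
As a schematic illustration of why only about 8b of the 20b parameters are exercised at inference: with k=1 routing, a gating network scores the experts and only the single top-scoring expert runs for a given input. The sketch below is illustrative only; the class name, dimensions, and pooling choice are assumptions, not the repository's implementation:

```python
import torch
import torch.nn as nn

class Top1ModelGate(nn.Module):
    """Schematic top-1 (k=1) gating over whole expert networks; illustration only."""
    def __init__(self, hidden_dim: int, experts: list):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, len(experts))   # one score per expert
        self.experts = nn.ModuleList(experts)

    def forward(self, hidden_states):                      # (batch, seq, hidden)
        pooled = hidden_states.mean(dim=1)                  # summarize the input
        expert_idx = self.gate(pooled).argmax(dim=-1)       # top-1: pick one expert per sample
        outputs = []
        for b in range(hidden_states.size(0)):
            expert = self.experts[int(expert_idx[b])]       # only this expert's weights are used
            outputs.append(expert(hidden_states[b:b + 1]))
        return torch.cat(outputs, dim=0)

# Tiny usage example with made-up dimensions: three small expert MLPs over a 16-dim hidden state.
experts = [nn.Sequential(nn.Linear(16, 16), nn.GELU(), nn.Linear(16, 16)) for _ in range(3)]
layer = Top1ModelGate(hidden_dim=16, experts=experts)
out = layer(torch.randn(2, 5, 16))                          # (batch=2, seq=5, hidden=16)
```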
 
### Download Idefics-2 MoE Model and Sample inference code
 
+ ```bash
pip install transformers -U
```
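
The snippet below (and several later cells) calls a count_parameters helper that is defined earlier in the README, outside the regions shown in this diff. A typical implementation, included here only as an assumed sketch, is:

```python
def count_parameters(model):
    # Assumed helper: report trainable vs. total parameters in billions.
    # The actual definition in the README is outside the diff hunks shown here.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return f"{trainable / 1e9:.2f}b trainable / {total / 1e9:.2f}b total"
```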
 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
model_name_moe = f"lamm-mit/Cephalo-Idefics2-3x8b-beta"
config = AutoConfig.from_pretrained(model_name_moe, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_name_moe, trust_remote_code=True)
moe_model = AutoModelForCausalLM.from_pretrained(
model_name_moe,config=config,
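
The diff truncates the loading call and the accompanying sample inference code at this point. For orientation, a minimal generation sketch following the standard Idefics2 processor API is shown below; the prompt, image URL, and generation settings are illustrative and not taken from the README:

```python
from transformers.image_utils import load_image

# Assumes moe_model and processor have been fully loaded as above and moe_model is on `device`.
image = load_image("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg")

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What is shown in this image, and what does it mean for materials design?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

generated_ids = moe_model.generate(**inputs, max_new_tokens=256, do_sample=False)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_texts)
```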
 
 
Download .py files that implement the Phi-3-V and the Mixture-of-Expert Vision model
 
+ ```bash
pip install huggingface_hub
```
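
The hf_hub_download calls that fetch the .py files fall outside the regions shown in this diff. A sketch of the usual pattern is below; the file name is a placeholder, so use the actual file names listed in the repository:

```python
from huggingface_hub import hf_hub_download

repo_id = "lamm-mit/Cephalo-Idefics2-3x8b-beta"
# "moe_idefics2.py" is a placeholder name; download whichever .py files the repository lists.
hf_hub_download(repo_id=repo_id, filename="moe_idefics2.py", local_dir=".")
```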
 
_attn_implementation="flash_attention_2", #make sure Flash Attention 2 is installed
trust_remote_code=True,
#quantization_config=quantization_config,
+ )
processor = AutoProcessor.from_pretrained(
f"{model_id_1}",
do_image_splitting=True
 
```
 
Now, load the rest of the models:
+ ```python
model_id_2='HuggingFaceM4/idefics2-8b-chatty'
 
model_2 = Idefics2ForConditionalGeneration.from_pretrained( model_id_2,
 
_attn_implementation="flash_attention_2", #make sure Flash Attention 2 is installed
trust_remote_code=True,
#quantization_config=quantization_config,
+ )
 
model_id_3='HuggingFaceM4/idefics2-8b'
 
 
_attn_implementation="flash_attention_2", #make sure Flash Attention 2 is installed
trust_remote_code=True,
#quantization_config=quantization_config,
+ )
```
Put on device:
+ ```python
model_1.to(DEVICE)
model_2.to(DEVICE)
model_3.to(DEVICE)
```
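
DEVICE is set earlier in the README, outside the regions shown in this diff; an assumed, typical definition is:

```python
import torch

# Assumed definition; the README sets DEVICE outside the hunks shown here.
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
```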
 
### Construct MoE
+
+ Here we show how a MoE is constructed from the set of expert models loaded earlier. We consider three models: model_1, model_2 and model_3.
+
+ ```python
dtype = torch.bfloat16 # Desired dtype for new layers
base_model = copy.deepcopy(model_1) # Your base model
+ expert_models = [ model_1, model_2, model_3 ] # List of expert models
 
moe_config = Idefics2ForCausalLMMoEConfig(config=config, k=1, num_expert_models=len (expert_models))
+ moe_model = Idefics2ForCausalLMMoE(moe_config, base_model, expert_models, layer_dtype = dtype)
 
count_parameters(expert_models[0]),count_parameters(moe_model)
```
Delete models no longer needed:
+ ```python
del model_1
del model_2
del model_3
```
Put MoE model on device:
+ ```python
moe_model.to(DEVICE)
```
+ Test if it works (untrained; it may not produce desirable output since the gating layers have not been trained yet):
+ ```python
from transformers.image_utils import load_image
 
image = load_image("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg")
 
```
 
### Now train MoE gating function
+
+ We train the gating layers by providing sample images/prompts for each of the three experts. Here is a simple example training set:
+
```python
image_1 = Image.open("./VALIDATION/Q15.jpg")
image_1a = Image.open("./VALIDATION/Q31.jpg")
 
 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/mh4eFDuFsTBOYbjc38PYz.png)
 
 
+ Now that the MoE gating layers have been trained, we can run inference again:
+
+ ```python
from transformers.image_utils import load_image
 
image = load_image("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg")
 
 
### Push to hub and save locally
 
+ We can save the MoE model either to the Hugging Face Hub or locally:
+
```python
repo_id='...'
moe_name='Cephalo-Idefics2-3x8b-beta'
 
```
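
The push_to_hub call itself (the hunk header above shows it as moe_model.push_to_hub (f'{repo_id}/'+merged_name, )) sits in lines not included here. If the processor should also be loadable directly from the Hub repository, it can be pushed with the standard transformers push_to_hub API; this is a suggested companion step, not code from the README:

```python
# Suggested companion step (not in the original README): push the processor as well,
# so AutoProcessor.from_pretrained(...) works directly from the Hub repository.
processor.push_to_hub(f'{repo_id}/' + moe_name)
```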
 
Save locally:
+ ```python
processor.save_pretrained(moe_name, )
moe_model.save_pretrained(moe_name, )
 
+ ```
+
+ Loading the model works as shown above; the code is included again here for completeness:
+ ```python
+ model_name_moe = f'{repo_id}/'+moe_name
+ config = AutoConfig.from_pretrained(model_name_moe, trust_remote_code=True)
+ processor = AutoProcessor.from_pretrained(model_name_moe, trust_remote_code=True)
+ moe_model = AutoModelForCausalLM.from_pretrained(
+ model_name_moe,config=config,
+ trust_remote_code=True, torch_dtype=torch.bfloat16,
+ # quantization_config=quantization_config,
+ ).to(device)
+
+ count_parameters(moe_model)
```