mjbuehler committed
Commit ea84862
Parent(s): 074a971

Update README.md

Files changed (1):
1. README.md (+11 -6)
README.md CHANGED
@@ -6,6 +6,12 @@ tags:
 - nlp
 - code
 - vision
+ - chemistry
+ - engineering
+ - biology
+ - bio-inspired
+ - text-generation-inference
+ - materials science
 pipeline_tag: text-generation
 inference:
   parameters:
@@ -29,11 +35,11 @@ The model is developed to process diverse inputs, including images and text, fac
 
 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
 
- This version of Cephalo, lamm-mit/Cephalo-Phi-3-vision-128k, is based on the Phi-3-Vision-128K-Instruct model. For further details, see https://huggingface.co/microsoft/Phi-3-vision-128k-instruct.
+ This version of Cephalo, lamm-mit/Cephalo-Phi-3-vision-128k-4b, is based on the Phi-3-Vision-128K-Instruct model. For further details, see https://huggingface.co/microsoft/Phi-3-vision-128k-instruct.
 
 ### Chat Format
 
- Given the nature of the training data, the Cephalo-Phi-3-vision-128k model is best suited for a single image input with prompts using the chat format as follows.
+ Given the nature of the training data, the Cephalo-Phi-3-vision-128k-4b model is best suited for a single image input with prompts using the chat format as follows.
 You can provide the prompt together with a single image, using the generic template below:
 ```markdown
 <|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n
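
A minimal sketch of how this generic template is filled in. The question string here is a hypothetical stand-in; `<|image_1|>` is a placeholder token for the image, which is passed to the processor separately rather than embedded in the string:

```python
# Sketch only: render the Cephalo chat template for a single-image prompt.
# The question is a hypothetical example, not from the model card.
question = "What is the main fracture mechanism shown in this micrograph?"
chat = f"<|user|>\n<|image_1|>\n{question}<|end|>\n<|assistant|>\n"
print(chat)
```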
@@ -55,7 +61,7 @@ import requests
 from transformers import AutoModelForCausalLM
 from transformers import AutoProcessor
 
- model_id = "lamm-mit/Cephalo-Phi-3-vision-128k"
+ model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b"
 
 model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto")
 
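
An end-to-end sketch of how the loaded model is typically driven, following the standard Phi-3-vision processor/generate pattern; the image file name and question below are hypothetical stand-ins, not part of this commit:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Hypothetical input image and question
image = Image.open("micrograph.png")
messages = [{"role": "user", "content": "<|image_1|>\nWhat does this micrograph show?"}]

# Render the chat template shown above, then bundle text + image tensors
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

# Generate, then strip the prompt tokens before decoding the answer
generate_ids = model.generate(
    **inputs, max_new_tokens=256, eos_token_id=processor.tokenizer.eos_token_id
)
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(response)
```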
@@ -99,8 +105,7 @@ The image shows a group of red imported fire ants (Solenopsis invicta) forming a
 
 The schematic below shows a visualization of the approach to generate datasets for training the vision model. The extraction process employs advanced algorithms to accurately detect and separate images and their corresponding textual descriptions from complex PDF documents. It involves extracting images and captions from PDFs to create well-reasoned image-text pairs, utilizing large language models (LLMs) for natural language processing. These image-text pairs are then refined and validated through LLM-based NLP processing, ensuring high-quality and contextually relevant data for training.
 
-
- Reproductions of two representative pages of the scientific article (here, Spivak, Buehler, et al., 2011).
+ The image below shows reproductions of two representative pages of the scientific article (here, Spivak, Buehler, et al., 2011), and how they are used to extract visual scientific data for training the Cephalo model.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/qHURSBRWEDgHy4o56escN.png)
 
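
A rough sketch of the extraction step this paragraph describes. Everything here is an assumption: PyMuPDF as the PDF library and a naive "Figure N"-caption regex stand in for the advanced detection and LLM-based refinement the actual pipeline uses:

```python
import re
import fitz  # PyMuPDF (pip install pymupdf); an assumed choice of library

def extract_image_caption_pairs(pdf_path):
    """Naively pair embedded images with 'Figure N'-style caption lines."""
    doc = fitz.open(pdf_path)
    pairs = []
    for page in doc:
        # Candidate captions: lines beginning like "Figure 3." or "Fig. 3:"
        captions = re.findall(r"(?:Figure|Fig\.)\s*\d+\s*[.:].*", page.get_text())
        # Raw bytes of every image embedded on this page
        images = [doc.extract_image(xref)["image"]
                  for xref, *_ in page.get_images(full=True)]
        # Pair in order of appearance; the real pipeline refines and
        # validates these pairs with LLM-based processing.
        pairs.extend(zip(images, captions))
    return pairs
```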
@@ -118,4 +123,4 @@ Please cite as:
 pages = {},
 url = {}
 }
- ```
+ ```
 