Update README.md
README.md CHANGED
@@ -6,6 +6,12 @@ tags:
 - nlp
 - code
 - vision
+- chemistry
+- engineering
+- biology
+- bio-inspired
+- text-generation-inference
+- materials science
 pipeline_tag: text-generation
 inference:
   parameters:
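For reference, the net effect of this hunk is the following front-matter fragment in the model card:

```yaml
tags:
- nlp
- code
- vision
- chemistry
- engineering
- biology
- bio-inspired
- text-generation-inference
- materials science
pipeline_tag: text-generation
```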
@@ -29,11 +35,11 @@ The model is developed to process diverse inputs, including images and text, fac
 
 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
 
-This version of Cephalo, lamm-mit/Cephalo-Phi-3-vision-128k, is based on the Phi-3-Vision-128K-Instruct model. For further details, see: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct.
+This version of Cephalo, lamm-mit/Cephalo-Phi-3-vision-128k-4b, is based on the Phi-3-Vision-128K-Instruct model. For further details, see: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct.
 
 ### Chat Format
 
-Given the nature of the training data, the Cephalo-Phi-3-vision-128k model is best suited for a single image input with prompts using the chat format as follows.
+Given the nature of the training data, the Cephalo-Phi-3-vision-128k-4b model is best suited for a single image input with prompts using the chat format as follows.
 You can provide the prompt for a single image with a generic template as follows:
 ```markdown
 <|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n
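As a minimal sketch of how this template can be filled in from Python: the `apply_chat_template` call below follows the standard transformers processor/tokenizer API used by Phi-3-vision-style models, and the question text is only a placeholder.

```python
from transformers import AutoProcessor

model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# A single user turn referencing one image; <|image_1|> marks where it is inserted.
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"},
]

# Renders the <|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n template above.
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```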
@@ -55,7 +61,7 @@ import requests
 from transformers import AutoModelForCausalLM
 from transformers import AutoProcessor
 
-model_id = "lamm-mit/Cephalo-Phi-3-vision-128k"
+model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b"
 
 model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto")
 
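Building on the loading code in this hunk, a minimal end-to-end inference sketch might look as follows. The image URL and the prompt wording are placeholders; the `from_pretrained` arguments mirror the card's own snippet.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b"

model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder image URL; substitute a microstructure image of interest.
url = "https://example.com/microstructure.png"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "<|user|>\n<|image_1|>\nDescribe the structure shown in this image.<|end|>\n<|assistant|>\n"
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

generate_ids = model.generate(
    **inputs, max_new_tokens=256, eos_token_id=processor.tokenizer.eos_token_id
)
# Strip the prompt tokens before decoding the response.
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
```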
@@ -99,8 +105,7 @@ The image shows a group of red imported fire ants (Solenopsis invicta) forming a
 
 The schematic below shows a visualization of the approach to generate datasets for training the vision model. The extraction process employs advanced algorithms to accurately detect and separate images and their corresponding textual descriptions from complex PDF documents. It involves extracting images and captions from PDFs to create well-reasoned image-text pairs, utilizing large language models (LLMs) for natural language processing. These image-text pairs are then refined and validated through LLM-based NLP processing, ensuring high-quality and contextually relevant data for training.
 
-
-Reproductions of two representative pages of the scientific article (here, Spivak, Buehler, et al., 2011).
+The image below shows reproductions of two representative pages of the scientific article (here, Spivak, Buehler, et al., 2011), and how they are used to extract visual scientific data for training the Cephalo model.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/qHURSBRWEDgHy4o56escN.png)
 
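The card describes but does not show the PDF extraction step. Purely as an illustration, a naive version of the image-and-caption extraction could be sketched with PyMuPDF as below; the function name and the caption-matching heuristic are hypothetical, and the LLM-based refinement and validation described above are omitted.

```python
import fitz  # PyMuPDF

def extract_image_caption_pairs(pdf_path: str):
    """Hypothetical sketch: collect (image bytes, caption) pairs from a PDF."""
    doc = fitz.open(pdf_path)
    pairs = []
    for page in doc:
        # Naive caption heuristic: keep lines that start like figure captions.
        captions = [
            line.strip()
            for line in page.get_text().splitlines()
            if line.strip().startswith(("Figure", "Fig."))
        ]
        for i, image_info in enumerate(page.get_images(full=True)):
            xref = image_info[0]
            img = doc.extract_image(xref)  # dict with raw bytes under "image"
            # Pair images with captions by order of appearance on the page.
            caption = captions[i] if i < len(captions) else ""
            pairs.append((img["image"], caption))
    return pairs
```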
@@ -118,4 +123,4 @@ Please cite as:
 pages = {},
 url = {}
 }
-```
+```
|