Update README.md
README.md (CHANGED)
@@ -36,17 +36,18 @@ The model is developed to process diverse inputs, including images and text, fac
 
 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
 
-This version of Cephalo, lamm-mit/Cephalo-Phi-3-vision-128k-4b, is based on the Phi-3-Vision-128K-Instruct model. The model has a context length of 128,000 tokens. For further details, see: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct.
+This version of Cephalo, lamm-mit/Cephalo-Phi-3-vision-128k-4b-alpha, is based on the Phi-3-Vision-128K-Instruct model. The model has a context length of 128,000 tokens. For further details, see: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct.
 
 ### Chat Format
 
-Given the nature of the training data, the Cephalo-Phi-3-vision-128k-4b model is best suited for a single image input with prompts using the chat format as follows.
+Given the nature of the training data, the Cephalo-Phi-3-vision-128k-4b-alpha model is best suited for a single image input with prompts using the chat format as follows.
+
 You can provide the prompt with a single image using a generic template as follows:
 ```markdown
 <|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n
 ```
 
-
+The model generates the text after `<|assistant|>`. For multi-turn conversations, the prompt should be formatted as follows:
 
 ```markdown
 <|user|>\n<|image_1|>\n{prompt_1}<|end|>\n<|assistant|>\n{response_1}<|end|>\n<|user|>\n{prompt_2}<|end|>\n<|assistant|>\n
 ```
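
As a sketch of how this multi-turn format can be produced programmatically rather than by hand, the example below uses the processor's tokenizer chat template. It assumes the repository ships the Phi-3-Vision chat template; the conversation content uses the same placeholders as the template above.

```python
from transformers import AutoProcessor

# Model id as shown in the diff above; trust_remote_code is needed for the Phi-3-Vision-style processor
model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b-alpha"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder multi-turn conversation; <|image_1|> refers to the single image passed at inference time
messages = [
    {"role": "user", "content": "<|image_1|>\n{prompt_1}"},
    {"role": "assistant", "content": "{response_1}"},
    {"role": "user", "content": "{prompt_2}"},
]

# Renders the <|user|> ... <|end|> ... <|assistant|> string shown in the template above
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```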
@@ -62,7 +63,7 @@ import requests
 from transformers import AutoModelForCausalLM
 from transformers import AutoProcessor
 
-model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b"
+model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b-alpha"
 
 model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto")
 
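
To round out the loading snippet above, here is a minimal end-to-end inference sketch: it loads the model and processor, formats a single-image prompt in the chat format, and decodes only the newly generated tokens. The image URL, question, and generation settings are illustrative assumptions, following the usage pattern of the underlying Phi-3-Vision-128K-Instruct model.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "lamm-mit/Cephalo-Phi-3-vision-128k-4b-alpha"

model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder image of a material microstructure; any PIL image works here
url = "https://example.com/microstructure.png"
image = Image.open(requests.get(url, stream=True).raw)

# Single-image, single-turn prompt in the chat format described above
prompt = "<|user|>\n<|image_1|>\nDescribe the microstructure shown in this image.<|end|>\n<|assistant|>\n"

inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

generate_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    eos_token_id=processor.tokenizer.eos_token_id,
)

# Drop the prompt tokens so that only the assistant's reply is decoded
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(response)
```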