Commit 44f54db (parent: 7354d05) by rthapa84: Update README.md
Files changed (1): README.md (+151, -3)

---
license: llama3.1
language:
- en
pipeline_tag: image-text-to-text
tags:
- text-generation-inference
---

# Dragonfly-Med Model Card

**Note: Users are permitted to use this model in accordance with the Llama 3.1 Community License Agreement. Additionally, because the license of the dataset used to train this model prohibits commercial use, Dragonfly-Med is restricted to non-commercial use only.**

14
+ ## Model Details
15
+
16
+ Dragonfly-Med is a multimodal biomedical visual-language model, trained by instruction tuning on Llama 3.1.
17
+
18
+ - **Developed by:** [Together AI](https://www.together.ai/)
19
+ - **Model type:** An autoregressive visual-language model based on the transformer architecture
20
+ - **License:** [Llama 3.1 Community License Agreement](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
21
+ - **Finetuned from model:** [Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
22
+
23
+ ### Model Sources
24
+
25
+ - **Repository:** https://github.com/togethercomputer/Dragonfly
26
+ - **Paper:** https://arxiv.org/abs/2406.00977
27
+
28
+ ## Uses
29
+
30
+ The primary use of Dragonfly-Med is research on large visual-language models.
31
+ It is primarily intended for researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.
32
+
33
+
34
+ ## How to Get Started with the Model
35
+
36
+ ### ๐Ÿ’ฟ Installation
37
+
38
+ Create a conda environment and install necessary packages
39
+ ```bash
40
+ conda env create -f environment.yml
41
+ conda activate dragonfly_env
42
+ ```
43
+
44
+ Install flash attention
45
+ ```bash
46
+ pip install flash-attn --no-build-isolation
47
+ ```
48
+
49
+ As a final step, please run the following command.
50
+ ```bash
51
+ pip install --upgrade -e .
52
+ ```
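
If you want to verify the setup before moving on, a quick import check is usually enough. This is an optional sketch, not part of the official instructions; it assumes the editable install above exposes the `dragonfly` package used in the inference example below.

```python
# Optional sanity check (an assumption-based sketch, not from the official docs):
# confirms that the key packages installed above can be imported.
import torch
import flash_attn  # noqa: F401  # installed via `pip install flash-attn`
import dragonfly   # noqa: F401  # installed via `pip install --upgrade -e .`

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```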

### 🧠 Inference

Once the installation has completed successfully, you can follow the steps below.

Question: Provide a brief description of the given image.

![roco](ROCO_04197.jpg)

Load the necessary packages.
```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoTokenizer

from dragonfly.models.modeling_dragonfly import DragonflyForCausalLM
from dragonfly.models.processing_dragonfly import DragonflyProcessor
from pipeline.train.train_utils import random_seed
```

Instantiate the tokenizer, processor, and model.
```python
device = torch.device("cuda:0")

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-3.1-8B-Dragonfly-v1")
clip_processor = AutoProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
image_processor = clip_processor.image_processor
processor = DragonflyProcessor(image_processor=image_processor, tokenizer=tokenizer, image_encoding_style="llava-hd")

model = DragonflyForCausalLM.from_pretrained("togethercomputer/Llama-3.1-8B-Dragonfly-v1")
model = model.to(torch.bfloat16)
model = model.to(device)
```
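
As an aside, if `DragonflyForCausalLM.from_pretrained` accepts the standard Hugging Face `torch_dtype` argument (an assumption; the class is not documented here), you could load the weights directly in bfloat16 instead of casting afterwards, avoiding a temporary fp32 copy:

```python
# Alternative loading path (assumes DragonflyForCausalLM follows the standard
# transformers PreTrainedModel.from_pretrained interface, including torch_dtype).
model = DragonflyForCausalLM.from_pretrained(
    "togethercomputer/Llama-3.1-8B-Dragonfly-v1",
    torch_dtype=torch.bfloat16,
).to(device)
```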

Now, let's load the image and process it.
```python
image = Image.open("ROCO_04197.jpg")
image = image.convert("RGB")
images = [image]
# images = [None] # if you do not want to pass any images

text_prompt = "<|start_header_id|>user<|end_header_id|>\n\nProvide a brief description of the given image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

inputs = processor(text=[text_prompt], images=images, max_length=1024, return_tensors="pt", is_generate=True)
inputs = inputs.to(device)
```
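
The prompt above hand-writes the Llama 3.1 chat markers. If you plan to ask different questions, a small helper keeps that formatting in one place; this is a hypothetical convenience function, not part of the Dragonfly codebase:

```python
# Hypothetical helper (not from the Dragonfly repo): wraps a plain question in
# the same Llama 3.1 chat markers used in text_prompt above.
def format_prompt(question: str) -> str:
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

text_prompt = format_prompt("Provide a brief description of the given image.")
```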

Finally, let's generate a response from the model.
```python
temperature = 0

with torch.inference_mode():
    generation_output = model.generate(**inputs, max_new_tokens=1024, eos_token_id=tokenizer.encode("<|eot_id|>"), do_sample=temperature > 0, temperature=temperature, use_cache=True)

generation_text = processor.batch_decode(generation_output, skip_special_tokens=False)
```

An example response:
```plaintext
Computed tomography scan showing a large heterogenous mass in the pelvis<|eot_id|>
```
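
Because the example decodes with `skip_special_tokens=False`, the decoded string still carries the `<|eot_id|>` marker (and, depending on how `generate` returns the prompt tokens, possibly the chat headers as well). A minimal post-processing sketch, under those assumptions:

```python
# Minimal cleanup sketch (assumes the decoded string may contain the prompt
# markers and that the assistant turn ends with <|eot_id|>).
raw = generation_text[0]
reply = raw.split("<|start_header_id|>assistant<|end_header_id|>")[-1]
reply = reply.replace("<|eot_id|>", "").strip()
print(reply)
# e.g. "Computed tomography scan showing a large heterogenous mass in the pelvis"
```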

## Training Details

See more details in the "Implementation" section of our [paper](https://arxiv.org/abs/2406.00977).

## Evaluation

See more details in the "Results" section of our [paper](https://arxiv.org/abs/2406.00977).

## 🏆 Credits

We would like to acknowledge the following resources that were instrumental in the development of Dragonfly:

- [Meta Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct): We utilized Llama 3.1 as our foundational language model.
- [CLIP](https://huggingface.co/openai/clip-vit-base-patch32): Our vision backbone is the CLIP model from OpenAI.
- Our codebase is built upon the following two codebases:
  - [Otter: A Multi-Modal Model with In-Context Instruction Tuning](https://github.com/Luodian/Otter)
  - [LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images](https://github.com/thunlp/LLaVA-UHD)

## 📚 BibTeX

```bibtex
@misc{chen2024dragonfly,
      title={Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model},
      author={Kezhen Chen and Rahul Thapa and Rahul Chalamala and Ben Athiwaratkun and Shuaiwen Leon Song and James Zou},
      year={2024},
      eprint={2406.00977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## Model Card Authors
Rahul Thapa, Kezhen Chen, Rahul Chalamala

## Model Card Contact
Rahul Thapa (rahulthapa@together.ai), Kezhen Chen (kezhen@together.ai)