Spaces: Running on A10G
Upload folder using huggingface_hub
README.md
CHANGED
```diff
@@ -10,91 +10,3 @@ pinned: false
 ---
 
 # OneLLM: One Framework to Align All Modalities with Language
-
-[[Project Page](https://onellm.csuhan.com)] [[Paper](#)] [[Web Demo](https://huggingface.co/spaces/csuhan/OneLLM)]
-
-Authors: [Jiaming Han](), [Kaixiong Gong](), [Yiyuan Zhang](), [Jiaqi Wang](), [Kaipeng Zhang](), [Dahua Lin](), [Yu Qiao](), [Peng Gao](), [Xiangyu Yue]().
-
-## News
-
-- **2023.12.01** Release model weights and inference code.
-
-## Contents
-
-- [Install](#install)
-- [Models](#models)
-- [Demo](#demo)
-
-<!-- - [Evaluation](#evaluation) -->
-
-<!-- - [Training](#training) -->
-
-### TODO
-
-- [ ] Data
-- [ ] Evaluation
-- [ ] Training
-
-### Install
-
-1. Clone the repo into a local folder.
-
-```bash
-git clone https://github.com/csuhan/OneLLM
-
-cd OneLLM
-```
-
-2. Install packages.
-
-```bash
-conda create -n onellm python=3.9 -y
-conda activate onellm
-
-pip install -r requirements.txt
-
-# install pointnet
-cd lib/pointnet2
-python setup.py install
-```
-
-3. Install Apex. (Optional)
-
-```bash
-git clone https://github.com/NVIDIA/apex
-cd apex
-pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
-```
-
-### Models
-
-We provide a preview model at: [csuhan/OneLLM-7B](https://huggingface.co/csuhan/OneLLM-7B).
-
-### Demo
-
-**Huggingface Demo:** [csuhan/OneLLM](https://huggingface.co/spaces/csuhan/OneLLM).
-
-**Local Demo:** Assume you have downloaded the weights to ${WEIGHTS_DIR}. Then run the following command to start a gradio demo locally.
-
-```bash
-python demos/multi_turn_mm.py --gpu_ids 0 --tokenizer_path config/llama2/tokenizer.model --llama_config config/llama2/7B.json --pretrained_path ${WEIGHTS_DIR}/consolidated.00-of-01.pth
-```
-
-<!-- ### Evaluation -->
-
-<!-- ### Training -->
-
-## Citation
-
-```
-@article{han2023onellm,
-title={OneLLM: One Framework to Align All Modalities with Language},
-author={Han, Jiaming and Gong, Kaixiong and Zhang, Yiyuan and Wang, Jiaqi and Zhang, Kaipeng and Lin, Dahua and Qiao, Yu and Gao, Peng and Yue, Xiangyu},
-journal={arXiv preprint arXiv:xxxx},
-year={2023}
-}
-```
-
-## Acknowledgement
-
-[LLaMA](https://github.com/facebookresearch/llama), [LLaMA-Adapter](https://github.com/OpenGVLab/LLaMA-Adapter), [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), [Meta-Transformer](https://github.com/invictus717/MetaTransformer), [ChatBridge](https://github.com/joez17/ChatBridge)
```
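The "Local Demo" instructions removed from the README launch `demos/multi_turn_mm.py` with four flags and a checkpoint stored under `${WEIGHTS_DIR}`. The sketch below builds that same command in Python. It assumes you run it from the root of a OneLLM checkout. The helper name `build_demo_command` and the `./weights` fallback are hypothetical. The script path, flag names and file names are copied from the README lines.

```python
import os
import shlex
from pathlib import Path
from typing import List


def build_demo_command(weights_dir: str, gpu_id: int = 0) -> List[str]:
    """Assemble the argv for the local Gradio demo from the removed README lines.

    Hypothetical helper: only the script path, flags and file names come from the README.
    """
    # The README expects the checkpoint as ${WEIGHTS_DIR}/consolidated.00-of-01.pth
    checkpoint = Path(weights_dir) / "consolidated.00-of-01.pth"
    return [
        "python", "demos/multi_turn_mm.py",
        "--gpu_ids", str(gpu_id),
        "--tokenizer_path", "config/llama2/tokenizer.model",
        "--llama_config", "config/llama2/7B.json",
        "--pretrained_path", str(checkpoint),
    ]


if __name__ == "__main__":
    # WEIGHTS_DIR mirrors the shell variable in the README; "./weights" is a placeholder default.
    cmd = build_demo_command(os.environ.get("WEIGHTS_DIR", "./weights"))
    # Print a shell-quoted version; launching it would be subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

The arguments stay a list rather than one shell string. That way a `WEIGHTS_DIR` containing spaces does not break the command when it is later passed to `subprocess.run`.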
app.py
CHANGED
```diff
@@ -183,20 +183,46 @@ def gradio_worker(
         chatbot = []
         msg = ""
         return chatbot, msg
+
+    def change_modality(inputs):
+        tab = inputs[0]
+        modality = 'image'
+        label_modal_dict = {
+            'Image': 'image',
+            'Video': 'video',
+            'Audio': 'audio',
+            'Point Cloud': 'point',
+            'IMU': 'imu',
+            'fMRI': 'fmri',
+            'Depth Map': 'rgbd',
+            'Normal Map': 'rgbn'
+        }
+        if tab.label in label_modal_dict:
+            modality = label_modal_dict[tab.label]
+        return modality
 
     CSS ="""
     .contain { display: flex; flex-direction: column; }
     #component-0 { height: 100%; }
     #chatbot { flex-grow: 1; overflow: auto;}
     """
-
+
+    with gr.Blocks(css=CSS, theme=gr.themes.Soft()) as demo:
         gr.Markdown("## OneLLM: One Framework to Align All Modalities with Language")
         with gr.Row(equal_height=True):
+            # with gr.Column(scale=1):
+            #     img_path = gr.Image(label='Image Input', type='filepath')
+            #     video_path = gr.Video(label='Video Input')
+            #     audio_path = gr.Audio(label='Audio Input', type='filepath', sources=['upload'])
+            #     modality = gr.Radio(choices=['image', 'audio', 'video'], value='image', interactive=True, label='Input Modalities', visible=False)
+            modality = gr.Textbox(value='image', visible=False)
             with gr.Column(scale=1):
-
-
-
-
+                with gr.Tab('Image') as img_tab:
+                    img_path = gr.Image(label='Image Input', type='filepath')
+                with gr.Tab('Video') as video_tab:
+                    video_path = gr.Video(label='Video Input')
+                with gr.Tab('Audio') as audio_tab:
+                    audio_path = gr.Audio(label='Audio Input', type='filepath', sources=['upload'])
 
             with gr.Column(scale=2):
                 chatbot = gr.Chatbot(elem_id="chatbot")
@@ -220,6 +246,11 @@ def gradio_worker(
                 minimum=0, maximum=1, value=0.75, interactive=True,
                 label="Top-p",
             )
+
+        img_tab.select(change_modality, [img_tab], [modality])
+        video_tab.select(change_modality, [video_tab], [modality])
+        audio_tab.select(change_modality, [audio_tab], [modality])
+
         msg.submit(
             show_user_input, [msg, chatbot], [msg, chatbot],
         ).then(
```
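The new `change_modality` callback is essentially a table lookup with an `'image'` fallback. When a tab fires `select`, its label is mapped to the modality string that is written into the hidden `modality` Textbox. The table lists eight modalities, but this diff adds tabs and `select` handlers only for Image, Video and Audio.

The sketch below shows the lookup on its own, with no framework involved. It assumes the tab's label string is already available; how Gradio delivers that value to the callback is not shown. The name `modality_for_tab` is hypothetical and not part of the OneLLM code.

```python
from typing import Optional

# Label-to-modality table copied from change_modality in the app.py diff.
LABEL_TO_MODALITY = {
    'Image': 'image',
    'Video': 'video',
    'Audio': 'audio',
    'Point Cloud': 'point',
    'IMU': 'imu',
    'fMRI': 'fmri',
    'Depth Map': 'rgbd',
    'Normal Map': 'rgbn',
}


def modality_for_tab(label: Optional[str], default: str = 'image') -> str:
    """Return the modality key for a tab label, or `default` for unknown labels."""
    # dict.get folds change_modality's membership test and indexing into one call.
    return LABEL_TO_MODALITY.get(label, default)


if __name__ == "__main__":
    # 'Spectrogram' is a made-up label used to exercise the fallback.
    for label in ['Image', 'Video', 'Audio', 'Spectrogram']:
        print(f"{label!r} -> {modality_for_tab(label)}")
```

The fallback means an unrecognized tab leaves the pipeline in image mode, the same behaviour as the committed code.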