abalakrishnaTRI committed
Commit • e71c8dc
1 Parent: c10578e
cleanup

Files changed:
- .gitignore (+2 -0)
- README.md (+37 -10)
- interactive_demo.py (+3 -3)
- serve/gradio_web_server.py (+5 -6)
.gitignore CHANGED
@@ -103,6 +103,8 @@ celerybeat.pid
 
 # Logs
 serve_images/
+*conv.json
+*controller.log*
 
 # Environments
 .env
README.md CHANGED
@@ -1,6 +1,7 @@
 # VLM Demo
 
-> *VLM Demo*: Lightweight repo for chatting with …
+> *VLM Demo*: Lightweight repo for chatting with VLMs supported by our
+[VLM Evaluation Suite](https://github.com/TRI-ML/vlm-evaluation/tree/main).
 
 ---
 
@@ -14,27 +15,30 @@ cd vlm-demo
 pip install -e .
 ```
 
-This repository also requires that the `vlm-…
-…
-…
-…
-+ `vlm-bench`: `https://github.com/TRI-ML/vlm-bench`
-+ `prismatic-vlms`: `https://github.com/TRI-ML/prismatic-vlms`
+This repository also requires that the `vlm-evaluation` package (`vlm_eval`) is
+installed in the current environment. Installation instructions can be found
+[here](https://github.com/TRI-ML/vlm-evaluation/tree/main).
 
 ## Usage
 
 The main script to run is `interactive_demo.py`, while the implementation of
 the Gradio Controller (`serve/gradio_controller.py`) and Gradio Web Server
 (`serve/gradio_web_server.py`) are within `serve`. All of this code is heavily
-adapted from the [LLaVA Github Repo…
+adapted from the [LLaVA Github Repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).
 More details on how this code was modified from the original LLaVA repo are provided in the
 relevant source files.
 
-To run the demo, run the following commands:
+To run the demo, first run the following commands in separate terminals:
 
 + Start Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000`
 + Start Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
-…
+
+To run the interactive demo, you can specify a model to chat with via a `model_dir` or `model_id` as follows:
+
++ `python -m interactive_demo --port 40000 --model_id <MODEL_ID>` OR
++ `python -m interactive_demo --port 40000 --model_dir <MODEL_DIR>`
+
+If you want to chat with multiple models simultaneously, you can launch the `interactive_demo` script in different terminals.
 
 When running the demo, the following parameters are adjustable:
 + Temperature
@@ -51,6 +55,29 @@ prompt.
 + True/False Question Answering: Selecting this option is best when the user wants a True/False answer to a specific question provided in the
 prompt.
 
+## Example
+
+To chat with the LLaVA 1.5 (7B) and Prism 7B models in an interactive GUI, run the following scripts in separate terminals.
+
+Launch gradio controller:
+
+`python -m serve.controller --host 0.0.0.0 --port 10000`
+
+Launch web server:
+
+`python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
+
+Now we can launch an interactive demo corresponding to each of the models we want to chat with. For Prism models, you
+only need to specify a `model_id`, while for LLaVA and InstructBLIP, you need to additionally specify a `model_family`
+and `model_dir`. Note that for each model, a different port must be specified.
+
+Launch interactive demo for Prism 7B Model:
+
+`python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b`
+
+Launch interactive demo for LLaVA 1.5 7B Model:
+
+`python -m interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b`
 
 ## Contributing
 
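The launch sequence the README describes is three long-running processes on distinct ports: controller, web server, and one worker per model. As a quick illustration of that topology (not part of this commit), here is a minimal Python sketch that spawns all three with `subprocess`; the commands are copied from the README, while the `MODEL_ID` value, the startup stagger, and the shutdown handling are illustrative assumptions.

```python
# Minimal launcher sketch (not in this commit): runs the three README commands
# as child processes instead of separate terminals.
import subprocess
import sys
import time

MODEL_ID = "prism-dinosiglip+7b"  # assumed: one of the example model_ids above

controller = [sys.executable, "-m", "serve.controller",
              "--host", "0.0.0.0", "--port", "10000"]
web_server = [sys.executable, "-m", "serve.gradio_web_server",
              "--controller", "http://localhost:10000",
              "--model-list-mode", "reload", "--share"]
worker = [sys.executable, "-m", "interactive_demo",
          "--port", "40000", "--model_id", MODEL_ID]

procs = []
for cmd in (controller, web_server, worker):
    procs.append(subprocess.Popen(cmd))
    time.sleep(3)  # crude stagger so each process can bind its port first

try:
    for p in procs:
        p.wait()  # block until the servers exit
except KeyboardInterrupt:
    for p in procs:
        p.terminate()  # Ctrl-C tears down all three processes
```

To chat with a second model, you would append another `worker`-style command on a different port, mirroring the multi-terminal instructions above.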
interactive_demo.py CHANGED
@@ -1,7 +1,7 @@
 """
 interactive_demo.py
 
-Entry point for all VLM-…
+Entry point for all VLM-Evaluation interactive demos; specify a model and get a Gradio UI where you can chat with it!
 
 This file is heavily adapted from the script used to serve models in the LLaVa repo:
 https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/model_worker.py. It is
@@ -30,8 +30,8 @@ from llava.mm_utils import load_image_from_base64
 from llava.utils import server_error_msg
 from torchvision.transforms import Compose
 
-from …
-from …
+from vlm_eval.models import load_vlm
+from vlm_eval.overwatch import initialize_overwatch
 from serve import INTERACTION_MODES_MAP, MODEL_ID_TO_NAME
 
 GB = 1 << 30
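For orientation, here is a rough sketch (not from the commit) of how the two new `vlm_eval` imports and the `GB` constant plausibly come together in the worker. The diff shows only the imports, so the `load_vlm` and `initialize_overwatch` call signatures below are assumptions, not the package's documented API; `torch.cuda.memory_allocated` is standard PyTorch.

```python
# Hedged sketch only: the vlm_eval call signatures are assumptions, since the
# diff adds the imports but does not show how they are invoked.
import torch

from vlm_eval.models import load_vlm                 # added by this commit
from vlm_eval.overwatch import initialize_overwatch  # added by this commit

GB = 1 << 30  # 2**30 bytes; used to report memory in gibibytes

overwatch = initialize_overwatch(__name__)  # assumed: returns a logger-like object
vlm = load_vlm("prism-dinosiglip+7b")       # assumed: accepts a model_id string

if torch.cuda.is_available():
    # GB turns raw byte counts into readable numbers for worker status logs
    overwatch.info(f"GPU memory allocated: {torch.cuda.memory_allocated() / GB:.2f} GB")
```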
serve/gradio_web_server.py CHANGED
@@ -1,7 +1,7 @@
 """
 gradio_web_server.py
 
-Entry point for all VLM-…
+Entry point for all VLM-Evaluation interactive demos; specify a model and get a Gradio UI where you can chat with it!
 
 This file is copied from the script used to define the gradio web server in the LLaVa codebase:
 https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/gradio_web_server.py with only very minor
@@ -244,9 +244,9 @@ def http_bot(state, model_selector, interaction_mode, temperature, max_new_token
 
 title_markdown = """
 # Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
-[[…
-[[…
-| π [[Paper](…
+[[[Training Code](github.com/TRI-ML/prismatic-vlms)]
+[[[Evaluation Code](github.com/TRI-ML/vlm-evaluation)]
+| π [[Paper](https://arxiv.org/abs/2402.07865)]
 """
 
 tos_markdown = """
@@ -254,8 +254,7 @@ tos_markdown = """
 By using this service, users are required to agree to the following terms:
 The service is a research preview intended for non-commercial use only. It only provides limited safety measures and may
 generate offensive content. It must not be used for any illegal, harmful, violent, racist, or sexual purposes. The
-service may collect user dialogue data for future research.…
-inappropriate answer! We will collect those to keep improving our moderator. For an optimal experience,
+service may collect user dialogue data for future research. For an optimal experience,
 please use desktop computers for this demo, as mobile devices may compromise its quality. This website
 is heavily inspired by the website released by [LLaVA](https://github.com/haotian-liu/LLaVA).
 """
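The `title_markdown` and `tos_markdown` strings edited above are plain Markdown constants that the web server renders in its Gradio UI. As a generic illustration (not this file's actual layout, which is more involved), a Gradio app typically surfaces such constants with `gr.Markdown` inside a `gr.Blocks` context:

```python
# Generic illustration of rendering markdown constants like title_markdown and
# tos_markdown in a Gradio app; the strings here are shortened stand-ins.
import gradio as gr

title_markdown = "# Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models"
tos_markdown = "### Terms of use\nBy using this service, users are required to agree to the following terms: ..."

with gr.Blocks() as demo:
    gr.Markdown(title_markdown)              # page header block
    chatbot = gr.Chatbot(label="VLM Chat")   # placeholder for the chat area
    gr.Markdown(tos_markdown)                # terms-of-service footer

if __name__ == "__main__":
    demo.launch()
```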