---
title: VLM Demo
sdk: docker
license: mit
---

This demo illustrates the work published in the paper ["Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models"](https://arxiv.org/pdf/2402.07865.pdf).


# VLM Demo

> *VLM Demo*: Lightweight repo for chatting with VLMs supported by our 
[VLM Evaluation Suite](https://github.com/TRI-ML/vlm-evaluation/tree/main).

---

## Installation

This repository can be installed as follows:

```bash
git clone git@github.com:TRI-ML/vlm-demo.git
cd vlm-demo
pip install -e .
```

This repository also requires that the `vlm-evaluation` package (`vlm_eval`) is
installed in the current environment. Installation instructions can be found
[here](https://github.com/TRI-ML/vlm-evaluation/tree/main).
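In practice, the setup will look something like the following; this is a minimal sketch that assumes `vlm-evaluation` follows the same editable-install pattern as this repo, so defer to the linked instructions for the authoritative steps:

```bash
# Sketch only -- assumes vlm-evaluation installs the same way as vlm-demo;
# see the linked repo's README for the exact, up-to-date instructions.
git clone https://github.com/TRI-ML/vlm-evaluation.git
cd vlm-evaluation
pip install -e .
```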

## Usage

The main script to run is `interactive_demo.py`, while the implementations of the Gradio Controller
(`serve/gradio_controller.py`) and Gradio Web Server (`serve/gradio_web_server.py`) live in `serve/`.
All of this code is heavily adapted from the [LLaVA GitHub repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).
More details on how this code was modified from the original LLaVA repo are provided in the
relevant source files.
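For orientation, the relevant layout looks roughly like this (file names are taken from this README; the annotations are our gloss):

```
vlm-demo/
├── interactive_demo.py         # demo worker; one process per model/port
└── serve/
    ├── gradio_controller.py    # Gradio Controller implementation
    └── gradio_web_server.py    # Gradio Web Server (user-facing UI)
```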

To run the demo, first run the following commands in separate terminals (a single-shell sketch follows the list):

+ Start Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000`
+ Start Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
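If you prefer a single shell, the same two commands can be backgrounded; the `sleep` below is our assumption to let the controller come up before the web server tries to connect:

```bash
# Convenience sketch: launch controller and web server from one shell.
python -m serve.controller --host 0.0.0.0 --port 10000 &
sleep 5  # assumed startup delay before the web server registers with the controller
python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share &
wait
```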

To run the interactive demo, you can specify a model to chat with via a `model_dir` or `model_id`, as follows:

+ `python -m interactive_demo  --port 40000  --model_id <MODEL_ID>` OR
+ `python -m interactive_demo  --port 40000  --model_dir <MODEL_DIR>`

If you want to chat with multiple models simultaneously, you can launch the `interactive_demo` script in different terminals.

When running the demo, the following parameters are adjustable:
+ Temperature
+ Max output tokens

The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other
interaction modes for more specific use cases:
+ Captioning: Upload an image with no prompt and the selected model will output a caption. Even if the user provides a
prompt, it is not used when producing the caption.
+ Bounding Box Prediction: After uploading an image, specify in the prompt the portion of the image for which bounding box
coordinates are desired, and the selected model will output the corresponding coordinates.
+ Visual Question Answering: Best when the user wants short, succinct answers to a specific question provided in the
prompt.
+ True/False Question Answering: Best when the user wants a True/False answer to a specific question provided in the
prompt.

## Example

To chat with the LLaVA 1.5 (7B) and Prism 7B models in an interactive GUI, run the following commands in separate terminals.

Launch gradio controller: 

`python -m serve.controller --host 0.0.0.0 --port 10000`

Launch web server: 

`python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`

Now we can launch an interactive demo for each model we want to chat with. For Prism models, you
only need to specify a `model_id`, while for LLaVA and InstructBLIP models you must additionally specify a `model_family`
and `model_dir`. Note that each model must be served on a different port.

Launch interactive demo for Prism 7B Model: 

`python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b`

Launch interactive demo for LLaVA 1.5 7B Model: 

`python -m interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b`
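Putting the full example together, here is a one-shell sketch using the exact commands above; the backgrounding and `sleep` delays are our assumptions, not part of the documented workflow:

```bash
# Sketch: controller, web server, and both demo workers from one shell.
python -m serve.controller --host 0.0.0.0 --port 10000 &
sleep 5  # assumed delay so the controller is up first
python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share &
sleep 5  # assumed delay so the web server is up before workers register
python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b &
python -m interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b &
wait
```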

## Contributing

Before committing to the repository, *make sure to set up your dev environment!*

Here are the basic development environment setup guidelines (a combined sketch follows the list):

+ Fork/clone the repository, performing an editable installation. Make sure to install with the development dependencies
  (e.g., `pip install -e ".[dev]"`); this will install `black`, `ruff`, and `pre-commit`.

+ Install `pre-commit` hooks (`pre-commit install`).

+ Branch for the specific feature/issue, and issue a PR against the upstream repository for review.
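Concretely, the setup might look like this; the fork URL and branch name are placeholders, not real identifiers:

```bash
# Sketch of the dev setup described above.
git clone git@github.com:<YOUR_USERNAME>/vlm-demo.git  # your fork
cd vlm-demo
pip install -e ".[dev]"   # editable install with black, ruff, and pre-commit
pre-commit install        # enable the hooks on every commit
git checkout -b <feature-branch>
```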