mattb512 committed
Commit d080507
Parent: cc7a5fe

readme cleanup

Files changed (1)
  1. README.md +3 -90
README.md CHANGED
@@ -7,97 +7,10 @@ license: mit
  This demo illustrates the work published in the paper ["Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models"](https://arxiv.org/pdf/2402.07865.pdf)


- # VLM Demo
+ # Source code
+
+ For more information, please refer to this repository:

  > *VLM Demo*: Lightweight repo for chatting with VLMs supported by our
  [VLM Evaluation Suite](https://github.com/TRI-ML/vlm-evaluation/tree/main).

- ---
-
- ## Installation
-
- This repository can be installed as follows:
-
- ```bash
- git clone git@github.com:TRI-ML/vlm-demo.git
- cd vlm-demo
- pip install -e .
- ```
-
- This repository also requires that the `vlm-evaluation` package (`vlm_eval`) is
- installed in the current environment. Installation instructions can be found
- [here](https://github.com/TRI-ML/vlm-evaluation/tree/main).
-
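The linked instructions are authoritative; as a rough sketch, and assuming `vlm-evaluation` follows the same editable-install flow as this repo, the setup might look like:

```bash
# Hypothetical editable install of vlm-evaluation; defer to the
# linked installation instructions for the authoritative steps.
git clone https://github.com/TRI-ML/vlm-evaluation.git
cd vlm-evaluation
pip install -e .
```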
- ## Usage
-
- The main script to run is `interactive_demo.py`, while the implementations of
- the Gradio Controller (`serve/gradio_controller.py`) and Gradio Web Server
- (`serve/gradio_web_server.py`) live in `serve/`. All of this code is heavily
- adapted from the [LLaVA GitHub repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).
- More details on how this code was modified from the original LLaVA repo are provided in the
- relevant source files.
-
- To run the demo, first run the following commands in separate terminals:
-
- + Start the Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000`
- + Start the Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
-
- To run the interactive demo, you can specify a model to chat with via a `model_dir` or `model_id` as follows:
-
- + `python -m interactive_demo --port 40000 --model_id <MODEL_ID>` OR
- + `python -m interactive_demo --port 40000 --model_dir <MODEL_DIR>`
-
- If you want to chat with multiple models simultaneously, launch the `interactive_demo` script in separate terminals, using a different port for each model.
-
- When running the demo, the following parameters are adjustable:
- + Temperature
- + Max output tokens
-
- The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other
- interaction modes for more specific use cases:
- + Captioning: Here, you can simply upload an image with no prompt, and the selected model will output a caption. Even if a prompt
- is input by the user, it will not be used in producing the caption.
- + Bounding Box Prediction: After uploading an image, specify in the prompt the portion of the image for which bounding box
- coordinates are desired, and the selected model will output the corresponding coordinates.
- + Visual Question Answering: Selecting this option is best when the user wants short, succinct answers to a specific question provided in the
- prompt.
- + True/False Question Answering: Selecting this option is best when the user wants a True/False answer to a specific question provided in the
- prompt.
-
- ## Example
-
- To chat with the LLaVA 1.5 (7B) and Prism 7B models in an interactive GUI, run the following commands in separate terminals.
-
- Launch the Gradio controller:
-
- `python -m serve.controller --host 0.0.0.0 --port 10000`
-
- Launch the web server:
-
- `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
-
- Now we can launch an interactive demo corresponding to each of the models we want to chat with. For Prism models, you
- only need to specify a `model_id`, while for LLaVA and InstructBLIP, you need to additionally specify a `model_family`
- and `model_dir`. Note that a different port must be specified for each model.
-
- Launch the interactive demo for the Prism 7B model:
-
- `python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b`
-
- Launch the interactive demo for the LLaVA 1.5 7B model:
-
- `python -m interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b`
-
- ## Contributing
-
- Before committing to the repository, *make sure to set up your dev environment!*
-
- Here are the basic development environment setup guidelines (a consolidated sketch follows this list):
-
- + Fork/clone the repository, performing an editable installation. Make sure to install with the development dependencies
- (e.g., `pip install -e ".[dev]"`); this will install `black`, `ruff`, and `pre-commit`.
-
- + Install `pre-commit` hooks (`pre-commit install`).
-
- + Branch for the specific feature/issue, issuing a PR against the upstream repository for review.
-
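Putting those steps together, a minimal sketch of the dev setup (the fork URL and branch name are illustrative placeholders, not prescribed by the repo):

```bash
# Clone your fork and install in editable mode with the dev
# dependencies (black, ruff, pre-commit).
git clone git@github.com:<YOUR_USERNAME>/vlm-demo.git
cd vlm-demo
pip install -e ".[dev]"

# Enable the pre-commit hooks.
pre-commit install

# Branch for the specific feature/issue before opening a PR.
git checkout -b <FEATURE_BRANCH>
```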
 