abalakrishnaTRI

fix README

bb834c6 about 1 year ago

	# VLM Demo

	> VLM Demo: Lightweight repo for chatting with models loaded into VLM Bench.

	---

	## Installation

	This repository can be installed as follows:

	```bash
	git clone git@github.com:TRI-ML/vlm-demo.git
	cd vlm-demo
	pip install -e .
	```

	This repository also requires the `vlm-bench` package (`vlbench`) and the
	`prismatic-vlms` package (`prisma`) to be installed in the current environment.
	Both can be installed from source from the following Git repositories:

	+ `vlm-bench`: `https://github.com/TRI-ML/vlm-bench`
	+ `prismatic-vlms`: `https://github.com/TRI-ML/prismatic-vlms`
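
A quick way to confirm that both dependencies are importable before launching the demo is sketched below. It assumes the import names are `vlbench` and `prisma`, as given above; the helper `missing_packages` is hypothetical and not part of this repository.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names taken from the text above; adjust them if your install differs.
    missing = missing_packages(["vlbench", "prisma"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If either name is reported as missing, install the corresponding package from its repository before starting the demo.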

	## Usage

	The main script to run is `interactive_demo.py`, while the implementations of
	the Gradio Controller (`serve/gradio_controller.py`) and Gradio Web Server
	(`serve/gradio_web_server.py`) are in `serve`. All of this code is heavily
	adapted from the [LLaVA GitHub repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).
	More details on how this code was modified from the original LLaVA repo are provided in the
	relevant source files.

	To run the demo, run the following commands:

	+ Start Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000`
	+ Start Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
	+ Run interactive demo: `CUDA_VISIBLE_DEVICES=0 python -m interactive_demo --port 40000 --model_dir <PATH TO MODEL CKPT>`
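
The three commands above share a few values: the controller port appears both in the controller command and in the `--controller` URL passed to the web server. The sketch below assembles the same command lines from one place so those values stay consistent. The helpers `build_commands` and `format_command` are hypothetical and not part of this repository; the flags and default ports are copied from the commands listed above.

```python
import shlex


def build_commands(model_dir, controller_port=10000, worker_port=40000, gpu="0"):
    """Return (env_overrides, argv) pairs for the three processes, in the order listed above."""
    controller_url = f"http://localhost:{controller_port}"
    return [
        ({}, ["python", "-m", "serve.controller",
              "--host", "0.0.0.0", "--port", str(controller_port)]),
        ({}, ["python", "-m", "serve.gradio_web_server",
              "--controller", controller_url,
              "--model-list-mode", "reload", "--share"]),
        ({"CUDA_VISIBLE_DEVICES": gpu},
         ["python", "-m", "interactive_demo",
          "--port", str(worker_port), "--model_dir", model_dir]),
    ]


def format_command(env, argv):
    """Render one command as a shell-ready string, with env overrides as a prefix."""
    prefix = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(prefix + [shlex.quote(arg) for arg in argv])


if __name__ == "__main__":
    # Replace the placeholder with the path to your model checkpoint.
    for env, argv in build_commands("<PATH TO MODEL CKPT>"):
        print(format_command(env, argv))
```

Each printed line can be pasted into its own terminal. The same `(env, argv)` pairs could also be passed to `subprocess.Popen` if you prefer to launch everything from a single script.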

	When running the demo, the following parameters are adjustable:
	+ Temperature
	+ Max output tokens

	The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other
	interaction modes for more specific use cases:
	+ Captioning: Upload an image with no prompt and the selected model will output a caption. Any prompt
	entered by the user is ignored when producing the caption.
	+ Bounding Box Prediction: After uploading an image, specify in the prompt the portion of the image for which bounding box
	coordinates are desired, and the selected model will output the corresponding coordinates.
	+ Visual Question Answering: Best when the user wants short, succinct answers to a specific question provided in the
	prompt.
	+ True/False Question Answering: Best when the user wants a True/False answer to a specific question provided in the
	prompt.


	## Contributing

	Before committing to the repository, make sure to set up your dev environment!

	Here are the basic development environment setup guidelines:

	+ Fork/clone the repository, performing an editable installation. Make sure to install with the development dependencies
	(e.g., `pip install -e ".[dev]"`); this will install `black`, `ruff`, and `pre-commit`.

	+ Install `pre-commit` hooks (`pre-commit install`).

	+ Create a branch for the specific feature/issue, then open a PR against the upstream repository for review.