|
## Experiments |
|
|
|
Please check `amlt_configs/` for the experiment configs.
|
|
|
## Performance |
|
|
|
The major results can be found in [docs/MODEL_ZOO.md](./MODEL_ZOO.md) and our [Project Page](https://xk-huang.github.io/segment-caption-anything). |
|
|
|
We also provide evaluation code for our baseline ([Promptable-GRiT](https://github.com/xk-huang/Promptable-GRiT)) and for [benchmarking referring VLLMs](https://github.com/xk-huang/benchmark-referring-vllm).
|
|
|
## Evaluate with `vdtk` |
|
|
|
### Install `vdtk` |
|
|
|
We use a fork of `vdtk` that supports CLIP score computation with base64-encoded images:

https://github.com/xk-huang/vdtk/tree/dev

- External data (e.g., jar files): https://huggingface.co/xk-huang/vdtk-data

Install it together with the external data:
|
|
|
#### Docker |
|
|
|
```shell |
|
alias=`whoami | cut -d'.' -f2` |
|
docker run -itd --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} -w `pwd` --name sca nvcr.io/nvidia/pytorch:22.10-py3 bash |
|
docker exec -it sca bash |
|
|
|
# In the docker container |
|
# cd to the code dir |
|
. amlt_configs/setup.sh |
|
source ~/.bashrc |
|
pip install pydantic==1.10.8 # https://github.com/pydantic/pydantic/issues/545#issuecomment-1573776471 |
|
. amlt_configs/setup_eval_suite.sh |
|
``` |
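
Assuming the setup scripts installed `vdtk` into the container's Python environment, a quick sanity check:

```shell

# Verify that vdtk is importable and git-lfs is available

python -c "import vdtk; print('vdtk OK')"

git lfs version

```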
|
|
|
#### Conda |
|
|
|
```shell |
|
# Install env first |
|
# conda create -n sca -y python=3.9 |
|
# conda activate sca |
|
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia |
|
|
|
ORIGINAL_DIR="$(pwd)" |
|
REPO_DIR=/tmp/vdtk |
|
git clone --recursive https://github.com/xk-huang/vdtk.git $REPO_DIR -b dev |
|
cd $REPO_DIR |
|
git submodule update --init --recursive |
|
|
|
apt-get update  # prefix the apt-get commands with `sudo` if you are not root

apt-get install -y git-lfs
|
|
|
git lfs install |
|
git clone https://huggingface.co/xk-huang/vdtk-data |
|
|
|
|
rsync -avP ./vdtk-data/vdtk . |
|
rm -rf vdtk-data |
|
|
|
pip install --upgrade pip |
|
pip install -e . POT==0.9.0 # POT==0.9.1 takes up all the memory with the tf backend

pip install tensorflow==2.12.1 # pin one version of tf
|
pip install levenshtein==0.21.1 |
|
pip install openpyxl==3.1.2 |
|
|
|
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" |
|
cd "$ORIGINAL_DIR" |
|
``` |
|
|
|
Potential Problems: |
|
|
|
- About TensorFlow: TF does not support CUDA 12 as of 08/15/23, so we use the `nvcr.io/nvidia/pytorch:22.10-py3` image above, which ships CUDA 11.8.

- Encoding in the docker image: `import locale; locale.getpreferredencoding()` returns `ANSI_X3.4-1968` rather than `UTF-8`, which causes errors when writing files (see the workaround sketch below).

  - Change `vdtk/metrics/tokenizer/ptbtokenizer.py:73` to `tmp_file = tempfile.NamedTemporaryFile(mode="w", delete=False, encoding="utf-8")`
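
Besides patching `ptbtokenizer.py`, a possible workaround (an untested sketch, using the standard fix for containers whose locale is `ANSI_X3.4-1968`) is to force a UTF-8 locale before running the evaluation:

```shell

# Force UTF-8 text encoding for Python inside the container

export LC_ALL=C.UTF-8

export LANG=C.UTF-8

python -c "import locale; print(locale.getpreferredencoding())"  # should now print UTF-8

```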
|
|
|
|
|
### The format of the input prediction JSON file
|
|
|
```json |
|
[ |
|
{ |
|
"_id": 0, |
|
"split": "inference", |
|
"references": [ |
|
"a man wearing a red and white shirt" |
|
], |
|
"candidates": [ |
|
"red and yellow", |
|
"red shirt guy", |
|
"red and yellow uniform" |
|
], |
|
"metadata": { |
|
"metadata_input_boxes": [ |
|
0, |
|
95, |
|
113, |
|
419 |
|
], |
|
"metadata_image_id": 266240, |
|
"metadata_region_id": 27287 |
|
}, |
|
"logits": { |
|
"iou_scores": [ |
|
0.89990234375, |
|
0.994140625, |
|
0.99365234375 |
|
] |
|
} |
|
} |
|
] |
|
``` |
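
To spot-check a prediction file against this format (assuming `jq` is available; the file name below is an example):

```shell

# Count records and peek at the fields of the first one

jq 'length' infer.json

jq '.[0] | {references, candidates, iou: .logits.iou_scores}' infer.json

```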
|
|
|
### The structure of files |
|
|
|
``` |
|
$OUTPUT_DIR/infer/infer-visual_genome-densecap-local-densecap-test.json |
|
# infer-{data_script_identifier}-{name}-{split}.json |
|
``` |
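
For example, to list the inference JSONs produced by a run (assuming `$OUTPUT_DIR` points to your experiment output directory):

```shell

find "$OUTPUT_DIR/infer" -name 'infer-*.json'

```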
|
|
|
### All-in-one script |
|
|
|
Usage: |
|
|
|
```shell |
|
bash scripts/tools/eval_suite.sh

# Env args (set to 1 to enable):

#   DRY_RUN, ONLY_GATHER, ONLY_EVAL, SKIP_CLIP_RECALL, DEBUG, NO_POST_PROCESS

# Usage: [DRY_RUN=1] [ONLY_GATHER=1] [ONLY_EVAL=1] ./eval_suite.sh <INFERENCE_JSON_DIR> <JSON_FILE_NAME> <SPLIT> [<IMAGE_B64_TSV_PATH>] [<MERGE_TSV_INTO_JSON_FOR_VDTK_SCRIPT>] [<POST_PROCESS_MULTI_CANDIDATES_SCRIPT>]

# JSON_FILE_NAME is not used; pass any placeholder string like 'xxx'.
|
``` |
|
|
|
e.g., |
|
|
|
```bash |
|
DRY_RUN=1 NO_POST_PROCESS=1 ONLY_EVAL=1 SKIP_CLIP_RECALL=1 bash scripts/tools/eval_suite.sh exp/ xxx inference |
|
``` |
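
A fuller invocation passing the optional image TSV (presumably needed for the CLIP-based metrics; the TSV path below is a hypothetical placeholder):

```shell

bash scripts/tools/eval_suite.sh exp/ xxx inference /path/to/images_b64.tsv

```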
|
|
|
<details> |
|
<summary>The details about the script.</summary> |
|
|
|
1. Replace the GT captions (the tokenizer-processed ones) with the real GT (`scripts/tools/replace_references_in_json_for_vdtk.py`). Please prepare the folder structure as described in [the structure of files](#the-structure-of-files) above. It requires the `.hydra` config. An example invocation follows.
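
For example (a sketch assuming the script takes the same `-i` flag as the post-processing script below; check its `--help` for the actual interface):

```shell

python scripts/tools/replace_references_in_json_for_vdtk.py -i $INFERENCE_JSON_PATH

```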
|
2. Keep only one of multiple predictions based on the IoU score (`scripts/tools/post_process_multi_candidates_for_vdtk.py`).
|
|
|
If there are multiple candidate predictions, we choose only the **one candidate** with the highest IoU for METEOR, CIDEr-D, ROUGE, etc.:
|
|
|
```shell |
|
python scripts/tools/post_process_multi_candidates_for_vdtk.py -i $INFERENCE_JSON_PATH |
|
``` |
|
|
|
To process multiple inference JSON files under a directory:
|
|
|
```shell |
|
INFERENCE_JSON_DIR= |
|
find $INFERENCE_JSON_DIR -name 'infer.json' -exec python scripts/tools/post_process_multi_candidates_for_vdtk.py -i {} \; |
|
``` |
|
|
|
3. Evaluate with `vdtk` and save the results to a `.log` file.
|
|
|
You need to change `PRED_JSONS_BASE_DIR`, `JSON_FILE_NAME`, `SPLIT`, and `IMAGE_B64_TSV_PATH`. |
|
|
|
If the `infer.json` file is too large to open in VS Code, you can use Vim to open it and change the above variables accordingly.
|
|
|
Currently, `JSON_FILE_NAME` is deprecated, as the script `find`s all `*.json` files under `PRED_JSONS_BASE_DIR`.
|
|
|
4. Parse the results from each `*.log` file and gather them into one `.xlsx` file, one sheet per log.
|
|
|
Change `PRED_JSONS_BASE_DIR` accordingly.
|
|
|
5. Merge each metric into one table with `scripts/tools/merge_sheets_xlsx.py`; a sketch follows.
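
For example (the flags below are hypothetical; consult the script's `--help` for the actual interface):

```shell

# Hypothetical flags for illustration

python scripts/tools/merge_sheets_xlsx.py -i $PRED_JSONS_BASE_DIR -o merged.xlsx

```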
|
|
|
</details> |