ScienceQA
Prepare Data
- Please see the ScienceQA repo for instructions on setting up the dataset.
- Convert the ScienceQA dataset to the LLaVA conversation-style format:
python scripts/convert_sqa_to_llava.py \
    convert_to_llava \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --prompt-format "QCM-LEA" \
    --split {train,val,minival,test,minitest}
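The braces in `--split` read as a choice of one split name per run; note that an unquoted `{train,val,...}` is brace-expanded by bash into several separate words, so it is safer to pass a single name. A minimal sketch of repeating the conversion for every split is below. It assumes the script accepts exactly the flags shown above and is run from the repository root; the helper names `build_convert_commands` and `run_all` are hypothetical and not part of the repo.

```python
import subprocess
import sys

# Split names taken from the command above.
SPLITS = ["train", "val", "minival", "test", "minitest"]


def build_convert_commands(base_dir, prompt_format="QCM-LEA", splits=SPLITS):
    """Return one argv list per split (hypothetical helper)."""
    return [
        [
            sys.executable, "scripts/convert_sqa_to_llava.py",
            "convert_to_llava",
            "--base-dir", base_dir,
            "--prompt-format", prompt_format,
            "--split", split,
        ]
        for split in splits
    ]


def run_all(base_dir):
    """Run the conversion once per split, stopping at the first failure."""
    for cmd in build_convert_commands(base_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_all("/path/to/ScienceQA/data/scienceqa")` would then issue the five conversions in sequence. Where the converted files end up depends on the script itself and is not handled here.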
Training
- Pretraining
You can download our pretrained projector weights from our Model Zoo, or train your own projector weights using pretrain.sh.
- Finetuning
See finetune_sqa.sh.
Evaluation
- Multiple-GPU inference
You may evaluate this with multiple GPUs and concatenate the generated jsonl files. Please refer to our script for batch evaluation and results gathering.
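If you gather results by hand rather than with the provided script, concatenating the per-GPU jsonl files could look like the sketch below. This is only an illustration: the function name `merge_jsonl` and the chunk file names in the usage comment are hypothetical, and the provided batch script may gather results differently.

```python
import json


def merge_jsonl(shard_paths, out_path):
    """Concatenate jsonl shards into one file, skipping blank lines.

    Each non-empty line is parsed once so that a truncated shard fails
    loudly instead of producing a silently corrupt merged file.
    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for shard in shard_paths:
            with open(shard, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    json.loads(line)  # validation only; raises on malformed JSON
                    out.write(line + "\n")
                    count += 1
    return count


# Hypothetical usage, assuming one answers file per GPU:
# merge_jsonl(
#     ["test_llava-13b-chunk0.jsonl", "test_llava-13b-chunk1.jsonl"],
#     "vqa/results/ScienceQA/test_llava-13b.jsonl",
# )
```

The merged file can then be passed as `--result-file` to the evaluation step.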
- Single-GPU inference 
(a) Generate LLaVA responses on the ScienceQA dataset
python -m llava.eval.model_vqa_science \
    --model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
    --question-file /path/to/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
    --image-folder /path/to/ScienceQA/data/scienceqa/images/test \
    --answers-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --conv-mode llava_v1
(b) Evaluate the generated responses
python eval_science_qa.py \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --result-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --output-file vqa/results/ScienceQA/test_llava-13b_output.json \
    --output-result vqa/results/ScienceQA/test_llava-13b_result.json
For reference, we attach our prediction files test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json and test_sqa_llava_13b_v0.json, which you can compare against when reproducing our results or use for more detailed analysis.

