## Training of MiniGPT-4

The training of MiniGPT-4 contains two alignment stages.

**1. First pretraining stage**

In the first pretraining stage, the model is trained using image-text pairs from the Laion and CC datasets to align the vision and language models. To download and prepare the datasets, please check our [first stage dataset preparation instruction](dataset/README_1_STAGE.md). After the first stage, the visual features are mapped and can be understood by the language model.

To launch the first stage training, run the following command, replacing `NUM_GPU` with the number of GPUs to use. In our experiments, we use 4 A100 GPUs. You can change the save path in the config file [train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml).

```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
```

A MiniGPT-4 checkpoint with only stage one training can be downloaded [here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link). Compared to the model after stage two, this checkpoint frequently generates incomplete and repeated sentences.

**2. Second finetuning stage**

In the second stage, we use a small, high-quality image-text pair dataset that we created ourselves and convert it to a conversation format to further align MiniGPT-4. To download and prepare our second stage dataset, please check our [second stage dataset preparation instruction](dataset/README_2_STAGE.md).

To launch the second stage alignment, first specify the path to the checkpoint file trained in stage 1 in [train_configs/minigpt4_stage2_finetune.yaml](train_configs/minigpt4_stage2_finetune.yaml). You can also specify the output path there. Then, run the following command. In our experiments, we use 1 A100 GPU.

```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
```

After the second stage alignment, MiniGPT-4 is able to talk about the image coherently and in a user-friendly way.
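
The two launch commands above differ only in the GPU count and the config path, so they can be wrapped in a small helper. The following is a minimal sketch and not part of the repository: the function name `stage_cmd` is hypothetical, and it only prints the command (a dry run) so the output can be checked before anything is launched. The GPU counts in the example calls are the ones reported in our experiments.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MiniGPT-4): prints the torchrun
# command for a training stage instead of running it.
#   $1 = number of GPUs, $2 = stage number (1 or 2)
stage_cmd() {
  local num_gpu="$1" stage="$2" cfg
  case "$stage" in
    1) cfg="train_configs/minigpt4_stage1_pretrain.yaml" ;;
    2) cfg="train_configs/minigpt4_stage2_finetune.yaml" ;;
    *) echo "unknown stage: $stage" >&2; return 1 ;;
  esac
  echo "torchrun --nproc-per-node $num_gpu train.py --cfg-path $cfg"
}

# Dry run: show the commands for both stages.
stage_cmd 4 1   # stage 1 on 4 GPUs
stage_cmd 1 2   # stage 2 on 1 GPU
```

Once the printed command looks right, it can be copied into the terminal and run from the repository root. Remember to set the stage 1 checkpoint path in the stage 2 config before launching the second stage.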