.. _config:

Training Models on Task Datasets (Commands and Configurations)
#################################################################

LAVIS provides scripts to pre-train and finetune the supported models on standard language-vision tasks, stored at ``run_scripts/``.
To replicate the experiments, simply run these bash scripts. For example, to train the BLIP model on the image-text retrieval task with the MSCOCO dataset, we can run

.. code-block:: bash

   bash run_scripts/blip/train/train_retrieval_coco.sh

Inside the script, we can see

.. code-block:: bash

   python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml

where we start a PyTorch distributed training job on 8 GPUs (change ``--nproc_per_node`` according to your own hardware setup). The ``--cfg-path`` argument specifies a `runtime configuration file` that defines the task, model, dataset and training recipe.

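A runtime configuration file is a YAML file organized into ``model``, ``datasets`` and ``run`` sections. The sketch below illustrates the overall shape; the keys and values are abridged, illustrative examples rather than a verbatim copy of ``retrieval_coco_ft.yaml``:

.. code-block:: yaml

   # Abridged, illustrative sketch of a runtime configuration file.
   model:
     arch: blip_retrieval            # model name from the model zoo
     model_type: coco                # model variant

   datasets:
     coco_retrieval:                 # dataset name registered in LAVIS
       vis_processor:
         train:
           name: "blip_image_train"
       text_processor:
         train:
           name: "blip_caption"

   run:
     task: retrieval
     lr_sched: "linear_warmup_cosine_lr"
     init_lr: 1e-5
     max_epoch: 6
     batch_size_train: 32
     output_dir: "output/BLIP/Retrieval_coco"
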
LAVIS executes training and evaluation based on the arguments specified in the configuration files. The default model and dataset configurations are defined in ``lavis/configs``, while task-specific configurations are defined in ``lavis/projects``; task-specific configurations take priority over the defaults.

Available options and their descriptions are as below.

.. list-table::
   :widths: 30 40
   :header-rows: 1

   * - Model Configurations
     - Functionalities
   * - arch
     - | name of the model from the model zoo
       | default: task-dependent
   * - model_type
     - | the type of the model (e.g., base)
       | default: task-dependent
   * - load_pretrained
     - | load pretrained weights
       | default: True (for finetuning tasks) | False (for pretraining tasks)
   * - load_finetuned
     - | load task-specific finetuned weights
       | default: False (for finetuning tasks) | True (for evaluation)
   * - pretrained
     - | URL or local path of the pretrained model weights, defined in the default model configuration file
       | default: task-dependent
   * - finetuned
     - | URL or local path of the finetuned model weights, defined in the default model configuration file
       | default: task-dependent

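As an illustration, the ``model`` section of a finetuning configuration could look like the sketch below; the checkpoint path is a placeholder, since the real ``pretrained``/``finetuned`` locations are defined in the default model configuration files:

.. code-block:: yaml

   model:
     arch: blip_retrieval      # model zoo name
     model_type: coco          # model variant
     load_pretrained: True     # finetuning starts from pretrained weights
     load_finetuned: False     # set True to load finetuned weights for evaluation
     # placeholder; the actual URL lives in the default model configuration
     pretrained: "/path/to/pretrained_checkpoint.pth"
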
.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Dataset Configurations
     - Functionalities
   * - vis_processor
     - | pre-processing of visual input
       | default: task-dependent
   * - text_processor
     - | pre-processing of text input
       | default: task-dependent
   * - build_info
     - | dataset information including the storage location, defined in the default dataset configuration file
       | default: task-dependent

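A ``datasets`` section combines these fields per dataset, roughly as sketched below; the processor names are examples, and ``build_info`` (with its storage paths) normally comes from the default dataset configuration in ``lavis/configs``:

.. code-block:: yaml

   datasets:
     coco_retrieval:
       vis_processor:
         train:
           name: "blip_image_train"   # augmentation/resizing for training images
         eval:
           name: "blip_image_eval"    # deterministic preprocessing for evaluation
       text_processor:
         train:
           name: "blip_caption"
         eval:
           name: "blip_caption"
       build_info:
         images:
           storage: "/path/to/coco/images"   # placeholder storage location
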
.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Runtime Configurations
     - Functionalities
   * - task
     - | name of the task
       | default: task-dependent
   * - lr_sched
     - | learning rate scheduler
       | default: linear_warmup_cosine_lr
   * - init_lr
     - | initial learning rate (after warmup)
       | default: task-dependent
   * - min_lr
     - | final learning rate after decay
       | default: task-dependent
   * - warmup_lr
     - | starting learning rate for warmup
       | default: init_lr (no warmup)
   * - lr_decay_rate
     - | learning rate decay per epoch for the step decay schedule
       | default: 0.9
   * - warmup_steps
     - | number of steps for learning rate warmup
       | default: 0
   * - max_epoch
     - | total number of training epochs
       | default: task-dependent
   * - weight_decay
     - | weight decay coefficient for the optimizer
       | default: 0.05
   * - batch_size_train
     - | batch size during training
       | default: task-dependent
   * - batch_size_eval
     - | batch size during evaluation
       | default: task-dependent
   * - seed
     - | pseudo-random number generator seed
       | default: 42
   * - output_dir
     - | directory to store logs, results and checkpoints
       | default: task-dependent
   * - resume_ckpt_path
     - | path of the checkpoint to resume training from
       | default: None
   * - evaluate
     - | only perform evaluation, without training
       | default: False
   * - train_splits
     - | dataset splits used for training
       | default: ["train"]
   * - valid_splits
     - | dataset splits used for validation
       | default: ["val"]
   * - test_splits
     - | dataset splits used for testing
       | default: ["test"]
   * - device
     - | use cpu or gpu (cuda)
       | default: cuda
   * - world_size
     - | number of processes participating in the job
       | default: 1
   * - dist_url
     - | URL specifying how to initialize the process group
       | default: "env://"
   * - distributed
     - | use distributed training
       | default: True
   * - amp
     - | use automatic mixed precision training
       | default: False

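Put together, a ``run`` section for a finetuning job might read as follows; the numbers are illustrative rather than the exact values of any shipped project file:

.. code-block:: yaml

   run:
     task: retrieval
     lr_sched: "linear_warmup_cosine_lr"
     init_lr: 1e-5
     min_lr: 1e-6
     warmup_steps: 1000
     max_epoch: 6
     weight_decay: 0.05
     batch_size_train: 32
     batch_size_eval: 64
     seed: 42
     output_dir: "output/BLIP/Retrieval_coco"
     resume_ckpt_path: null
     evaluate: False
     train_splits: ["train"]
     valid_splits: ["val"]
     test_splits: ["test"]
     device: "cuda"
     world_size: 1
     dist_url: "env://"
     distributed: True
     amp: False
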
.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Text Generation Configurations
     - Functionalities
   * - max_len
     - | maximum number of text tokens to generate
       | default: 20 (for image captioning)
   * - min_len
     - | minimum number of text tokens to generate
       | default: 5 (for image captioning)
   * - num_beams
     - | number of beams for beam search
       | default: 3

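These options sit inside the ``run`` section of a captioning configuration, for example (illustrative values matching the defaults above):

.. code-block:: yaml

   run:
     task: captioning
     max_len: 20    # longest generated caption, in tokens
     min_len: 5     # shortest generated caption, in tokens
     num_beams: 3   # beam width for beam search
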
.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Multimodal Retrieval Configurations
     - Functionalities
   * - negative_all_rank
     - | collect negatives from all processes for the image-text matching loss
       | default: True (for coco)
   * - k_test
     - | number of retrieval candidates ranked by contrastive similarity
       | default: 256 (for coco)

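Likewise, the retrieval-specific options go under ``run``; a sketch using the COCO defaults listed above:

.. code-block:: yaml

   run:
     task: retrieval
     k_test: 256               # candidates re-ranked from the contrastive similarity
     negative_all_rank: True   # gather negatives across all processes for the ITM loss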