.. _config:

Training Models on Task Datasets (Commands and Configurations) 
#################################################################

LAVIS provides scripts for pre-training and finetuning the supported models on standard language-vision tasks, stored at ``lavis/run_scripts/``.
To replicate an experiment, simply run the corresponding bash script. For example, to train the BLIP model on the image-text retrieval task with the MSCOCO dataset, we can run

.. code-block:: bash

    bash run_scripts/lavis/blip/train/train_retrieval_coco.sh

Inside the script, we see

.. code-block:: bash

    python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml

where we launch PyTorch distributed training on 8 GPUs (adjust ``--nproc_per_node`` to match your hardware). The ``--cfg-path`` argument specifies a `runtime configuration file` that defines the task, model, dataset and training recipe.
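
Before going through the individual options, it may help to see the overall shape of such a file. The sketch below is illustrative only: the section names mirror the tables that follow, but the concrete values are placeholders rather than the contents of ``retrieval_coco_ft.yaml``.

.. code-block:: yaml

    model:                              # see "Model Configurations" below
      arch: blip_retrieval              # illustrative model zoo name
      model_type: base

    datasets:                           # see "Dataset Configurations" below
      coco_retrieval: {}                # illustrative dataset name; options omitted here

    run:                                # see "Runtime Configurations" below
      task: retrieval
      lr_sched: linear_warmup_cosine_lr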

LAVIS executes training and evaluation based on the arguments specified in the configuration files. The default model and dataset configurations are defined in ``lavis/configs``, while the task-specific configurations are defined in ``lavis/projects``; task-specific configurations take priority over the defaults.

The following tables explain the available options and their functionalities.

.. list-table::
   :widths: 30 40
   :header-rows: 1

   * - Model Configurations
     - Functionalities
   * - arch
     - | name of the model from the model zoo
       | default: task-dependent
   * - model_type
     - | the type of the model (e.g., base)
       | default: task-dependent
   * - load_pretrained
     - | load pretrained weights
       | default: True (for finetuning), False (for pretraining)
   * - load_finetuned
     - | load task-specific finetuned weights
       | default: False (for finetuning), True (for evaluation)
   * - pretrained 
     - | URL or local path which stores the pretrained model, defined in the default model configuration file
       | default: task-dependent 
   * - finetuned
     - | URL or local path which stores the finetuned model, defined in the default model configuration file
       | default: task-dependent

.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Dataset Configurations
     - Functionalities
   * - vis_processor
     - | pre-processing of visual input
       | default: task-dependent
   * - text_processor
     - | pre-processing of text input
       | default: task-dependent
   * - build_info
     - | dataset information including the storage location, defined in the default dataset configuration file
       | default: task-dependent
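
For illustration, a ``datasets`` entry in a project config typically selects the processors, while ``build_info`` (storage locations) stays in the default dataset configuration file. The processor names and image size below are illustrative:

.. code-block:: yaml

    datasets:
      coco_retrieval:              # illustrative dataset name
        vis_processor:
          train:
            name: blip_image_train
            image_size: 384        # illustrative resolution
          eval:
            name: blip_image_eval
            image_size: 384
        text_processor:
          train:
            name: blip_caption
          eval:
            name: blip_caption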

.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Runtime Configurations
     - Functionalities
   * - task
     - | name of the task
       | default: task-dependent
   * - lr_sched
     - | learning rate scheduler
       | default: linear_warmup_cosine_lr
   * - init_lr
     - | initial learning rate (after warmup)
       | default: task-dependent
   * - min_lr
     - | final learning rate after decay
       | default: task-dependent
   * - warmup_lr
     - | starting learning rate for warmup
       | default: init_lr (no warmup)
   * - lr_decay_rate
     - | learning rate decay per epoch for the step LR schedule
       | default: 0.9
   * - warmup_steps
     - | number of steps for learning rate warmup
       | default: 0
   * - max_epoch
     - | total number of training epochs
       | default: task-dependent
   * - weight_decay
     - | weight decay coefficient for the optimizer
       | default: 0.05
   * - batch_size_train
     - | batch size during training
       | default: task-dependent
   * - batch_size_eval
     - | batch size during evaluation
       | default: task-dependent
   * - seed
     - | pseudo random number generator seed
       | default: 42
   * - output_dir
     - | directory to store logs, results and checkpoints
       | default: task-dependent
   * - resume_ckpt_path
     - | path of the checkpoint to resume training from
       | default: None
   * - evaluate
     - | only perform evaluation without training
       | default: False
   * - train_splits
     - | dataset splits used for training
       | default: ["train"]
   * - valid_splits
     - | dataset splits used for validation
       | default: ["val"]
   * - test_splits
     - | dataset splits used for testing
       | default: ["test"]
   * - device
     - | use cpu or gpu (cuda)
       | default: cuda
   * - world_size
     - | number of processes participating in the job
       | default: 1
   * - dist_url
     - | URL specifying how to initialize the process group
       | default: "env://"
   * - distributed
     - | use distributed training
       | default: True
   * - amp
     - | use automatic mixed precision training
       | default: False
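
Putting these together, the ``run`` section of a finetuning config might look like the sketch below; every value is illustrative rather than a recommended recipe:

.. code-block:: yaml

    run:
      task: retrieval
      lr_sched: linear_warmup_cosine_lr
      init_lr: 1e-5               # illustrative value
      min_lr: 0
      warmup_steps: 0
      weight_decay: 0.05
      max_epoch: 6                # illustrative value
      batch_size_train: 32        # illustrative value
      batch_size_eval: 64         # illustrative value
      seed: 42
      output_dir: output/blip_retrieval_coco   # placeholder path
      evaluate: False
      train_splits: ["train"]
      valid_splits: ["val"]
      test_splits: ["test"]
      device: cuda
      world_size: 1
      dist_url: "env://"
      distributed: True
      amp: False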

.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Text Generation Configurations
     - Functionalities
   * - max_len
     - | maximum number of text tokens to generate
       | default: 20 (for image captioning)
   * - min_len
     - | minimum number of text tokens to generate
       | default: 5 (for image captioning)
   * - num_beams
     - | number of beams to perform beam search
       | default: 3
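
In a captioning project file, these options sit alongside the other ``run`` settings; a minimal illustrative snippet:

.. code-block:: yaml

    run:
      task: captioning
      max_len: 20      # longest generated caption, in tokens
      min_len: 5       # shortest generated caption, in tokens
      num_beams: 3     # beam width for beam search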

.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Multimodal Retrieval Configurations
     - Functionalities
   * - negative_all_rank
     - | collect negatives from all processes for the image-text matching loss
       | default: True (for coco)
   * - k_test
     - | number of candidates selected by contrastive similarity for re-ranking
       | default: 256 (for coco)
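
Likewise, the retrieval-specific options go in the ``run`` section; an illustrative snippet for COCO retrieval:

.. code-block:: yaml

    run:
      task: retrieval
      k_test: 256               # candidates from contrastive similarity to re-rank
      negative_all_rank: True   # gather negatives across all processes for the ITM loss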