Spaces:
Sleeping
Zero-shot Transfer and Finetuning
(If you are new to the ideas of mmpt.processors
, see README first.)
All finetuning datasets (specifically processors
) are defined in mmpt.processors.dsprocessor
.
Given the complexity of different types of finetuning tasks, each task may have their own meta/video/text/aligner processors and mmpt/evaluators/{Predictor,Metric}
.
Tasks
Currently, we support 5 end datasets: MSRVTT
, Youcook
, COIN
, Crosstask
and DiDeMo
with the following tasks:
text-video retrieval: MSRVTT
, Youcook
, DiDeMo
;
video captioning: Youcook
;
Video Question and Answering: MSRVTT-QA
.
To add your own dataset, you can specify the corresponding processors and config them in the dataset
field of a config file, such as projects/task/vtt.yaml
.
Zero-shot Transfer (no Training)
Zero-shot transfer will run the pre-trained model (e.g., VideoCLIP) directly on testing data. Configs with pattern: projects/task/*_zs_*.yaml
are dedicated for zero-shot transfer.
Fine-tuning
The training of a downstream task is similar to pretraining, execept you may need to specify the restore_file
in fairseq.checkpoint
and reset optimizers, see projects/task/ft.yaml
that is included by projects/task/vtt.yaml
.
We typically do finetuning on 2 gpus (local_small
).
Testing
For each finetuning dataset, you may need to specify a testing config, similar to projects/task/test_vtt.yaml
.
We define mmpt.evaluators.Predictor
for different types of prediction. For example, MSRVTT
and Youcook
are video-retrieval tasks and expecting to use RetrievalPredictor
. You may need to define your new type of predictors and specify that in predictor
field of a testing config.
Each task may also have their own metric for evaluation. This can be created in mmpt.evaluators.Metric
and specified in the metric
field of a testing config.
Launching a testing is as simple as training by specifying the path of a testing config:
python locallaunch.py projects/mfmmlm/test_vtt.yaml
Testing will be launched locally by default since prediction is computationally less expensive.
Third-party Libraries
We list the following finetuning tasks that require third-party libraries.
Youcook captioning: https://github.com/Maluuba/nlg-eval
CrossTask: https://github.com/DmZhukov/CrossTask
's dp
under third-party/CrossTask
(python setup.py build_ext --inplace
)