Zero-shot Transfer and Finetuning

(If you are new to the ideas of mmpt.processors, see the README first.) All finetuning datasets (more precisely, their processors) are defined in mmpt.processors.dsprocessor. Because finetuning tasks differ considerably, each task may have its own meta/video/text/aligner processors and its own mmpt/evaluators/{Predictor,Metric}.

Tasks

Currently, we support 5 downstream datasets: MSRVTT, Youcook, COIN, CrossTask, and DiDeMo, with the following tasks:
text-video retrieval: MSRVTT, Youcook, DiDeMo;
video captioning: Youcook;
video question answering: MSRVTT-QA.

To add your own dataset, specify the corresponding processors and configure them in the dataset field of a config file, such as projects/task/vtt.yaml.
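For orientation, the dataset field essentially wires the processors together by class name. The sketch below is only an illustrative guess at the shape of such a config: the processor class names and data paths are placeholders, not the actual contents of projects/task/vtt.yaml.

    # Hypothetical dataset section; class names and paths are placeholders.
    dataset:
      meta_processor: MSRVTTMetaProcessor   # which metadata/split processor to use
      video_processor: VideoProcessor       # how video features are loaded
      text_processor: MSRVTTTextProcessor   # how captions are tokenized
      aligner: DSAligner                    # how video and text are paired into model inputs
      train_path: data/msrvtt/train.csv     # placeholder data paths
      val_path: data/msrvtt/val.csv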

Zero-shot Transfer (no Training)

Zero-shot transfer runs the pre-trained model (e.g., VideoCLIP) directly on test data. Configs matching the pattern projects/task/*_zs_*.yaml are dedicated to zero-shot transfer.

Fine-tuning

Training a downstream task is similar to pretraining, except that you may need to specify restore_file in fairseq.checkpoint and reset the optimizers; see projects/task/ft.yaml, which is included by projects/task/vtt.yaml.
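For orientation, the checkpoint-related part of such a config might look roughly like the sketch below. The checkpoint path is a placeholder and the exact contents of projects/task/ft.yaml may differ; restore_file and the reset_* options are standard fairseq checkpoint settings.

    # Rough sketch of ft.yaml-style settings; the path is a placeholder.
    fairseq:
      checkpoint:
        restore_file: runs/pretrain/checkpoint_best.pt  # pretrained model to start from
        reset_optimizer: true    # discard the pretraining optimizer state
        reset_dataloader: true   # restart data iteration from scratch
        reset_meters: true       # clear accumulated training meters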

We typically finetune on 2 GPUs (local_small).

Testing

For each finetuning dataset, you may need to specify a testing config, similar to projects/task/test_vtt.yaml.

We define mmpt.evaluators.Predictor for different types of prediction. For example, MSRVTT and Youcook are video-retrieval tasks and are expected to use RetrievalPredictor. You may need to define your own type of predictor and specify it in the predictor field of a testing config.

Each task may also have its own metric for evaluation. This can be created in mmpt.evaluators.Metric and specified in the metric field of a testing config.
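Putting the two together, a testing config names both a predictor and a metric. The fragment below is only illustrative: RetrievalPredictor is mentioned above, but RetrievalMetric is an assumed class name standing in for whatever you define in mmpt.evaluators.Metric, and the real projects/task/test_vtt.yaml may lay these fields out differently.

    # Illustrative fragment of a testing config; RetrievalMetric is an assumed name.
    predictor: RetrievalPredictor   # how raw model outputs are collected
    metric: RetrievalMetric         # how the collected outputs are scored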

Launching testing is as simple as training: just pass the path of a testing config:

    python locallaunch.py projects/mfmmlm/test_vtt.yaml

Testing is launched locally by default since prediction is computationally less expensive.

Third-party Libraries

The following finetuning tasks require third-party libraries:

Youcook captioning: https://github.com/Maluuba/nlg-eval

CrossTask: the dp module from https://github.com/DmZhukov/CrossTask, built under third-party/CrossTask (python setup.py build_ext --inplace)