# How to process video as data loader

We assume that each video is preprocessed into image files in advance. Usually, we do not use all frames in a clip but sample a fixed number of consecutive frames (e.g. 16 frames).

The pipeline we assume for each clip is the following.

- Get the list of image paths of a clip, e.g. ["./video/clip1/frame0.jpg",...,"./video/clip1/frame101.jpg"]
- Sample the duration we want to use, e.g. ["./video/clip1/frame11.jpg",...,"./video/clip1/frame26.jpg"]
- Load the frames into a tensor shaped as (T, H, W, C). H and W can be changed later.
- Use torchvision built-in utilities to crop, flip, etc. For example,
  - ToTensorVideo(): converts from (T, H, W, C) to (C, T, H, W), from 0-255 to 0-1 (divide by 255), and from uint8 to float
  - CenterCropVideo
  - RandomHorizontalFlipVideo
  - NormalizeVideo with Kinetics mean and std
  - See more: https://github.com/pytorch/vision/blob/f0d3daa7f65bcde560e242d9bccc284721368f02/torchvision/transforms/transforms_video.py

Note that the first part differs from what the official PyTorch repository ( https://github.com/pytorch/vision/tree/master/references/video_classification ) does. We don't use the VideoClips class.
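
The first two steps of the pipeline (listing the frame paths of a clip and sampling a fixed-length window) can be sketched in plain Python. This is a minimal sketch and not necessarily how this repository implements it: the names `sample_clip`, `clip_len`, and `rng` are hypothetical, and picking the start offset uniformly at random is an assumption.

```python
import random
from typing import List, Optional


def sample_clip(frame_paths: List[str], clip_len: int = 16,
                rng: Optional[random.Random] = None) -> List[str]:
    """Return `clip_len` consecutive frame paths from a random start offset.

    Hypothetical helper: the uniform random start is an assumption.
    """
    if len(frame_paths) < clip_len:
        raise ValueError(
            f"clip has {len(frame_paths)} frames, need at least {clip_len}"
        )
    rng = rng or random.Random()
    # randint is inclusive on both ends, so the last valid start is allowed.
    start = rng.randint(0, len(frame_paths) - clip_len)
    return frame_paths[start:start + clip_len]


# Illustrative paths matching the example above (frame0.jpg ... frame101.jpg).
paths = [f"./video/clip1/frame{i}.jpg" for i in range(102)]
clip = sample_clip(paths, clip_len=16, rng=random.Random(0))
```

The sampled paths would then be loaded and stacked into a (T, H, W, C) tensor before the torchvision video transforms are applied. Passing a seeded `random.Random` keeps the sampling reproducible, which can be useful for validation.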