# Customize Data Pipeline

In this tutorial, we will introduce some methods for building the data pipeline (i.e., data transformations) for your tasks.

- [Customize Data Pipeline](#customize-data-pipeline)
  - [Design of Data Pipeline](#design-of-data-pipeline)
  - [Modify the Training/Testing Pipeline](#modify-the-trainingtesting-pipeline)
    - [Loading](#loading)
    - [Sampling Frames and Other Processing](#sampling-frames-and-other-processing)
    - [Formatting](#formatting)
  - [Add New Data Transforms](#add-new-data-transforms)

## Design of Data Pipeline

The data pipeline refers to the procedure of handling the data sample dict when indexing a sample from the dataset, and comprises a series of data transforms. Each data transform accepts a `dict` as input, processes it, and produces a `dict` as output for the subsequent data transform in the sequence.
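The dict-in, dict-out contract can be illustrated with a minimal, framework-free sketch. Note that the function names and dict keys below are hypothetical, not MMAction2 APIs; real transforms are classes registered in `TRANSFORMS`:

```python
# Minimal sketch of the dict-in, dict-out transform chain
# (hypothetical names, for illustration only).

def load_frames(results):
    # Pretend to decode a video: attach dummy frames to the sample dict.
    results['imgs'] = [[0] * 4 for _ in range(results['total_frames'])]
    return results

def sample_frames(results):
    # Keep every other frame, mimicking frame_interval=2.
    results['imgs'] = results['imgs'][::2]
    return results

def compose(transforms, results):
    # Each transform consumes and returns the sample dict in turn.
    for t in transforms:
        results = t(results)
    return results

sample = compose([load_frames, sample_frames], {'total_frames': 8})
print(len(sample['imgs']))  # 4 frames remain after sampling
```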

Below is an example data pipeline for training SlowFast on Kinetics using `VideoDataset`. The pipeline initially employs [`decord`](https://github.com/dmlc/decord) to read the raw videos and randomly sample one video clip, which comprises `32` frames with a frame interval of `2`. Subsequently, it applies random resized crop and random horizontal flip to all frames before formatting the data shape as `NCTHW`, which is `(1, 3, 32, 224, 224)` in this example.

```python
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
```

A comprehensive list of all available data transforms in MMAction2 can be found in [mmaction.datasets.transforms](mmaction.datasets.transforms).

## Modify the Training/Testing Pipeline

The data pipeline in MMAction2 is highly adaptable, as nearly every step of the data preprocessing can be configured from the config file. However, the wide array of options may be overwhelming for some users.

Below are some general practices and guidance for building a data pipeline for action recognition tasks.

### Loading

At the beginning of a data pipeline, it is customary to load videos. However, if the frames have already been extracted, you should utilize `RawFrameDecode` and modify the dataset type to `RawframeDataset`.

```python
train_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='PackActionInputs')
]
```

If you need to load data from files with distinct formats (e.g., `pkl`, `bin`, etc.) or from specific locations, you may create a new loading transform and include it at the beginning of the data pipeline. Please refer to [Add New Data Transforms](#add-new-data-transforms) for more details.

### Sampling Frames and Other Processing

During training and testing, we may have different strategies to sample frames from the video.

For instance, when testing SlowFast, we uniformly sample multiple clips as follows:

```python
test_pipeline = [
    ...
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=10,
        test_mode=True),
    ...
]
```

In the above example, 10 video clips, each comprising 32 frames, will be uniformly sampled from each video. `test_mode=True` is employed to accomplish this, as opposed to random sampling during training.
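As a rough back-of-the-envelope sketch (not the exact `SampleFrames` implementation), uniform test-time sampling amounts to spacing the clip start indices evenly over the room the video leaves for a clip:

```python
def uniform_clip_offsets(num_frames, clip_len, frame_interval, num_clips):
    # Frames spanned by a single clip, e.g. 32 * 2 = 64.
    span = clip_len * frame_interval
    # Spread the clip starts evenly over the remaining room,
    # centring each clip in its slot.
    avg = max(num_frames - span + 1, 0) / num_clips
    return [int(avg / 2 + avg * i) for i in range(num_clips)]

# For a 300-frame video with the test-time settings above:
offsets = uniform_clip_offsets(300, clip_len=32, frame_interval=2, num_clips=10)
# 10 evenly spaced start indices; the last clip still fits in the video.
```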

Another example involves `TSN/TSM` models, which sample multiple segments from the video:

```python
train_pipeline = [
    ...
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),
    ...
]
```

Typically, the data augmentations in the data pipeline handle only video-level transforms, such as resizing or cropping, but not transforms such as video normalization or mixup/cutmix. This is because normalization and mixup/cutmix can be applied to batched video data on the GPU, which accelerates processing. To configure video normalization and mixup/cutmix, please use the [mmaction.models.utils.data_preprocessor](mmaction.models.utils.data_preprocessor).
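For instance, normalization and mixup can be configured on the model's data preprocessor roughly as follows. The exact keys, such as the `MixupBlending` options and the mean/std values, are assumptions to be checked against your MMAction2 version and dataset:

```python
# Sketch of a data preprocessor config (values are illustrative).
data_preprocessor = dict(
    type='ActionDataPreprocessor',
    # ImageNet statistics; normalization runs on batched data on the GPU.
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    format_shape='NCTHW',
    # Optional batch-level augmentation; num_classes must match the dataset.
    blending=dict(type='MixupBlending', num_classes=400, alpha=0.2))
```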

### Formatting

Formatting involves collecting training data from the data information dict and converting it into a format that is compatible with the model.

In most cases, you can simply employ [`PackActionInputs`](mmaction.datasets.transforms.PackActionInputs), which will
convert the images from NumPy arrays to PyTorch tensors and pack the ground-truth category information and
other meta information into a dict-like object, [`ActionDataSample`](mmaction.structures.ActionDataSample).

```python
train_pipeline = [
    ...
    dict(type='PackActionInputs'),
]
```

## Add New Data Transforms

1. To create a new data transform, write a new transform class in a Python file named, for example, `my_transforms.py`. Data transform classes must inherit
   the [`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class and override the `transform` method, which takes a `dict` as input and returns a `dict`. Finally, place `my_transforms.py` in the folder `mmaction/datasets/transforms/`.

   ```python
   from mmcv.transforms import BaseTransform

   from mmaction.datasets import TRANSFORMS


   @TRANSFORMS.register_module()
   class MyTransform(BaseTransform):

       def __init__(self, msg):
           self.msg = msg

       def transform(self, results):
           # Modify the data information dict `results`.
           print(self.msg, 'MMAction2.')
           return results
   ```

2. Import the new class in `mmaction/datasets/transforms/__init__.py`.

   ```python
   ...
   from .my_transforms import MyTransform

   __all__ = [
       ..., 'MyTransform'
   ]
   ```

3. Use it in config files.

   ```python
   train_pipeline = [
       ...
       dict(type='MyTransform', msg='Hello!'),
       ...
   ]
   ```