## Data Preparation
### Kinetics
For more information about the Kinetics dataset, please refer to the official [website](https://deepmind.com/research/open-source/kinetics). You can take the following steps to prepare the dataset:
1. Download the videos via the official [scripts](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics).
2. Preprocess the downloaded videos by resizing them so that the short edge is 256 pixels (an ffmpeg sketch is shown after this list).
3. Prepare the csv files for the training, validation, and testing sets as `train.csv`, `val.csv`, and `test.csv`. The format of each csv file is:
```
path_to_video_1 label_1
path_to_video_2 label_2
path_to_video_3 label_3
...
path_to_video_N label_N
```
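The short-edge resize in step 2 can be done with ffmpeg. A minimal sketch, assuming ffmpeg is installed and using hypothetical `videos/` and `videos_256/` directories for the raw and resized clips:
```
IN_DATA_DIR="videos"
OUT_DATA_DIR="videos_256"
mkdir -p "${OUT_DATA_DIR}"
for video in "${IN_DATA_DIR}"/*
do
  out_name="${OUT_DATA_DIR}/${video##*/}"
  # Scale the short edge to 256 while keeping the aspect ratio; -2 rounds the
  # other dimension to an even value, which most codecs require.
  ffmpeg -i "${video}" -vf "scale='if(gt(iw,ih),-2,256)':'if(gt(iw,ih),256,-2)'" -c:a copy "${out_name}"
done
```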
All the Kinetics models in the Model Zoo are trained and tested on the same data as [Non-local Network](https://github.com/facebookresearch/video-nonlocal-net/blob/main/DATASET.md) and [PySlowFast](https://github.com/facebookresearch/SlowFast/blob/main/slowfast/datasets/DATASET.md). For dataset-specific issues, please reach out to the [dataset provider](https://deepmind.com/research/open-source/kinetics).
### Charades
We follow [PySlowFast](https://github.com/facebookresearch/SlowFast/blob/main/slowfast/datasets/DATASET.md) to prepare the Charades dataset as follows:
1. Download the Charades RGB frames from the [official website](http://ai2-website.s3.amazonaws.com/data/Charades_v1_rgb.tar).
2. Download the *frame list* from the following links: ([train](https://dl.fbaipublicfiles.com/pyslowfast/dataset/charades/frame_lists/train.csv), [val](https://dl.fbaipublicfiles.com/pyslowfast/dataset/charades/frame_lists/val.csv)).
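A minimal download sketch for the two steps above, assuming `../../data/charades` as a hypothetical target directory (the frame archive is large, so check the available disk space first):
```
DATA_DIR="../../data/charades"
mkdir -p "${DATA_DIR}/frame_lists"
# RGB frame archive from the official website.
wget http://ai2-website.s3.amazonaws.com/data/Charades_v1_rgb.tar -P "${DATA_DIR}"
tar -xf "${DATA_DIR}/Charades_v1_rgb.tar" -C "${DATA_DIR}"
# Frame lists used by PySlowFast.
wget https://dl.fbaipublicfiles.com/pyslowfast/dataset/charades/frame_lists/train.csv -P "${DATA_DIR}/frame_lists"
wget https://dl.fbaipublicfiles.com/pyslowfast/dataset/charades/frame_lists/val.csv -P "${DATA_DIR}/frame_lists"
```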
### Something-Something V2
We follow [PySlowFast](https://github.com/facebookresearch/SlowFast/blob/main/slowfast/datasets/DATASET.md) to prepare the Something-Something V2 dataset as follows:
1. Download the dataset and annotations from the [official website](https://20bn.com/datasets/something-something).
2. Download the *frame list* from the following links: ([train](https://dl.fbaipublicfiles.com/pyslowfast/dataset/ssv2/frame_lists/train.csv), [val](https://dl.fbaipublicfiles.com/pyslowfast/dataset/ssv2/frame_lists/val.csv)).
3. Extract frames from the downloaded videos at 30 FPS. We used ffmpeg 4.1.3 with the command:
```
ffmpeg -i "${video}" -r 30 -q:v 1 "${out_name}"
```
4. The extracted frames should be organized to be consistent with the paths in the frame lists (see the sketch below).
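A per-video sketch for steps 3 and 4, assuming the videos were downloaded as `.webm` files and that the frame lists expect a `frames/<video name>/<video name>_%06d.jpg` layout (check the downloaded frame lists for the exact paths they use):
```
IN_DATA_DIR="../../data/ssv2/videos"
OUT_DATA_DIR="../../data/ssv2/frames"
for video in "${IN_DATA_DIR}"/*.webm
do
  video_name=${video##*/}
  video_name=${video_name%.webm}
  out_video_dir="${OUT_DATA_DIR}/${video_name}"
  mkdir -p "${out_video_dir}"
  # Step 3's command, applied per video into a per-video folder.
  ffmpeg -i "${video}" -r 30 -q:v 1 "${out_video_dir}/${video_name}_%06d.jpg"
done
```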
### AVA (Actions V2.2)
The AVA dataset can be downloaded from the [official site](https://research.google.com/ava/download.html#ava_actions_download).
We follow the same [downloading and preprocessing procedure](https://github.com/facebookresearch/video-long-term-feature-banks/blob/main/DATASET.md) as [Long-Term Feature Banks for Detailed Video Understanding](https://arxiv.org/abs/1812.05038).
You can follow these steps to download and preprocess the data:
1. Download videos
```
DATA_DIR="../../data/ava/videos"
if [[ ! -d "${DATA_DIR}" ]]; then
echo "${DATA_DIR} doesn't exist. Creating it.";
mkdir -p ${DATA_DIR}
fi
wget https://s3.amazonaws.com/ava-dataset/annotations/ava_file_names_trainval_v2.1.txt
for line in $(cat ava_file_names_trainval_v2.1.txt)
do
wget https://s3.amazonaws.com/ava-dataset/trainval/$line -P ${DATA_DIR}
done
```
2. Cut each video from its 15th to 30th minute. AVA has valid annotations only in this range.
```
IN_DATA_DIR="../../data/ava/videos"
OUT_DATA_DIR="../../data/ava/videos_15min"
if [[ ! -d "${OUT_DATA_DIR}" ]]; then
echo "${OUT_DATA_DIR} doesn't exist. Creating it.";
mkdir -p ${OUT_DATA_DIR}
fi
for video in $(ls -A1 -U ${IN_DATA_DIR}/*)
do
out_name="${OUT_DATA_DIR}/${video##*/}"
if [ ! -f "${out_name}" ]; then
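# -ss 900 seeks to the 15-minute mark; -t 901 keeps roughly the next 15 minutes (with 1 s of slack).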
ffmpeg -ss 900 -t 901 -i "${video}" "${out_name}"
fi
done
```
3. Extract frames
```
IN_DATA_DIR="../../data/ava/videos_15min"
OUT_DATA_DIR="../../data/ava/frames"
if [[ ! -d "${OUT_DATA_DIR}" ]]; then
echo "${OUT_DATA_DIR} doesn't exist. Creating it.";
mkdir -p ${OUT_DATA_DIR}
fi
for video in $(ls -A1 -U ${IN_DATA_DIR}/*)
do
video_name=${video##*/}
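# Strip the file extension: 5 characters for .webm, otherwise 4 (e.g. .mp4, .mkv).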
if [[ $video_name = *".webm" ]]; then
video_name=${video_name::-5}
else
video_name=${video_name::-4}
fi
out_video_dir=${OUT_DATA_DIR}/${video_name}/
mkdir -p "${out_video_dir}"
out_name="${out_video_dir}/${video_name}_%06d.jpg"
ffmpeg -i "${video}" -r 30 -q:v 1 "${out_name}"
done
```
4. Download annotations
```
DATA_DIR="../../data/ava/annotations"
if [[ ! -d "${DATA_DIR}" ]]; then
echo "${DATA_DIR} doesn't exist. Creating it.";
mkdir -p ${DATA_DIR}
fi
wget https://research.google.com/ava/download/ava_v2.2.zip -P ${DATA_DIR}
unzip -q ${DATA_DIR}/ava_v2.2.zip -d ${DATA_DIR}
```
5. Download "frame lists" ([train](https://dl.fbaipublicfiles.com/video-long-term-feature-banks/data/ava/frame_lists/train.csv), [val](https://dl.fbaipublicfiles.com/video-long-term-feature-banks/data/ava/frame_lists/val.csv)) and put them in
the `frame_lists` folder (see structure above).
6. Download the person boxes generated by a person detector trained on AVA ([train](https://dl.fbaipublicfiles.com/pytorchvideo/data/ava/ava_detection_test.csv), [val](https://dl.fbaipublicfiles.com/pytorchvideo/data/ava/ava_detection_val.csv), [test](https://dl.fbaipublicfiles.com/pytorchvideo/data/ava/ava_detection_test.csv)) and put them in the `annotations` folder created in step 4 (see the directory structure below).
If you prefer to use your own person detector, generate detection prediction files in the same format and place them in the same folder.
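A minimal sketch for steps 5 and 6, using the frame-list links above; the person-box csvs from step 6 go into the `annotations` folder in the same way:
```
DATA_DIR="../../data/ava"
mkdir -p "${DATA_DIR}/frame_lists"
# Frame lists (step 5).
wget https://dl.fbaipublicfiles.com/video-long-term-feature-banks/data/ava/frame_lists/train.csv -P "${DATA_DIR}/frame_lists"
wget https://dl.fbaipublicfiles.com/video-long-term-feature-banks/data/ava/frame_lists/val.csv -P "${DATA_DIR}/frame_lists"
# Person boxes (step 6), e.g.:
# wget <detection csv url from step 6> -P "${DATA_DIR}/annotations"
```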
After these steps, the AVA dataset should have the following structure:
```
ava
|_ frames
| |_ [video name 0]
| | |_ [video name 0]_000001.jpg
| | |_ [video name 0]_000002.jpg
| | |_ ...
| |_ [video name 1]
| |_ [video name 1]_000001.jpg
| |_ [video name 1]_000002.jpg
| |_ ...
|_ frame_lists
| |_ train.csv
| |_ val.csv
|_ annotations
|_ [official AVA annotation files]
|_ ava_train_predicted_boxes.csv
|_ ava_val_predicted_boxes.csv
```