# How to Install Datasets

`$DATA` denotes the location where datasets are installed, e.g.

```
$DATA/
|-- office31/
|-- office_home/
|-- visda17/
```
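
The layout above can be checked with a few lines of Python. This is a minimal sketch and not part of the repository: the helper name `installed_datasets` and the lookup of a `DATA` environment variable are assumptions made for illustration.

```python
import os
from pathlib import Path


def installed_datasets(data_root, expected):
    """Map each expected folder name to True if it exists under data_root."""
    root = Path(data_root).expanduser()
    return {name: (root / name).is_dir() for name in expected}


if __name__ == "__main__":
    # Assumption: the dataset root is exported as the DATA environment variable.
    data_root = os.environ.get("DATA", ".")
    status = installed_datasets(data_root, ["office31", "office_home", "visda17"])
    for name, present in status.items():
        print(f"{name}: {'found' if present else 'missing'}")
```

The check only looks at top-level folder names, so it says nothing about whether a dataset was extracted completely.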

[Domain Adaptation](#domain-adaptation)
- [Office-31](#office-31)
- [Office-Home](#office-home)
- [VisDA17](#visda17)
- [CIFAR10-STL10](#cifar10-stl10)
- [Digit-5](#digit-5)
- [DomainNet](#domainnet)
- [miniDomainNet](#minidomainnet)

[Domain Generalization](#domain-generalization)
- [PACS](#pacs)
- [VLCS](#vlcs)
- [Office-Home-DG](#office-home-dg)
- [Digits-DG](#digits-dg)
- [Digit-Single](#digit-single)
- [CIFAR-10-C](#cifar-10-c)
- [CIFAR-100-C](#cifar-100-c)
- [WILDS](#wilds)

[Semi-Supervised Learning](#semi-supervised-learning)
- [CIFAR10/100 and SVHN](#cifar10100-and-svhn)
- [STL10](#stl10)

## Domain Adaptation

### Office-31

Download link: https://people.eecs.berkeley.edu/~jhoffman/domainadapt/#datasets_code.

File structure:

```
office31/
|-- amazon/
|   |-- back_pack/
|   |-- bike/
|   |-- ...
|-- dslr/
|   |-- back_pack/
|   |-- bike/
|   |-- ...
|-- webcam/
|   |-- back_pack/
|   |-- bike/
|   |-- ...
```

Note that within each domain folder you need to move all class folders out of the `images/` folder and then delete the `images/` folder.
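
The manual step described in the note can also be scripted. The following is a minimal sketch under the assumption that each domain folder contains exactly one `images/` subfolder holding the class folders. The function names `flatten_images_dir` and `flatten_office31` are hypothetical and are not provided by the repository.

```python
import shutil
from pathlib import Path


def flatten_images_dir(domain_dir):
    """Move every entry of <domain_dir>/images/ up one level, then remove images/."""
    domain_dir = Path(domain_dir)
    images_dir = domain_dir / "images"
    if not images_dir.is_dir():
        return  # nothing to do: already flattened, or a different layout
    # sorted() materializes the listing before anything is moved.
    for entry in sorted(images_dir.iterdir()):
        target = domain_dir / entry.name
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(entry), str(target))
    images_dir.rmdir()  # succeeds only because images/ is now empty


def flatten_office31(root, domains=("amazon", "dslr", "webcam")):
    """Apply flatten_images_dir to each Office-31 domain folder under root."""
    for domain in domains:
        flatten_images_dir(Path(root) / domain)
```

Calling `flatten_office31` on the `office31/` folder a second time is a no-op, because domain folders without an `images/` subfolder are skipped.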

### Office-Home

Download link: http://hemanthdv.org/OfficeHome-Dataset/.

File structure:

```
office_home/
|-- art/
|-- clipart/
|-- product/
|-- real_world/
```

### VisDA17

Download link: http://ai.bu.edu/visda-2017/.

The dataset can also be downloaded using our script at `datasets/da/visda17.sh`. Run the following command in your terminal under `Dassl.pytorch/datasets/da`:

```bash
sh visda17.sh $DATA
```

Once the download is finished, the file structure will look like

```
visda17/
|-- train/
|-- test/
|-- validation/
```

### CIFAR10-STL10

Run the following command in your terminal under `Dassl.pytorch/datasets/da`:

```bash
python cifar_stl.py $DATA/cifar_stl
```

This will create a folder named `cifar_stl` under `$DATA`. The file structure will look like

```
cifar_stl/
|-- cifar/
|   |-- train/
|   |-- test/
|-- stl/
|   |-- train/
|   |-- test/
```

Note that only the 9 classes shared by both datasets are kept.
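
To see where the number 9 comes from, the overlap can be computed from the published class lists of the two datasets. This sketch only illustrates the set intersection. The alias table and the function name `shared_classes` are assumptions, and the label names that `cifar_stl.py` actually writes may differ.

```python
# Class names as published for each dataset.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]
STL10_CLASSES = [
    "airplane", "bird", "car", "cat", "deer",
    "dog", "horse", "monkey", "ship", "truck",
]

# CIFAR-10 says "automobile" where STL-10 says "car"; treat them as one class.
ALIASES = {"automobile": "car"}


def shared_classes(classes_a, classes_b, aliases=ALIASES):
    """Return the sorted class names present in both lists after alias mapping."""
    norm_a = {aliases.get(name, name) for name in classes_a}
    norm_b = {aliases.get(name, name) for name in classes_b}
    return sorted(norm_a & norm_b)


shared = shared_classes(CIFAR10_CLASSES, STL10_CLASSES)
print(len(shared), shared)
# 9 ['airplane', 'bird', 'car', 'cat', 'deer', 'dog', 'horse', 'ship', 'truck']
```

The two classes that drop out are `frog`, which exists only in CIFAR-10, and `monkey`, which exists only in STL-10.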

### Digit-5

Create a folder `$DATA/digit5` and download the dataset from [here](https://github.com/VisionLearningGroup/VisionLearningGroup.github.io/tree/master/M3SDA/code_MSDA_digit#digit-five-download) into this folder. This should give you

```
digit5/
|-- Digit-Five/
```

Then, run the following command in your terminal under `Dassl.pytorch/datasets/da`:

```bash
python digit5.py $DATA/digit5
```

This will extract the data and organize the file structure as

```
digit5/
|-- Digit-Five/
|-- mnist/
|-- mnist_m/
|-- usps/
|-- svhn/
|-- syn/
```

### DomainNet

Download link: http://ai.bu.edu/M3SDA/. Please download the cleaned version of the split files.

File structure:

```
domainnet/
|-- clipart/
|-- infograph/
|-- painting/
|-- quickdraw/
|-- real/
|-- sketch/
|-- splits/
|   |-- clipart_train.txt
|   |-- clipart_test.txt
|   |-- ...
```

### miniDomainNet

You need to download the DomainNet dataset first. The miniDomainNet split files can be downloaded from this [google drive](https://drive.google.com/open?id=15rrLDCrzyi6ZY-1vJar3u7plgLe4COL7) link. After the zip file is extracted, you should have the folder `$DATA/domainnet/splits_mini/`.
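
The extraction step can be sketched with the standard `zipfile` module. This assumes the downloaded archive contains a top-level `splits_mini/` folder, which the text above implies but does not state. The function name `extract_mini_splits` and the archive path passed to it are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_mini_splits(zip_path, domainnet_dir):
    """Extract the split archive into domainnet_dir and return the splits_mini path."""
    domainnet_dir = Path(domainnet_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(domainnet_dir)
    splits_dir = domainnet_dir / "splits_mini"
    # Assumption: the archive unpacks to a folder named splits_mini/.
    if not splits_dir.is_dir():
        raise FileNotFoundError(f"expected {splits_dir} after extraction")
    return splits_dir
```

If the archive turns out to have a different internal layout, the explicit check fails early instead of leaving the split files in an unexpected place.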

## Domain Generalization

### PACS

Download link: [google drive](https://drive.google.com/open?id=1m4X4fROCCXMO0lRLrr6Zz9Vb3974NWhE).

File structure:

```
pacs/
|-- images/
|-- splits/
```

You do not have to download this dataset manually. When you run `tools/train.py`, the code checks whether the dataset exists and automatically downloads it to `$DATA` if it is missing. This also applies to VLCS, Office-Home-DG, and Digits-DG.
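
The check-then-download behaviour described above follows a common pattern, sketched below. This is not the repository's implementation: `ensure_dataset` and the `download_fn` callable are hypothetical names, and the real code may decide differently what counts as an existing dataset.

```python
from pathlib import Path


def ensure_dataset(data_root, folder_name, download_fn):
    """Return the dataset folder, calling download_fn(target) only if it is missing."""
    target = Path(data_root) / folder_name
    if not target.is_dir():
        # Hypothetical callable that fetches the data and unpacks it into target.
        download_fn(target)
    if not target.is_dir():
        raise RuntimeError(f"download did not create {target}")
    return target
```

Passing the downloader in as a callable keeps the existence check separate from the network code, so a manually downloaded copy is used as-is and never fetched again.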

### VLCS

Download link: [google drive](https://drive.google.com/file/d/1r0WL5DDqKfSPp9E3tRENwHaXNs1olLZd/view?usp=sharing) (credit to https://github.com/fmcarlucci/JigenDG#vlcs).

File structure:

```
VLCS/
|-- CALTECH/
|-- LABELME/
|-- PASCAL/
|-- SUN/
```

### Office-Home-DG

Download link: [google drive](https://drive.google.com/open?id=1gkbf_KaxoBws-GWT3XIPZ7BnkqbAxIFa).

File structure:

```
office_home_dg/
|-- art/
|-- clipart/
|-- product/
|-- real_world/
```

### Digits-DG

Download link: [google drive](https://drive.google.com/open?id=15V7EsHfCcfbKgsDmzQKj_DfXt_XYp_P7).

File structure:

```
digits_dg/
|-- mnist/
|-- mnist_m/
|-- svhn/
|-- syn/
```

### Digit-Single

Follow the steps for [Digit-5](#digit-5) to organize the dataset.

### CIFAR-10-C

First download the CIFAR-10-C dataset from https://zenodo.org/record/2535967#.YFxHEWQzb0o to, e.g., `$DATA`, and extract the file under the same directory. Then, navigate to `Dassl.pytorch/datasets/dg` and run the following command in your terminal:

```bash
python cifar_c.py $DATA/CIFAR-10-C
```

where the first argument denotes the path to the (uncompressed) CIFAR-10-C dataset.

The script will extract images from the `.npy` files and save them to `cifar10_c/`, which is created under `$DATA`. The file structure will look like

```
cifar10_c/
|-- brightness/
|   |-- 1/ # 5 intensity levels in total
|   |-- 2/
|   |-- 3/
|   |-- 4/
|   |-- 5/
|-- ... # 19 corruption types in total
```

Note that `cifar10_c/` only contains the test images. The training images are the normal CIFAR-10 images. See [CIFAR10/100 and SVHN](#cifar10100-and-svhn) for how to prepare the CIFAR-10 dataset.
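
The five intensity folders correspond to contiguous blocks of rows in each corruption's `.npy` array. The sketch below assumes the released format, in which every array stacks the 10,000 test images once per level in ascending severity. Whether `cifar_c.py` uses exactly this indexing is an assumption, and `level_slice` is a hypothetical helper.

```python
IMAGES_PER_LEVEL = 10_000  # size of the CIFAR-10 test set
NUM_LEVELS = 5


def level_slice(level, images_per_level=IMAGES_PER_LEVEL, num_levels=NUM_LEVELS):
    """Return the slice of rows in a corruption array that belongs to a 1-based level."""
    if not 1 <= level <= num_levels:
        raise ValueError(f"level must be in 1..{num_levels}, got {level}")
    start = (level - 1) * images_per_level
    return slice(start, start + images_per_level)


for level in range(1, NUM_LEVELS + 1):
    rows = level_slice(level)
    print(f"level {level}: rows {rows.start}..{rows.stop - 1}")
```

With a loaded array `images`, the expression `images[level_slice(3)]` would select the rows that end up in the `3/` folder under these assumptions.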

### CIFAR-100-C

First download the CIFAR-100-C dataset from https://zenodo.org/record/3555552#.YFxpQmQzb0o to, e.g., `$DATA`, and extract the file under the same directory. Then, navigate to `Dassl.pytorch/datasets/dg` and run the following command in your terminal:

```bash
python cifar_c.py $DATA/CIFAR-100-C
```

where the first argument denotes the path to the (uncompressed) CIFAR-100-C dataset.

The script will extract images from the `.npy` files and save them to `cifar100_c/`, which is created under `$DATA`. The file structure will look like

```
cifar100_c/
|-- brightness/
|   |-- 1/ # 5 intensity levels in total
|   |-- 2/
|   |-- 3/
|   |-- 4/
|   |-- 5/
|-- ... # 19 corruption types in total
```

Note that `cifar100_c/` only contains the test images. The training images are the normal CIFAR-100 images. See [CIFAR10/100 and SVHN](#cifar10100-and-svhn) for how to prepare the CIFAR-100 dataset.

### WILDS

No preprocessing is required for the WILDS datasets. The code will automatically download the data.

## Semi-Supervised Learning

### CIFAR10/100 and SVHN

Run the following command in your terminal under `Dassl.pytorch/datasets/ssl`:

```bash
python cifar10_cifar100_svhn.py $DATA
```

This will create three folders under `$DATA`, i.e.

```
cifar10/
|-- train/
|-- test/
cifar100/
|-- train/
|-- test/
svhn/
|-- train/
|-- test/
```

### STL10

Run the following command in your terminal under `Dassl.pytorch/datasets/ssl`:

```bash
python stl10.py $DATA/stl10
```

This will create a folder named `stl10` under `$DATA` and extract the data into three folders, i.e. `train`, `test` and `unlabeled`. Then, download the "Binary files" from http://ai.stanford.edu/~acoates/stl10/ and extract them under `stl10`.

The file structure will look like

```
stl10/
|-- train/
|-- test/
|-- unlabeled/
|-- stl10_binary/
```