# How to Install Datasets

`$DATA` denotes the location where datasets are installed, e.g.

```
$DATA/
|–– office31/
|–– office_home/
|–– visda17/
```
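
If you have not set `$DATA` yet, it can simply be an environment variable pointing at a directory of your choice. A minimal sketch (the path below is only an example):

```bash
# Example only: point $DATA at a dataset root of your choice.
export DATA=/path/to/datasets
mkdir -p "$DATA"
```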

[Domain Adaptation](#domain-adaptation)
- [Office-31](#office-31)
- [Office-Home](#office-home)
- [VisDA17](#visda17)
- [CIFAR10-STL10](#cifar10-stl10)
- [Digit-5](#digit-5)
- [DomainNet](#domainnet)
- [miniDomainNet](#minidomainnet)

[Domain Generalization](#domain-generalization)
- [PACS](#pacs)
- [VLCS](#vlcs)
- [Office-Home-DG](#office-home-dg)
- [Digits-DG](#digits-dg)
- [Digit-Single](#digit-single)
- [CIFAR-10-C](#cifar-10-c)
- [CIFAR-100-C](#cifar-100-c)
- [WILDS](#wilds)

[Semi-Supervised Learning](#semi-supervised-learning)
- [CIFAR10/100 and SVHN](#cifar10100-and-svhn)
- [STL10](#stl10)

## Domain Adaptation

### Office-31

Download link: https://people.eecs.berkeley.edu/~jhoffman/domainadapt/#datasets_code.

File structure:

```
office31/
|–– amazon/
|   |–– back_pack/
|   |–– bike/
|   |–– ...
|–– dslr/
|   |–– back_pack/
|   |–– bike/
|   |–– ...
|–– webcam/
|   |–– back_pack/
|   |–– bike/
|   |–– ...
```

Note that, within each domain folder, you need to move all class folders out of `images/` and then delete the empty `images/` folder, as sketched below.
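
A minimal shell sketch of that reorganization, assuming each domain archive extracted to `<domain>/images/<class>/` under `$DATA/office31`:

```bash
# Sketch: flatten each Office-31 domain folder under $DATA/office31.
# Assumes each domain was extracted as <domain>/images/<class>/...
cd "$DATA/office31"
for domain in amazon dslr webcam; do
    mv "$domain"/images/* "$domain"/   # move class folders up one level
    rmdir "$domain"/images             # remove the now-empty images/ folder
done
```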

### Office-Home

Download link: http://hemanthdv.org/OfficeHome-Dataset/.

File structure:

```
office_home/
|–– art/
|–– clipart/
|–– product/
|–– real_world/
```

### VisDA17

Download link: http://ai.bu.edu/visda-2017/.

The dataset can also be downloaded using our script at `datasets/da/visda17.sh`. Run the following command in your terminal under `Dassl.pytorch/datasets/da`,

```bash
sh visda17.sh $DATA
```

Once the download is finished, the file structure will look like

```
visda17/
|–– train/
|–– test/
|–– validation/
```

### CIFAR10-STL10

Run the following command in your terminal under `Dassl.pytorch/datasets/da`,

```bash
python cifar_stl.py $DATA/cifar_stl
```

This will create a folder named `cifar_stl` under `$DATA`. The file structure will look like

```
cifar_stl/
|–– cifar/
|   |–– train/
|   |–– test/
|–– stl/
|   |–– train/
|   |–– test/
```

Note that only the 9 classes shared by both datasets are kept (CIFAR-10's "frog" and STL-10's "monkey" have no counterpart in the other dataset and are dropped).

### Digit-5

Create a folder `$DATA/digit5` and download the dataset from [here](https://github.com/VisionLearningGroup/VisionLearningGroup.github.io/tree/master/M3SDA/code_MSDA_digit#digit-five-download) into it. This should give you

```
digit5/
|–– Digit-Five/
```

Then, run the following command in your terminal under `Dassl.pytorch/datasets/da`,

```bash 
python digit5.py $DATA/digit5
```

This will extract the data and organize the file structure as

```
digit5/
|–– Digit-Five/
|–– mnist/
|–– mnist_m/
|–– usps/
|–– svhn/
|–– syn/
```

### DomainNet

Download link: http://ai.bu.edu/M3SDA/ (please download the cleaned version of the split files).

File structure:

```
domainnet/
|–– clipart/
|–– infograph/
|–– painting/
|–– quickdraw/
|–– real/
|–– sketch/
|–– splits/
|   |–– clipart_train.txt
|   |–– clipart_test.txt
|   |–– ...
```

### miniDomainNet

You need to download the DomainNet dataset first. miniDomainNet's split files can be downloaded from this [google drive](https://drive.google.com/open?id=15rrLDCrzyi6ZY-1vJar3u7plgLe4COL7). After extracting the zip file, you should have the folder `$DATA/domainnet/splits_mini/`.
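
For reference, a minimal sketch of that step; the archive name below is a placeholder for whatever the Google Drive download is called:

```bash
# Sketch: place the miniDomainNet split files next to the DomainNet images.
# "splits_mini.zip" is a placeholder for the downloaded archive's name.
unzip splits_mini.zip -d "$DATA/domainnet/"
ls "$DATA/domainnet/splits_mini/"  # the split files should appear here
```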

## Domain Generalization

### PACS

Download link: [google drive](https://drive.google.com/open?id=1m4X4fROCCXMO0lRLrr6Zz9Vb3974NWhE).

File structure:

```
pacs/
|–– images/
|–– splits/
```

You do not necessarily have to download this dataset manually. When you run `tools/train.py`, the code detects whether the dataset exists and automatically downloads it to `$DATA` if it is missing. This also applies to VLCS, Office-Home-DG, and Digits-DG.

### VLCS

Download link: [google drive](https://drive.google.com/file/d/1r0WL5DDqKfSPp9E3tRENwHaXNs1olLZd/view?usp=sharing) (credit to https://github.com/fmcarlucci/JigenDG#vlcs).

File structure:

```
VLCS/
|–– CALTECH/
|–– LABELME/
|–– PASCAL/
|–– SUN/
```

### Office-Home-DG

Download link: [google drive](https://drive.google.com/open?id=1gkbf_KaxoBws-GWT3XIPZ7BnkqbAxIFa).

File structure:

```
office_home_dg/
|–– art/
|–– clipart/
|–– product/
|–– real_world/
```

### Digits-DG

Download link: [google drive](https://drive.google.com/open?id=15V7EsHfCcfbKgsDmzQKj_DfXt_XYp_P7).

File structure:

```
digits_dg/
|–– mnist/
|–– mnist_m/
|–– svhn/
|–– syn/
```

### Digit-Single

Follow the steps for [Digit-5](#digit-5) to organize the dataset.

### CIFAR-10-C

First, download the CIFAR-10-C dataset from https://zenodo.org/record/2535967#.YFxHEWQzb0o to, e.g., `$DATA`, and extract the file under the same directory. Then, navigate to `Dassl.pytorch/datasets/dg` and run the following command in your terminal,

```bash
python cifar_c.py $DATA/CIFAR-10-C
```

where the first argument denotes the path to the (uncompressed) CIFAR-10-C dataset.
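
For the download step above, a minimal fetch-and-extract sketch; the archive name `CIFAR-10-C.tar` is assumed from the Zenodo record, so check the record if it differs:

```bash
# Sketch: download and unpack CIFAR-10-C under $DATA.
# The archive name CIFAR-10-C.tar is an assumption.
cd "$DATA"
wget https://zenodo.org/record/2535967/files/CIFAR-10-C.tar
tar -xf CIFAR-10-C.tar  # yields $DATA/CIFAR-10-C/ containing the .npy files
```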

The script will extract images from the `.npy` files and save them to `cifar10_c/` created under `$DATA`. The file structure will look like

```
cifar10_c/
|–– brightness/
|   |–– 1/ # 5 intensity levels in total
|   |–– 2/
|   |–– 3/
|   |–– 4/
|   |–– 5/
|–– ... # 19 corruption types in total
```

Note that `cifar10_c/` only contains the test images. The training images are the normal CIFAR-10 images. See [CIFAR10/100 and SVHN](#cifar10100-and-svhn) for how to prepare the CIFAR-10 dataset.

### CIFAR-100-C

First, download the CIFAR-100-C dataset from https://zenodo.org/record/3555552#.YFxpQmQzb0o to, e.g., `$DATA`, and extract the file under the same directory. Then, navigate to `Dassl.pytorch/datasets/dg` and run the following command in your terminal,

```bash
python cifar_c.py $DATA/CIFAR-100-C
```

where the first argument denotes the path to the (uncompressed) CIFAR-100-C dataset.

The script will extract images from the `.npy` files and save them to `cifar100_c/` created under `$DATA`. The file structure will look like

```
cifar100_c/
|–– brightness/
|   |–– 1/ # 5 intensity levels in total
|   |–– 2/
|   |–– 3/
|   |–– 4/
|   |–– 5/
|–– ... # 19 corruption types in total
```

Note that `cifar100_c/` only contains the test images. The training images are the normal CIFAR-100 images. See [CIFAR10/100 and SVHN](#cifar10100-and-svhn) for how to prepare the CIFAR-100 dataset.

### WILDS

No action is required to preprocess the WILDS datasets; the code will download the data automatically.

## Semi-Supervised Learning

### CIFAR10/100 and SVHN

Run the following command in your terminal under `Dassl.pytorch/datasets/ssl`,

```bash
python cifar10_cifar100_svhn.py $DATA
```

This will create three folders under `$DATA`, i.e.

```
cifar10/
|–– train/
|–– test/
cifar100/
|–– train/
|–– test/
svhn/
|–– train/
|–– test/
```

### STL10

Run the following command in your terminal under `Dassl.pytorch/datasets/ssl`,

```bash
python stl10.py $DATA/stl10
```

This will create a folder named `stl10` under `$DATA` and extract the data into three folders, i.e. `train`, `test` and `unlabeled`. Then, download the "Binary files" archive from http://ai.stanford.edu/~acoates/stl10/ and extract it under `stl10`.
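
For that download step, a minimal sketch; `stl10_binary.tar.gz` is the archive name listed on that page:

```bash
# Sketch: fetch the STL-10 binary files and unpack them under $DATA/stl10.
cd "$DATA/stl10"
wget http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz
tar -xzf stl10_binary.tar.gz  # creates stl10_binary/
```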

The file structure will look like

```
stl10/
|–– train/
|–– test/
|–– unlabeled/
|–– stl10_binary/
```