# Map It Anywhere (MIA) Dataset

## Table of Contents
  - [Introduction](#introduction)
  - [Data](#data)
    - [Dataset Structure](#dataset-structure)
    - [Format](#format)
    - [Dataset Creation Summary](#dataset-creation)
  - [Getting Started](#getting-started)
  - [Licenses](#licenses)

![MIA Dataset Example](/assets/mia_dataset_overview.png "MIA Dataset Example")

![MIA Image Diversity](/assets/fpv_diversity.png "MIA Image Diversity")
## Introduction
The Map It Anywhere (MIA) dataset contains large-scale, map-prediction-ready data curated from public datasets.
Specifically, the dataset enables Bird's Eye View (BEV) map prediction from First Person View (FPV) RGB images by providing greater diversity in locations and cameras than existing datasets. The dataset contains 1.2 million high-quality FPV and BEV map pairs covering 470 square kilometers, which, to the best of our knowledge, is 6x more coverage than the closest publicly available map prediction dataset, thereby facilitating future map prediction research on generalizability and robustness. The dataset is curated using our MIA data engine ([code](https://github.com/MapItAnywhere/MapItAnywhere)) to sample from six urban-centered locations: New York, Chicago, Houston, Los Angeles, Pittsburgh, and San Francisco.

Dataset download links are available [here](https://cmu.box.com/s/6tnlvikg1rcsai0ve7t8kgdx9ago9x9q). Please refer to the [Getting Started](#getting-started) section for usage instructions.

## Data
### Dataset Structure

```
ROOT
|
--- LOCATION_0                             # location folder
|       |
|       +--- images                          # FPV Images (XX.jpg)
|       +--- semantic_masks                  # Semantic Masks (XX.npz)
|       +--- flood_fill                      # Visibility Masks (XX.npz)
|       ---- dump.json                       # Camera pose information for IDs in LOCATION
|       ---- image_points.parquet
|       ---- image_metadata.parquet
|       ---- image_metadata_filtered.parquet
|       ---- image_metadata_filtered_processed.parquet 
--- LOCATION_1                             
.
.
|
+-- LOCATION_2
--- README.md
--- samples.pdf # Visualization of sample data
```
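Before using a download, it can help to confirm that every location folder matches the tree above. The following is a minimal sketch, assuming the folder and file names shown in the tree (including `flood_fill` for the visibility masks); `check_layout` and the `EXPECTED_*` tuples are illustrative names of our own, not part of the MIA tooling.

```python
from pathlib import Path

# Entries each location folder is expected to hold, copied from the tree above.
# Hypothetical constants: adjust them if your copy of the dataset differs.
EXPECTED_DIRS = ("images", "semantic_masks", "flood_fill")
EXPECTED_FILES = (
    "dump.json",
    "image_points.parquet",
    "image_metadata.parquet",
    "image_metadata_filtered.parquet",
    "image_metadata_filtered_processed.parquet",
)


def check_layout(root):
    """Return {location_name: [missing entries]} for every sub-folder of root.

    Root-level files such as README.md or split.json are skipped because
    only directories are treated as location folders.
    """
    report = {}
    for loc in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        missing = [d for d in EXPECTED_DIRS if not (loc / d).is_dir()]
        missing += [f for f in EXPECTED_FILES if not (loc / f).is_file()]
        report[loc.name] = missing
    return report
```

For example, `check_layout("/home/mia_dataset_release")` returns an empty list for each complete location and lists whatever is absent otherwise.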


### Format

Each data sample has a unique ID assigned by Mapillary, which is used to reference and associate the sample's attributes throughout the dataset.

**Each location has the following:**

- `images`: directory containing all FPV images, named `<id>_undistorted.png`.
- `semantic_masks`: `.npz` files named `<id>`, each containing semantic masks as a single array `arr_0` of shape 224x224x8, where the third dimension maps to classes as follows:
	0. road
	1. crossing
	2. explicit_pedestrian
	3. park (Unused by Mapper)
	4. building
	5. water (Unused by Mapper)
	6. terrain
	7. parking
	8. train (Unused by Mapper)
- `flood_masks`: `.npz` files named `<id>`, each containing an observable-region mask as a single array `arr_0` of shape 224x224.
- `image_points.parquet`: dataframe containing all image points retrieved within the tiles encompassing the boundary.
- `image_metadata.parquet`: dataframe containing the metadata retrieved for each image point (after boundary filtering). The retrieved fields are documented in the [Mapillary API](https://www.mapillary.com/developer/api-documentation#image).
- `image_metadata_filtered.parquet`: same as above, but keeping only the records that pass filtering.
- `image_metadata_filtered_processed.parquet`: the final dataframe after FPV processing and spatial filtering; it reflects what to expect in the `images` directory.
- `dump.json`: a JSON file containing camera intrinsics and extrinsics for each image, in the same format as [OrienterNet](https://github.com/facebookresearch/OrienterNet).
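The masks described above can be read with NumPy, since `.npz` archives expose their arrays by name (here `arr_0`). The sketch below is illustrative only: `load_masks` and `class_coverage` are hypothetical helpers, the visibility-mask folder defaults to `flood_fill` as in the directory tree, and a nonzero value is assumed to mean "class present" or "visible". Because the class list above has nine entries while the channel dimension is given as 8, the sketch pairs class names with however many channels the loaded array actually has.

```python
from pathlib import Path

import numpy as np

# Channel order of the semantic masks, as listed above.
CLASS_NAMES = [
    "road", "crossing", "explicit_pedestrian", "park", "building",
    "water", "terrain", "parking", "train",
]


def load_masks(location_dir, sample_id, flood_dir="flood_fill"):
    """Load the semantic mask (H x W x C) and visibility mask (H x W) of one sample."""
    loc = Path(location_dir)
    semantic = np.load(loc / "semantic_masks" / f"{sample_id}.npz")["arr_0"]
    visible = np.load(loc / flood_dir / f"{sample_id}.npz")["arr_0"]
    return semantic, visible


def class_coverage(semantic, visible):
    """Fraction of visible BEV cells labelled with each class (nonzero = present)."""
    vis = visible.astype(bool)
    n_visible = max(int(vis.sum()), 1)  # guard against an all-invisible mask
    # Only pair as many names as there are channels in the loaded array.
    return {
        name: float(np.logical_and(semantic[..., k] > 0, vis).sum()) / n_visible
        for k, name in enumerate(CLASS_NAMES[: semantic.shape[-1]])
    }
```

Restricting the statistics to the visibility mask keeps cells that cannot be observed from the FPV image out of the class fractions.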

In addition, `split.json` at the root describes our training, validation, and testing splits.
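The internal layout of `split.json` is not documented here, so the following sketch does not assume one beyond a top-level JSON object keyed by split name. It reports how many entries each split holds, and, if a split is itself an object (for example, one ID list per location, which is an assumption), how many entries each sub-key holds. `summarize_splits` is a hypothetical helper.

```python
import json


def summarize_splits(split_path):
    """Count entries per split without assuming more than a top-level JSON object."""
    with open(split_path) as f:
        splits = json.load(f)
    summary = {}
    for name, value in splits.items():
        if isinstance(value, dict):
            # Possibly one collection per location; count each sub-entry.
            summary[name] = {
                k: len(v) if isinstance(v, (list, dict)) else 1
                for k, v in value.items()
            }
        elif isinstance(value, list):
            summary[name] = len(value)
        else:
            summary[name] = value
    return summary
```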

**Note** that throughout the pipeline, some data samples could not be fully processed due to API issues or processing limitations. Such samples may have residual entries in dataframes or split files without corresponding maps or flood masks. A valid data sample is therefore defined as one that has a corresponding image, metadata record, semantic mask, and flood mask. Invalid samples account for less than 0.001% of the data and will be cleaned up in later versions.
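Following the definition of a valid sample above, one way to filter a location is to intersect the IDs found in each source. This is a minimal sketch under stated assumptions: image files follow the `<id>_undistorted.png` pattern, masks are `<id>.npz`, the visibility masks live in `flood_fill` as in the tree, and the caller supplies the metadata IDs. `ids_in` and `valid_ids` are hypothetical helpers.

```python
from pathlib import Path


def ids_in(directory, suffix):
    """Collect sample IDs from files named <id><suffix> in a directory."""
    d = Path(directory)
    if not d.is_dir():
        return set()
    return {p.name[: -len(suffix)] for p in d.iterdir() if p.name.endswith(suffix)}


def valid_ids(location_dir, metadata_ids, flood_dir="flood_fill"):
    """IDs that have an image, a metadata record, a semantic mask, and a flood mask."""
    loc = Path(location_dir)
    images = ids_in(loc / "images", "_undistorted.png")
    semantic = ids_in(loc / "semantic_masks", ".npz")
    flood = ids_in(loc / flood_dir, ".npz")
    # Metadata IDs may be stored as integers; compare them as strings.
    return images & semantic & flood & {str(i) for i in metadata_ids}
```

The metadata IDs could, for instance, be read with `pandas.read_parquet` from `image_metadata_filtered_processed.parquet`; the name of the ID column (e.g. `id`, following the Mapillary API) is an assumption to verify against your copy.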


## Dataset Creation

![MIA Curation Pipeline](/assets/mia_curation.png "MIA Curation Pipeline")
**Overview of how MIA data engine enables automatic curation of FPV & BEV data.**
Given city names as input on the left, the top row shows FPV processing, while the bottom row depicts BEV processing. Both pipelines converge on the right, producing FPV, BEV, and pose tuples. For more information, please refer to the main paper.

### Curation Rationale

The MIA data engine and dataset were created to accelerate research progress towards anywhere map prediction. Current map prediction research builds on only a few map prediction datasets released by autonomous vehicle companies, which cover very limited areas. We therefore present the MIA data engine, a more scalable approach that sources data from large-scale crowd-sourced mapping platforms: Mapillary for FPV images and OpenStreetMap for BEV semantic maps.


### Source Data

The MIA dataset includes data from two sources: [Mapillary](https://www.mapillary.com/) for First Person View (FPV) images, and [OpenStreetMap](https://www.openstreetmap.org) for Bird's Eye View (BEV) maps.

For FPV retrieval, we leverage Mapillary, a massive public database licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/), with over 2 billion crowd-sourced images. The images span various weather and lighting conditions and were collected using diverse camera models and focal lengths. Furthermore, images are captured by pedestrians, vehicles, bicyclists, and others. This diversity enables the collection of more dynamic and difficult scenarios critical for anywhere map prediction.
When uploading images to the Mapillary platform, users submit them under Mapillary's terms, and all shared images are licensed under CC BY-SA; more details can be found on the [Mapillary License Page](https://help.mapillary.com/hc/en-us/articles/115001770409-Licenses).
In addition, Mapillary integrates several mechanisms to minimize privacy concerns, such as automatically blurring faces and license plates and asking users to report any imagery that may contain personal data. More information can be found on the [Mapillary Privacy Policy page](https://www.mapillary.com/privacy).

For BEV retrieval, we leverage OpenStreetMap (OSM), a global crowd-sourced mapping platform open-sourced under the [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/). OSM provides rich vectorized annotations for streets, sidewalks, buildings, etc. OpenStreetMap restricts the mapping of private information where "it violates the privacy of people living in this world", with guidelines found [here](https://wiki.openstreetmap.org/wiki/Limitations_on_mapping_private_information).


### Bias, Risks, and Limitations

While we show promising generalization performance on conventional datasets, we note that crowd-sourced data inherently contains more label noise than manually collected data, both in pose correspondence and in BEV map labeling. Such noise is common across large-scale, automatically scraped or curated benchmarks such as ImageNet. While we recognize that our sampled dataset is biased towards locations in the US, our MIA data engine is applicable to locations worldwide.
Our work relies heavily on crowd-sourced data, which places the burden of data collection on people and open-source contributions.


## Getting Started
1. [Download the dataset](https://cmu.box.com/s/6tnlvikg1rcsai0ve7t8kgdx9ago9x9q).
2. Unzip all locations of interest into the structure described above, so that a single root folder directly contains all location folders.
3. (Optional) Verify your download by visualizing a few samples with the tool `mia/misc_tools/vis_samples.py`.
    1. If you haven't already, build the Docker image from `mia/Dockerfile` by running:

           docker build -t mia:release mia

    2. Launch the container, mounting both your dataset root folder and this repository:

           docker run -v <PATH_TO_DATASET_ROOT>:/home/mia_dataset_release -v <PATH_TO_THIS_REPO>:/home/MapItAnywhere --network=bridge -it mia:release

    3. From inside the container, run:

           cd /home/MapItAnywhere
           python3.9 -m mia.misc_tools.vis_samples --dataset_dir /home/mia_dataset_release --locations pittsburgh

    If successful, the script will generate a PDF called `compare.pdf` in the `pittsburgh` directory. Upon opening it, you should see the metadata, FPVs, and BEVs of a few samples from the dataset. Note that satellite imagery is not provided as part of the dataset and is shown only for comparison purposes.

4. Enjoy and explore! Don't hesitate to raise a GitHub issue if you encounter any problems.

Samples and key metadata information in `compare.pdf` will look like the following:
![MIA Sample](/assets/sample_snippet.png "MIA Sample")

## Licenses
The FPVs were curated and processed from Mapillary and carry the same [CC BY-SA license](https://creativecommons.org/licenses/by-sa/4.0/deed.en). These include all image files, parquet dataframes, and `dump.json`.
The BEVs were curated and processed from OpenStreetMap and carry the same [Open Data Commons Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/). These include all semantic masks and flood masks.
The rest of the data is licensed under the [CC BY-SA license](https://creativecommons.org/licenses/by-sa/4.0/deed.en).