Synthetic Datasets
Introduction
Welcome to the fascinating world of synthetic datasets in computer vision! As we’ve transitioned from classical unsupervised methods to advanced deep learning techniques, the demand for extensive and diverse datasets has skyrocketed. Synthetic datasets have emerged as a pivotal resource in training state-of-the-art models, providing an abundance of data that’s often impractical or impossible to collect in the real world. In this section, we’ll explore some of the most influential synthetic datasets, their applications, and how they’re shaping the future of computer vision.
Low-Level Computer Vision Problems
Optical Flow and Motion Analysis
Optical flow and motion analysis are critical for understanding image dynamics. Here are some datasets that have contributed significantly to advances in this area; a minimal example of reading the `.flo` ground-truth format used by several of them follows the table:
Dataset Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
Middlebury | 2021 (latest release) | The Middlebury benchmark provides high-resolution stereo sequences with complex geometry and pixel-accurate ground-truth disparity data, acquired with a structured-lighting technique that does not require calibration of the light projectors; a companion evaluation covers optical flow with ground-truth flow fields. | A Database and Evaluation Methodology for Optical Flow (cited by 3192 at the time of writing) | Papers with Code - Website |
Playing for Benchmarks | 2017 | More than 250K high-resolution video frames, all annotated with ground-truth data for high-level tasks as well as low-level tasks such as optical flow estimation and visual odometry. | Playing for Benchmarks | Website |
MPI-Sintel | 2012 | A synthetic dataset for optical flow derived from the open-source movie Sintel. Its main characteristic is that it contains the same scenes with different render settings of varying quality and complexity, which can provide a deeper understanding of where different optical flow algorithms break down. | A Naturalistic Open Source Movie for Optical Flow Evaluation (551 citations at the time of writing) | Website |
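Ground-truth flow for Middlebury-style benchmarks (including MPI-Sintel) is typically distributed in the binary `.flo` format: a float32 magic number, the image width and height as int32, then interleaved (u, v) displacements as float32. Below is a minimal NumPy reader following that public specification; the file name is a placeholder, and by convention values with magnitude above roughly 1e9 mark invalid/unknown flow.

```python
import numpy as np

def read_flo(path):
    """Read a Middlebury/Sintel-style .flo optical flow file.

    Layout: float32 magic (202021.25), int32 width, int32 height,
    then width * height * 2 float32 values (u, v interleaved, row-major).
    """
    with open(path, "rb") as f:
        magic = np.fromfile(f, np.float32, count=1)[0]
        if magic != 202021.25:
            raise ValueError(f"{path} is not a valid .flo file")
        width = int(np.fromfile(f, np.int32, count=1)[0])
        height = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * width * height)
    # Last axis holds the (u, v) displacement for each pixel.
    return data.reshape(height, width, 2)

flow = read_flo("frame_0001.flo")  # placeholder file name
u, v = flow[..., 0], flow[..., 1]
```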
Stereo Image Matching
Stereo image matching involves identifying corresponding elements in different images of the same scene. The following datasets have been instrumental in this field; a short example of reading the PFM disparity format used by some of them follows the table:
Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
Flying Chairs | 2015 | 22k frame pairs with ground-truth flow | FlowNet: Learning Optical Flow with Convolutional Networks | |
FlyingThings3D | 2015 | 22k stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
Driving | 2015 | 4392 stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
Monkaa | 2015 | 8591 stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
Middlebury 2014 | 2014 | 33 high-resolution stereo datasets | High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth | |
Tsukuba Stereo | 2012 | This dataset includes 1800 stereo pairs accompanied by ground truth disparity maps, occlusion maps, and discontinuity maps. | Towards a simulation-driven stereo vision system | Project |
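Middlebury 2014 and the SceneFlow collection (FlyingThings3D, Driving, Monkaa) ship ground-truth disparity as PFM images: an ASCII header (`PF`/`Pf`, the dimensions, and a scale whose sign encodes endianness) followed by raw float32 scanlines stored bottom-to-top. A minimal reader is sketched below; the `disp0.pfm` file name mirrors the Middlebury 2014 release layout and should be treated as an assumption.

```python
import re
import numpy as np

def read_pfm(path):
    """Read a PFM image (e.g. a ground-truth disparity map)."""
    with open(path, "rb") as f:
        header = f.readline().decode("ascii").strip()
        if header not in ("PF", "Pf"):  # "PF" = 3 channels, "Pf" = 1 channel
            raise ValueError(f"{path} is not a PFM file")
        width, height = map(int, re.findall(r"\d+", f.readline().decode("ascii")))
        scale = float(f.readline().decode("ascii").strip())
        endian = "<" if scale < 0 else ">"  # negative scale => little-endian
        channels = 3 if header == "PF" else 1
        data = np.fromfile(f, endian + "f", count=width * height * channels)
    shape = (height, width, channels) if channels == 3 else (height, width)
    # PFM stores scanlines bottom-to-top, so flip vertically.
    return np.flipud(data.reshape(shape))

disparity = read_pfm("disp0.pfm")  # assumed file name from the Middlebury 2014 layout
```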
High-Level Computer Vision Problems
Semantic Segmentation for Autonomous Driving
Semantic segmentation is vital for autonomous vehicles to interpret and navigate their surroundings safely. These datasets provide rich, annotated data for this purpose; a small helper for converting their color-coded masks into class indices follows the table:
Name | Year | Description | Paper | Additional Links | |
---|---|---|---|---|---|
Virtual KITTI 2 | 2020 | An updated, more photo-realistic version of Virtual KITTI: a synthetic dataset of video sequences cloned from the KITTI tracking benchmark, intended as a proxy for multi-object tracking analysis. | Virtual KITTI 2 | Website | |
ApolloScape | 2019 | Compared with existing public datasets from real scenes, such as KITTI or Cityscapes, ApolloScape provides much larger and richer labeling, including holistic semantic dense point clouds for each site, stereo imagery, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instances, and highly accurate locations for every frame in various driving videos from multiple sites, cities, and daytimes. | The ApolloScape Open Dataset for Autonomous Driving and its Application | Website | |
Driving in the Matrix | 2017 | The core idea behind “Driving in the Matrix” is to use photo-realistic computer-generated images from a simulation engine to produce annotated data quickly. | Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks? | GitHub | |
CARLA | 2017 | CARLA (CAR Learning to Act) is an open-source simulator for urban driving, built as a layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation). | CARLA: An Open Urban Driving Simulator | Website | |
Synthia | 2016 | A large collection of synthetic images for semantic segmentation of urban scenes. SYNTHIA consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes: misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lane-marking. | The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes | Website | |
GTA5 | 2016 | The GTA5 dataset contains 24,966 synthetic images with pixel-level semantic annotation. The images were rendered using the open-world video game Grand Theft Auto 5, all from the car perspective in the streets of American-style virtual cities. Its 19 semantic classes are compatible with those of the Cityscapes dataset. | Playing for Data: Ground Truth from Computer Games | BitBucket | |
ProcSy | 2019 | A synthetic dataset for semantic segmentation, modeled on a real-world urban environment and featuring a range of variable influence factors, such as weather and lighting. | ProcSy: Procedural Synthetic Dataset Generation Towards Influence Factor Studies of Semantic Segmentation Networks | Website | |
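Datasets such as GTA5 and SYNTHIA follow the Cityscapes convention of color-coded label images, so a common preprocessing step is mapping RGB mask colors to integer class IDs. A minimal sketch is shown below; the three palette entries are real Cityscapes colors given for illustration (the full 19-class palette comes from the Cityscapes label definitions), and the mask path is a placeholder.

```python
import numpy as np
from PIL import Image

# Illustrative subset of the Cityscapes palette; extend with the full
# 19-class definition from the Cityscapes label specification.
PALETTE = {
    (128, 64, 128): 0,  # road
    (244, 35, 232): 1,  # sidewalk
    (70, 70, 70): 2,    # building
}

def rgb_mask_to_ids(mask_rgb, palette, ignore_index=255):
    """Map an H x W x 3 color-coded mask to an H x W array of class IDs."""
    ids = np.full(mask_rgb.shape[:2], ignore_index, dtype=np.uint8)
    for color, class_id in palette.items():
        ids[np.all(mask_rgb == color, axis=-1)] = class_id
    return ids

mask = np.array(Image.open("00001_labels.png").convert("RGB"))  # placeholder path
label_ids = rgb_mask_to_ids(mask, PALETTE)
```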
Indoor Simulation and Navigation
Navigating indoor environments can be challenging due to their complexity. These datasets and simulators aid in developing systems capable of indoor simulation and navigation; a minimal agent-stepping sketch follows the table:
Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
Habitat | 2023 | An Embodied AI simulation platform for studying collaborative human-robot interaction tasks in home environments. | Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots | Website
MINOS | 2017 | A multimodal simulator for navigation in complex indoor environments. | MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments | GitHub
House3D | 2017 (archived in 2021) | A rich and realistic 3D environment for training generalizable agents. | Building Generalizable Agents with a Realistic and Rich 3D Environment | GitHub
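To give a flavor of how such simulators are driven, here is a minimal sketch using the habitat-sim Python API. Exact class and attribute names vary across habitat-sim versions, and the scene path is a placeholder, so treat the details as assumptions rather than a definitive recipe.

```python
import habitat_sim

# Backend configuration: which 3D scene to load (path is a placeholder).
backend_cfg = habitat_sim.SimulatorConfiguration()
backend_cfg.scene_id = "path/to/scene.glb"

# Agent with a single RGB camera sensor.
rgb_sensor = habitat_sim.CameraSensorSpec()
rgb_sensor.uuid = "rgb"
rgb_sensor.resolution = [480, 640]
agent_cfg = habitat_sim.agent.AgentConfiguration(sensor_specifications=[rgb_sensor])

sim = habitat_sim.Simulator(habitat_sim.Configuration(backend_cfg, [agent_cfg]))
observations = sim.step("move_forward")  # one of the default discrete actions
rgb_frame = observations["rgb"]          # H x W x 4 uint8 image from the camera
sim.close()
```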
Human Action Recognition and Simulation
Recognizing and simulating human actions is a complex task that these datasets help address; a sketch of loading SURREAL-style per-frame ground truth follows the table:
Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
PHAV | 2017 | Synthetic dataset of procedurally generated human action recognition videos. | Procedural Generation of Videos to Train Deep Action Recognition Networks | Website |
SURREAL | 2017 | A large-scale dataset for human depth estimation and human part segmentation, with synthetically generated but realistic images of people rendered from 3D sequences of human motion-capture data. It contains more than 6 million frames together with ground-truth poses, depth maps, and segmentation masks; CNNs trained on this synthetic data achieve accurate human depth estimation and human part segmentation in real RGB images. | Learning from Synthetic Humans | GitHub - Website
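SURREAL distributes its ground truth as MATLAB files stored next to each rendered clip. The sketch below uses SciPy to read one clip's depth and part-segmentation maps; the clip name is a placeholder, and the per-frame key naming (`depth_1`, `segm_1`, ...) follows the project's documentation, so treat it as an assumption.

```python
import scipy.io

# Per-clip ground-truth files shipped alongside each rendered video.
depth_mat = scipy.io.loadmat("01_01_c0001_depth.mat")  # placeholder clip name
segm_mat = scipy.io.loadmat("01_01_c0001_segm.mat")

depth_frame1 = depth_mat["depth_1"]  # H x W float depth map for the first frame
segm_frame1 = segm_mat["segm_1"]     # H x W integer body-part labels
```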
Face Recognition
Face recognition technology has numerous applications, from security to user identification. Here's a look at datasets that drive innovation in this field; a short sketch of loading the Face Synthetics labels follows the table:
Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
FaceSynthetics | 2021 | The Face Synthetics dataset is a collection of diverse synthetic face images with ground truth labels. | Fake It Till You Make It: Face Analysis in the Wild Using Synthetic Data Alone | Website - GitHub |
FFHQ | 2018 | Flickr-Faces-HQ (FFHQ) consists of 70,000 high-quality PNG images at 1024×1024 resolution and contains considerable variation in terms of age, ethnicity, and image background. | A Style-Based Generator Architecture for Generative Adversarial Networks | GitHub
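Face Synthetics pairs each rendered image with a per-pixel segmentation image and a text file of 2D landmark coordinates. The sketch below assumes the file naming described on the project's GitHub page (`{idx}.png`, `{idx}_seg.png`, `{idx}_ldmks.txt`); verify against the actual release before relying on it.

```python
import numpy as np
from PIL import Image

idx = "000000"  # zero-padded sample index (assumed naming scheme)
image = np.array(Image.open(f"{idx}.png"))    # rendered RGB face image
seg = np.array(Image.open(f"{idx}_seg.png"))  # per-pixel face-part class IDs
landmarks = np.loadtxt(f"{idx}_ldmks.txt")    # one "x y" pair per line
print(image.shape, seg.shape, landmarks.shape)
```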
3D Shape Modeling from Single Images
Creating 3D models from single images is a challenging yet exciting area. These datasets are at the forefront of research in 3D shape modeling; a sketch of reading the Pix3D annotation index follows the table:
Name | Year | Description | Paper |
---|---|---|---|
Pix3D | 2018 | A large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, and viewpoint estimation. | Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling |
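Pix3D ships a single JSON index that ties each image to its mask, CAD model, and pose. The key names below follow the project repository and should be treated as an assumption.

```python
import json
import numpy as np

# pix3d.json lists one record per image-shape pair (assumed key names).
with open("pix3d.json") as f:
    annotations = json.load(f)

sample = annotations[0]
print(sample["category"], sample["img"], sample["model"])
R = np.array(sample["rot_mat"])    # 3x3 rotation aligning the CAD model to the image
t = np.array(sample["trans_mat"])  # translation of the model in camera coordinates
```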
Diverse Applications
The following datasets are either tailored for niche applications or cover multiple ones; an example of the μ-law tonemapping used to evaluate HDR reconstruction follows the table:
Dataset Name | Release Year | Description | Paper | External Links | Applications |
---|---|---|---|---|---|
CIFAKE | 2023 | CIFAKE is a dataset that contains 60,000 synthetically generated images and 60,000 real images (collected from CIFAR-10). | CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images | Kaggle | Real-Fake Images Classification |
ABO | 2022 | ABO is a large-scale dataset designed for material prediction and multi-view retrieval experiments. It contains Blender renderings of 30 viewpoints for each of the 7,953 3D objects, as well as camera intrinsics and extrinsics for each rendering. | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding | Website | Material Prediction; Multi-View Retrieval; 3D Object Understanding; 3D Shape Reconstruction
NTIRE 2021 HDR | 2021 | This dataset comprises approximately 1,500 training, 60 validation, and 201 testing examples. Each example consists of three input LDR images (short, medium, and long exposures) and a related ground-truth HDR image aligned with the central medium frame. | NTIRE 2021 Challenge on High Dynamic Range Imaging: Dataset, Methods and Results | Papers with Code | High Dynamic Range Imaging
YCB-Video | 2017 | A large-scale video dataset for 6D object pose estimation, providing accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. | PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes | Website | 6D Pose Estimation
Playing for Benchmarks | 2017 | More than 250K high-resolution video frames, all annotated with ground-truth data. | Playing for Benchmarks | Website | Semantic Instance Segmentation; Object Detection and Tracking; Object-Level 3D Scene Layout
4D Light Field Dataset | 2016 | 24 synthetic, densely sampled 4D light fields with highly accurate disparity ground truth. | A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields | GitHub - Website | Depth Estimation of 4D light fields |
ICL-NUIM Dataset | 2014 | Two synthetic indoor scenes provided as RGB-D sequences with realistic noise models. | A Benchmark for RGB-D Visual Odometry, 3D Reconstruction, and SLAM | Website | Visual Odometry; 3D Reconstruction; SLAM
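As an example of how such data is used, the NTIRE HDR challenge evaluates reconstructions in a μ-law tonemapped domain (μ-PSNR). A minimal NumPy sketch is shown below, assuming HDR values normalized to [0, 1] and μ = 5000 as in the challenge report; the random arrays are placeholders for real images.

```python
import numpy as np

def mu_law_tonemap(hdr, mu=5000.0):
    """mu-law tonemapping: log(1 + mu * H) / log(1 + mu), with H in [0, 1]."""
    return np.log1p(mu * hdr) / np.log1p(mu)

# Placeholders standing in for a ground-truth HDR image and a prediction.
gt = np.random.rand(256, 256, 3)
pred = np.random.rand(256, 256, 3)

# mu-PSNR: PSNR computed between the tonemapped images.
mse = np.mean((mu_law_tonemap(gt) - mu_law_tonemap(pred)) ** 2)
mu_psnr = 10 * np.log10(1.0 / mse)
print(f"mu-PSNR: {mu_psnr:.2f} dB")
```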
3D Object Datasets
Basic high-level computer vision problems, such as object detection or segmentation, benefit fully from the perfect labeling that synthetic data provides, and much effort has been devoted to making synthetic data work for these problems. Since producing synthetic data requires developing 3D models, such datasets usually also feature 3D-related labels such as depth maps, labeled 3D parts of a shape, volumetric 3D data, and so on. A common pattern when working with these labels is back-projecting a depth map into a point cloud, as sketched below.
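This is a minimal sketch of the standard pinhole back-projection; the intrinsics are hypothetical values for a 640×480 rendering.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (in meters) into an N x 3 point cloud
    using the standard pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel image coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with valid depth

# Hypothetical intrinsics for a 640 x 480 synthetic rendering.
cloud = depth_to_point_cloud(np.ones((480, 640)), fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```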
Conclusion
The development and utilization of synthetic datasets have been a game-changer in the field of computer vision. They not only offer a solution to the data scarcity problem but also ensure a level of accuracy and variability that’s hard to achieve with real-world data alone. As technology progresses, we can anticipate even more sophisticated and realistic datasets that will continue to push the boundaries of what’s possible in computer vision.