Synthetic Datasets
Introduction
Welcome to the fascinating world of synthetic datasets in computer vision! As we’ve transitioned from classical unsupervised methods to advanced deep learning techniques, the demand for extensive and diverse datasets has skyrocketed. Synthetic datasets have emerged as a pivotal resource in training state-of-the-art models, providing an abundance of data that’s often impractical or impossible to collect in the real world. In this section, we’ll explore some of the most influential synthetic datasets, their applications, and how they’re shaping the future of computer vision.
Low-Level Computer Vision Problems
Optical Flow and Motion Analysis
Optical flow and motion analysis are critical for understanding image dynamics. Here are some datasets that have contributed significantly to advances in this area; a minimal example of reading the `.flo` ground-truth format used by several of them follows the table:
Dataset Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
Middlebury | 2021 (latest release) | The Middlebury benchmark provides high-resolution stereo sequences with complex geometry and pixel-accurate ground-truth disparity data, acquired with a structured-lighting technique that does not require calibration of the light projectors; a companion evaluation covers optical flow with ground-truth flow fields. | A Database and Evaluation Methodology for Optical Flow (cited by 3192 at the time of writing) | Papers with Code - Website |
Playing for Benchmarks | 2017 | More than 250K high-resolution video frames, all annotated with ground-truth data for high-level tasks as well as low-level tasks such as optical flow estimation and visual odometry. | Playing for Benchmarks | Website |
MPI-Sintel | 2012 | A synthetic dataset for optical flow derived from the open-source movie Sintel. Its main characteristic is that it contains the same scenes with different render settings of varying quality and complexity, which can provide a deeper understanding of where different optical flow algorithms break down. | A Naturalistic Open Source Movie for Optical Flow Evaluation (551 citations at the time of writing) | Website |
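Ground-truth flow for Middlebury-style benchmarks (including MPI-Sintel) is typically distributed in the binary `.flo` format: a float32 magic number, the image width and height as int32, then interleaved (u, v) displacements as float32. Below is a minimal NumPy reader following that public specification; the file name is a placeholder, and by convention values with magnitude above roughly 1e9 mark invalid/unknown flow.

```python
import numpy as np

def read_flo(path):
    """Read a Middlebury/Sintel-style .flo optical flow file.

    Layout: float32 magic (202021.25), int32 width, int32 height,
    then width * height * 2 float32 values (u, v interleaved, row-major).
    """
    with open(path, "rb") as f:
        magic = np.fromfile(f, np.float32, count=1)[0]
        if magic != 202021.25:
            raise ValueError(f"{path} is not a valid .flo file")
        width = int(np.fromfile(f, np.int32, count=1)[0])
        height = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * width * height)
    # Last axis holds the (u, v) displacement for each pixel.
    return data.reshape(height, width, 2)

flow = read_flo("frame_0001.flo")  # placeholder file name
u, v = flow[..., 0], flow[..., 1]
```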
Stereo Image Matching
Stereo image matching involves identifying corresponding elements in different images of the same scene. The following datasets have been instrumental in this field; a short example of reading the PFM disparity format used by some of them follows the table:
Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
Flying Chairs | 2015 | 22k frame pairs with ground-truth flow | FlowNet: Learning Optical Flow with Convolutional Networks | |
FlyingThings3D | 2015 | 22k stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
Driving | 2015 | 4392 stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
Monkaa | 2015 | 8591 stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
Middlebury 2014 | 2014 | 33 high-resolution stereo datasets | High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth | |
Tsukuba Stereo | 2012 | This dataset includes 1800 stereo pairs accompanied by ground truth disparity maps, occlusion maps, and discontinuity maps. | Towards a simulation-driven stereo vision system | Project |
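Middlebury 2014 and the SceneFlow collection (FlyingThings3D, Driving, Monkaa) ship ground-truth disparity as PFM images: an ASCII header (`PF`/`Pf`, the dimensions, and a scale whose sign encodes endianness) followed by raw float32 scanlines stored bottom-to-top. A minimal reader is sketched below; the `disp0.pfm` file name mirrors the Middlebury 2014 release layout and should be treated as an assumption.

```python
import re
import numpy as np

def read_pfm(path):
    """Read a PFM image (e.g. a ground-truth disparity map)."""
    with open(path, "rb") as f:
        header = f.readline().decode("ascii").strip()
        if header not in ("PF", "Pf"):  # "PF" = 3 channels, "Pf" = 1 channel
            raise ValueError(f"{path} is not a PFM file")
        width, height = map(int, re.findall(r"\d+", f.readline().decode("ascii")))
        scale = float(f.readline().decode("ascii").strip())
        endian = "<" if scale < 0 else ">"  # negative scale => little-endian
        channels = 3 if header == "PF" else 1
        data = np.fromfile(f, endian + "f", count=width * height * channels)
    shape = (height, width, channels) if channels == 3 else (height, width)
    # PFM stores scanlines bottom-to-top, so flip vertically.
    return np.flipud(data.reshape(shape))

disparity = read_pfm("disp0.pfm")  # assumed file name from the Middlebury 2014 layout
```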
High-Level Computer Vision Problems
Semantic Segmentation for Autonomous Driving
Semantic segmentation is vital for autonomous vehicles to interpret and navigate their surroundings safely. These datasets provide rich, annotated data for this purpose; a small helper for converting their color-coded masks into class indices follows the table:
Name | Year | Description | Paper | Additional Links | |
---|---|---|---|---|---|
Virtual KITTI 2 | 2020 | An updated, more photo-realistic version of Virtual KITTI: a synthetic dataset of video sequences cloned from the KITTI tracking benchmark, intended as a proxy for multi-object tracking analysis. | Virtual KITTI 2 | Website | |
ApolloScape | 2019 | Compared with existing public datasets from real scenes, such as KITTI or Cityscapes, ApolloScape provides much larger and richer labeling, including holistic semantic dense point clouds for each site, stereo imagery, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instances, and highly accurate locations for every frame in various driving videos from multiple sites, cities, and daytimes. | The ApolloScape Open Dataset for Autonomous Driving and its Application | Website | |
Driving in the Matrix | 2017 | The core idea behind “Driving in the Matrix” is to use photo-realistic computer-generated images from a simulation engine to produce annotated data quickly. | Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks? | GitHub | |
CARLA | 2017 | CARLA (CAR Learning to Act) is an open-source simulator for urban driving, built as a layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation). | CARLA: An Open Urban Driving Simulator | Website | |
Synthia | 2016 | A large collection of synthetic images for semantic segmentation of urban scenes. SYNTHIA consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes: misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lane-marking. | The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes | Website | |
GTA5 | 2016 | The GTA5 dataset contains 24,966 synthetic images with pixel-level semantic annotation. The images were rendered using the open-world video game Grand Theft Auto 5, all from the car perspective in the streets of American-style virtual cities. Its 19 semantic classes are compatible with those of the Cityscapes dataset. | Playing for Data: Ground Truth from Computer Games | BitBucket | |
ProcSy | 2019 | A synthetic dataset for semantic segmentation, modeled on a real-world urban environment and featuring a range of variable influence factors, such as weather and lighting. | ProcSy: Procedural Synthetic Dataset Generation Towards Influence Factor Studies of Semantic Segmentation Networks | Website | |
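Datasets such as GTA5 and SYNTHIA follow the Cityscapes convention of color-coded label images, so a common preprocessing step is mapping RGB mask colors to integer class IDs. A minimal sketch is shown below; the three palette entries are real Cityscapes colors given for illustration (the full 19-class palette comes from the Cityscapes label definitions), and the mask path is a placeholder.

```python
import numpy as np
from PIL import Image

# Illustrative subset of the Cityscapes palette; extend with the full
# 19-class definition from the Cityscapes label specification.
PALETTE = {
    (128, 64, 128): 0,  # road
    (244, 35, 232): 1,  # sidewalk
    (70, 70, 70): 2,    # building
}

def rgb_mask_to_ids(mask_rgb, palette, ignore_index=255):
    """Map an H x W x 3 color-coded mask to an H x W array of class IDs."""
    ids = np.full(mask_rgb.shape[:2], ignore_index, dtype=np.uint8)
    for color, class_id in palette.items():
        ids[np.all(mask_rgb == color, axis=-1)] = class_id
    return ids

mask = np.array(Image.open("00001_labels.png").convert("RGB"))  # placeholder path
label_ids = rgb_mask_to_ids(mask, PALETTE)
```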
Indoor Simulation and Navigation
Navigating indoor environments can be challenging due to their complexity. These datasets and simulators aid in developing systems capable of indoor simulation and navigation; a minimal agent-stepping sketch follows the table:
Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
Habitat | 2023 | An Embodied AI simulation platform for studying collaborative human-robot interaction tasks in home environments. | Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots | Website
MINOS | 2017 | A multimodal simulator for navigation in complex indoor environments. | MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments | GitHub
House3D | 2017 (archived in 2021) | A rich and realistic 3D environment for training generalizable agents. | Building Generalizable Agents with a Realistic and Rich 3D Environment | GitHub
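To give a flavor of how such simulators are driven, here is a minimal sketch using the habitat-sim Python API. Exact class and attribute names vary across habitat-sim versions, and the scene path is a placeholder, so treat the details as assumptions rather than a definitive recipe.

```python
import habitat_sim

# Backend configuration: which 3D scene to load (path is a placeholder).
backend_cfg = habitat_sim.SimulatorConfiguration()
backend_cfg.scene_id = "path/to/scene.glb"

# Agent with a single RGB camera sensor.
rgb_sensor = habitat_sim.CameraSensorSpec()
rgb_sensor.uuid = "rgb"
rgb_sensor.resolution = [480, 640]
agent_cfg = habitat_sim.agent.AgentConfiguration(sensor_specifications=[rgb_sensor])

sim = habitat_sim.Simulator(habitat_sim.Configuration(backend_cfg, [agent_cfg]))
observations = sim.step("move_forward")  # one of the default discrete actions
rgb_frame = observations["rgb"]          # H x W x 4 uint8 image from the camera
sim.close()
```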
Human Action Recognition and Simulation
Recognizing and simulating human actions is a complex task that these datasets help address; a sketch of loading SURREAL-style per-frame ground truth follows the table:
Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
PHAV | 2017 | Synthetic dataset of procedurally generated human action recognition videos. | Procedural Generation of Videos to Train Deep Action Recognition Networks | Website |
SURREAL | 2017 | A large-scale dataset for human depth estimation and human part segmentation, with synthetically generated but realistic images of people rendered from 3D sequences of human motion-capture data. It contains more than 6 million frames together with ground-truth poses, depth maps, and segmentation masks; CNNs trained on this synthetic data achieve accurate human depth estimation and human part segmentation in real RGB images. | Learning from Synthetic Humans | GitHub - Website
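SURREAL distributes its ground truth as MATLAB files stored next to each rendered clip. The sketch below uses SciPy to read one clip's depth and part-segmentation maps; the clip name is a placeholder, and the per-frame key naming (`depth_1`, `segm_1`, ...) follows the project's documentation, so treat it as an assumption.

```python
import scipy.io

# Per-clip ground-truth files shipped alongside each rendered video.
depth_mat = scipy.io.loadmat("01_01_c0001_depth.mat")  # placeholder clip name
segm_mat = scipy.io.loadmat("01_01_c0001_segm.mat")

depth_frame1 = depth_mat["depth_1"]  # H x W float depth map for the first frame
segm_frame1 = segm_mat["segm_1"]     # H x W integer body-part labels
```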
Face Recognition
Face recognition technology has numerous applications, from security to user identification. Here's a look at datasets that drive innovation in this field; a short sketch of loading the Face Synthetics labels follows the table:
Name | Year | Description | Paper | Additional Links |
---|---|---|---|---|
FaceSynthetics | 2021 | The Face Synthetics dataset is a collection of diverse synthetic face images with ground truth labels. | Fake It Till You Make It: Face Analysis in the Wild Using Synthetic Data Alone | Website - GitHub |
FFHQ | 2018 | Flickr-Faces-HQ (FFHQ) consists of 70,000 high-quality PNG images at 1024×1024 resolution and contains considerable variation in terms of age, ethnicity, and image background. | A Style-Based Generator Architecture for Generative Adversarial Networks | GitHub
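Face Synthetics pairs each rendered image with a per-pixel segmentation image and a text file of 2D landmark coordinates. The sketch below assumes the file naming described on the project's GitHub page (`{idx}.png`, `{idx}_seg.png`, `{idx}_ldmks.txt`); verify against the actual release before relying on it.

```python
import numpy as np
from PIL import Image

idx = "000000"  # zero-padded sample index (assumed naming scheme)
image = np.array(Image.open(f"{idx}.png"))    # rendered RGB face image
seg = np.array(Image.open(f"{idx}_seg.png"))  # per-pixel face-part class IDs
landmarks = np.loadtxt(f"{idx}_ldmks.txt")    # one "x y" pair per line
print(image.shape, seg.shape, landmarks.shape)
```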
3D Shape Modeling from Single Images
Creating 3D models from single images is a challenging yet exciting area. These datasets are at the forefront of research in 3D shape modeling; a sketch of reading the Pix3D annotation index follows the table:
Name | Year | Description | Paper |
---|---|---|---|
Pix3D | 2018 | A large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, and viewpoint estimation. | Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling |
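Pix3D ships a single JSON index that ties each image to its mask, CAD model, and pose. The key names below follow the project repository and should be treated as an assumption.

```python
import json
import numpy as np

# pix3d.json lists one record per image-shape pair (assumed key names).
with open("pix3d.json") as f:
    annotations = json.load(f)

sample = annotations[0]
print(sample["category"], sample["img"], sample["model"])
R = np.array(sample["rot_mat"])    # 3x3 rotation aligning the CAD model to the image
t = np.array(sample["trans_mat"])  # translation of the model in camera coordinates
```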
Diverse Applications
The following datasets are either tailored for niche applications or cover multiple ones; an example of the μ-law tonemapping used to evaluate HDR reconstruction follows the table:
Dataset Name | Release Year | Description | Paper | External Links | Applications |
---|---|---|---|---|---|
CIFAKE | 2023 | CIFAKE is a dataset that contains 60,000 synthetically generated images and 60,000 real images (collected from CIFAR-10). | CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images | Kaggle | Real-Fake Images Classification |
ABO | 2022 | ABO is a large-scale dataset designed for material prediction and multi-view retrieval experiments. It contains Blender renderings of 30 viewpoints for each of the 7,953 3D objects, as well as camera intrinsics and extrinsics for each rendering. | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding | Website | Material Prediction; Multi-View Retrieval; 3D Object Understanding; 3D Shape Reconstruction
NTIRE 2021 HDR | 2021 | This dataset comprises approximately 1,500 training, 60 validation, and 201 testing examples. Each example consists of three input LDR images (short, medium, and long exposures) and a related ground-truth HDR image aligned with the central medium frame. | NTIRE 2021 Challenge on High Dynamic Range Imaging: Dataset, Methods and Results | Papers with Code | High Dynamic Range Imaging
YCB-Video | 2017 | A large-scale video dataset for 6D object pose estimation, providing accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. | PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes | Website | 6D Pose Estimation
Playing for Benchmarks | 2017 | More than 250K high-resolution video frames, all annotated with ground-truth data. | Playing for Benchmarks | Website | Semantic Instance Segmentation; Object Detection and Tracking; Object-Level 3D Scene Layout
4D Light Field Dataset | 2016 | 24 synthetic, densely sampled 4D light fields with highly accurate disparity ground truth. | A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields | GitHub - Website | Depth Estimation of 4D light fields |
ICL-NUIM Dataset | 2014 | Two synthetic indoor scenes provided as RGB-D sequences with realistic noise models. | A Benchmark for RGB-D Visual Odometry, 3D Reconstruction, and SLAM | Website | Visual Odometry; 3D Reconstruction; SLAM
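As an example of how such data is used, the NTIRE HDR challenge evaluates reconstructions in a μ-law tonemapped domain (μ-PSNR). A minimal NumPy sketch is shown below, assuming HDR values normalized to [0, 1] and μ = 5000 as in the challenge report; the random arrays are placeholders for real images.

```python
import numpy as np

def mu_law_tonemap(hdr, mu=5000.0):
    """mu-law tonemapping: log(1 + mu * H) / log(1 + mu), with H in [0, 1]."""
    return np.log1p(mu * hdr) / np.log1p(mu)

# Placeholders standing in for a ground-truth HDR image and a prediction.
gt = np.random.rand(256, 256, 3)
pred = np.random.rand(256, 256, 3)

# mu-PSNR: PSNR computed between the tonemapped images.
mse = np.mean((mu_law_tonemap(gt) - mu_law_tonemap(pred)) ** 2)
mu_psnr = 10 * np.log10(1.0 / mse)
print(f"mu-PSNR: {mu_psnr:.2f} dB")
```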
3D Object Datasets
Basic high-level computer vision problems, such as object detection or segmentation, benefit fully from the perfect labeling that synthetic data provides, and much effort has been devoted to making synthetic data work for these problems. Since producing synthetic data requires developing 3D models, such datasets usually also feature 3D-related labels such as depth maps, labeled 3D parts of a shape, volumetric 3D data, and so on. A common pattern when working with these labels is back-projecting a depth map into a point cloud, as sketched below.
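This is a minimal sketch of the standard pinhole back-projection; the intrinsics are hypothetical values for a 640×480 rendering.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (in meters) into an N x 3 point cloud
    using the standard pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel image coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with valid depth

# Hypothetical intrinsics for a 640 x 480 synthetic rendering.
cloud = depth_to_point_cloud(np.ones((480, 640)), fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```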
Conclusion
The development and utilization of synthetic datasets have been a game-changer in the field of computer vision. They not only offer a solution to the data scarcity problem but also ensure a level of accuracy and variability that’s hard to achieve with real-world data alone. As technology progresses, we can anticipate even more sophisticated and realistic datasets that will continue to push the boundaries of what’s possible in computer vision.