Synthetic Datasets

Introduction

Welcome to the fascinating world of synthetic datasets in computer vision! As we’ve transitioned from classical unsupervised methods to advanced deep learning techniques, the demand for extensive and diverse datasets has skyrocketed. Synthetic datasets have emerged as a pivotal resource in training state-of-the-art models, providing an abundance of data that’s often impractical or impossible to collect in the real world. In this section, we’ll explore some of the most influential synthetic datasets, their applications, and how they’re shaping the future of computer vision.

Low-Level Computer Vision Problems

Optical Flow and Motion Analysis

Optical flow and motion analysis are critical in understanding image dynamics. Here are some datasets that have significantly contributed to advancements in this area:

| Dataset Name | Year | Description | Paper | Additional Links |
|---|---|---|---|---|
| Middlebury | 2021 (latest release) | The Middlebury Stereo dataset consists of high-resolution stereo sequences with complex geometry and pixel-accurate ground-truth disparity data. The ground-truth disparities are acquired with a structured-lighting technique that does not require calibration of the light projectors. | A Database and Evaluation Methodology for Optical Flow (cited by 3192 at the time of writing) | Papers with Code - Website |
| Playing for Benchmarks | 2017 | More than 250K high-resolution video frames, all annotated with ground-truth data for high-level tasks as well as low-level tasks such as optical flow estimation and visual odometry. | Playing for Benchmarks | Website |
| MPI-Sintel | 2012 | A synthetic dataset for optical flow. Its main characteristic is that it contains the same scenes under different render settings of varying quality and complexity, an approach that "can provide a deeper understanding of where different optical flow algorithms break down" (paper quote). | A Naturalistic Open Source Movie for Optical Flow Evaluation (551 citations at the time of writing) | Website |
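
Most optical flow benchmarks, including the ones above, score predictions with the average endpoint error (EPE): the mean Euclidean distance between predicted and ground-truth flow vectors. Below is a minimal NumPy sketch of the metric; the function name and toy arrays are ours for illustration, not part of any benchmark's official toolkit.

```python
import numpy as np

def average_endpoint_error(pred_flow: np.ndarray, gt_flow: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and ground-truth flow.

    Both arrays have shape (H, W, 2): one (dx, dy) displacement per pixel.
    """
    diff = pred_flow - gt_flow
    epe = np.sqrt((diff**2).sum(axis=-1))  # per-pixel endpoint error
    return float(epe.mean())

# Toy example: a constant rightward flow, predicted almost perfectly.
gt = np.zeros((4, 4, 2))
gt[..., 0] = 1.0   # ground truth: 1 px to the right everywhere
pred = gt + 0.1    # small constant error on both axes
print(average_endpoint_error(pred, gt))  # ~0.141 (= sqrt(0.1**2 + 0.1**2))
```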

Stereo Image Matching

Stereo image matching involves identifying corresponding elements in different images of the same scene. The following datasets have been instrumental in this field:

| Name | Year | Description | Paper | Additional Links |
|---|---|---|---|---|
| Flying Chairs | 2015 | 22k frame pairs with ground-truth flow | Learning Optical Flow with Convolutional Networks | |
| FlyingThings3D | 2015 | 22k stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
| Driving | 2015 | 4392 stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
| Monkaa | 2015 | 8591 stereo frames | A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation | |
| Middlebury 2014 | 2014 | 33 high-resolution stereo datasets | High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth | |
| Tsukuba Stereo | 2012 | 1800 stereo pairs with ground-truth disparity maps, occlusion maps, and discontinuity maps | Towards a Simulation-Driven Stereo Vision System | Project |
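
Ground-truth disparity is especially valuable because, for a rectified stereo pair, it converts directly to metric depth as depth = focal_length × baseline / disparity. The sketch below illustrates the conversion; the calibration values are made up for the example and do not come from any of the datasets above.

```python
import numpy as np

def disparity_to_depth(disparity: np.ndarray, focal_length_px: float, baseline_m: float) -> np.ndarray:
    """Convert a disparity map (in pixels) to metric depth (in meters)
    for a rectified stereo pair. Zero disparity means infinite depth."""
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

disparity = np.array([[64.0, 32.0], [16.0, 0.0]])
# With f = 1024 px and a 0.54 m baseline, disparity 64 maps to ~8.64 m.
print(disparity_to_depth(disparity, focal_length_px=1024.0, baseline_m=0.54))
```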

High-Level Computer Vision Problems

Semantic Segmentation for Autonomous Driving

Semantic segmentation is vital for autonomous vehicles to interpret and navigate their surroundings safely. These datasets provide rich, annotated data for this purpose:

| Name | Year | Description | Paper | Additional Links |
|---|---|---|---|---|
| Virtual KITTI 2 | 2020 | An updated version of Virtual KITTI: photo-realistic virtual worlds used as a proxy for multi-object tracking analysis. | Virtual KITTI 2 | Website |
| ApolloScape | 2019 | Compared with existing public datasets from real scenes, e.g., KITTI or Cityscapes, ApolloScape provides much larger and richer labeling, including a holistic semantic dense point cloud for each site, stereo imagery, per-pixel semantic labeling, lane-mark labeling, instance segmentation, 3D car instances, and highly accurate locations for every frame in driving videos from multiple sites, cities, and daytimes. | The ApolloScape Open Dataset for Autonomous Driving and its Application | Website |
| Driving in the Matrix | 2017 | The core idea behind "Driving in the Matrix" is to use photo-realistic computer-generated images from a simulation engine to produce annotated data quickly. | Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks? | GitHub |
| CARLA | 2017 | CARLA (CAR Learning to Act) is an open simulator for urban driving, built as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation). A minimal sensor-setup sketch follows this table. | CARLA: An Open Urban Driving Simulator | Website |
| SYNTHIA | 2016 | A large collection of synthetic images for semantic segmentation of urban scenes: photo-realistic frames rendered from a virtual city with precise pixel-level semantic annotations for 13 classes (misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lane-marking). | The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes | Website |
| GTA5 | 2016 | 24,966 synthetic images with pixel-level semantic annotation, rendered with the open-world video game Grand Theft Auto 5, all from the car perspective in the streets of American-style virtual cities. The 19 semantic classes are compatible with those of the Cityscapes dataset. | Playing for Data: Ground Truth from Computer Games | BitBucket |
| ProcSy | 2019 | A synthetic dataset for semantic segmentation, modeled on a real-world urban environment and featuring variable influence factors such as weather and lighting. | ProcSy: Procedural Synthetic Dataset Generation Towards Influence Factor Studies of Semantic Segmentation Networks | Website |
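
As referenced in the CARLA row above, here is a minimal sketch of how a simulator can hand you pixel-perfect ground truth. It assumes a CARLA 0.9.x server is already running on localhost:2000 and a matching `carla` Python package is installed; the spawn point, camera placement, and output path are arbitrary choices for illustration.

```python
import carla

# Connect to a running CARLA server (assumed at localhost:2000).
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()
blueprint_library = world.get_blueprint_library()

# Spawn a vehicle at one of the map's recommended spawn points.
vehicle_bp = blueprint_library.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Attach a semantic segmentation camera above the vehicle's hood.
camera_bp = blueprint_library.find("sensor.camera.semantic_segmentation")
camera_transform = carla.Transform(carla.Location(x=1.5, z=2.4))
camera = world.spawn_actor(camera_bp, camera_transform, attach_to=vehicle)

def save_segmentation(image):
    # Map the raw per-pixel class IDs to the CityScapes color palette.
    image.convert(carla.ColorConverter.CityScapesPalette)
    image.save_to_disk(f"output/{image.frame:06d}.png")

camera.listen(save_segmentation)  # called for every rendered frame
```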

Indoor Simulation and Navigation

Navigating indoor environments can be challenging due to their complexity. These datasets aid in developing systems capable of indoor simulation and navigation:

| Name | Year | Description | Paper | Additional Links |
|---|---|---|---|---|
| Habitat | 2023 | An embodied-AI simulation platform for studying collaborative human-robot interaction tasks in home environments. | Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots | Website |
| MINOS | 2017 | A multimodal indoor simulator. | MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments | GitHub |
| House3D | 2017 (archived in 2021) | A rich and realistic 3D environment. | Building Generalizable Agents with a Realistic and Rich 3D Environment | GitHub |

Human Action Recognition and Simulation

Recognizing and simulating human actions is a complex task that these datasets help to address:

| Name | Year | Description | Paper | Additional Links |
|---|---|---|---|---|
| PHAV | 2017 | A synthetic dataset of procedurally generated human action recognition videos. | Procedural Generation of Videos to Train Deep Action Recognition Networks | Website |
| SURREAL | 2017 | A large-scale dataset for human depth estimation and human part segmentation: synthetically generated but realistic images of people rendered from 3D sequences of human motion-capture data, totaling more than 6 million frames with ground-truth poses, depth maps, and segmentation masks. CNNs trained on it achieve accurate depth estimation and part segmentation of humans in real RGB images. | Learning from Synthetic Humans | GitHub - Website |

Face Recognition

Face recognition technology has numerous applications, from security to user identification. Here’s a look at datasets that drive innovations in this field:

| Name | Year | Description | Paper | Additional Links |
|---|---|---|---|---|
| FaceSynthetics | 2021 | A collection of diverse synthetic face images with ground-truth labels. | Fake It Till You Make It: Face Analysis in the Wild Using Synthetic Data Alone | Website - GitHub |
| FFHQ | 2018 | 70,000 high-quality PNG images at 1024×1024 resolution, with considerable variation in age, ethnicity, and image background. | A Style-Based Generator Architecture for Generative Adversarial Networks | GitHub |

3D Shape Modeling from Single Images

Creating 3D models from single images is a challenging yet exciting area. These datasets are at the forefront of research in 3D shape modeling:

| Name | Year | Description | Paper |
|---|---|---|---|
| Pix3D | 2018 | A large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, and viewpoint estimation. | Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling |

Diverse Applications

The following datasets are either tailored for niche applications or cover multiple ones:

| Dataset Name | Release Year | Description | Paper | External Links | Applications |
|---|---|---|---|---|---|
| CIFAKE | 2023 | 60,000 synthetically generated images and 60,000 real images (collected from CIFAR-10). | CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images | Kaggle | Real-fake image classification |
| ABO | 2022 | A large-scale dataset designed for material prediction and multi-view retrieval experiments, containing Blender renderings of 30 viewpoints for each of 7,953 3D objects, as well as camera intrinsics and extrinsics for each rendering. | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding | Website | Material prediction; multi-view retrieval; 3D object understanding; 3D shape reconstruction |
| NTIRE 2021 HDR | 2021 | Approximately 1500 training, 60 validation, and 201 testing examples. Each example consists of three input LDR images (short, medium, and long exposures) and a ground-truth HDR image aligned with the central medium frame (a naive merge sketch follows this table). | NTIRE 2021 Challenge on High Dynamic Range Imaging: Dataset, Methods and Results | Papers with Code | High dynamic range imaging |
| YCB-Video | 2017 | A large-scale video dataset for 6D object pose estimation, providing accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. | PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes | Website | 6D pose estimation |
| Playing for Benchmarks | 2017 | More than 250K high-resolution video frames, all annotated with ground-truth data. | Playing for Benchmarks | Website | Semantic instance segmentation; object detection and tracking; object-level 3D scene layout |
| 4D Light Field Dataset | 2016 | 24 synthetic, densely sampled 4D light fields with highly accurate disparity ground truth. | A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields | GitHub - Website | Depth estimation on 4D light fields |
| ICL-NUIM | 2014 | RGB-D sequences with noise models for two indoor scenes. | A Benchmark for RGB-D Visual Odometry, 3D Reconstruction and SLAM | Website | RGB-D visual odometry and SLAM |
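
As noted in the NTIRE 2021 HDR row, each example pairs three bracketed LDR exposures with an HDR target. The sketch below shows a deliberately naive exposure merge, just to make the data layout concrete: linearize each frame, normalize by exposure time, and blend with weights that favor well-exposed pixels. The function, weights, and exposure times are our illustrative choices; the challenge entries use far more sophisticated, learned methods.

```python
import numpy as np

def naive_hdr_merge(ldr_frames, exposure_times, gamma=2.2):
    """Merge bracketed LDR frames (each (H, W, 3), values in [0, 1])
    into a linear HDR estimate via a weighted average of radiances."""
    acc = np.zeros_like(ldr_frames[0], dtype=np.float64)
    weight_sum = np.zeros_like(acc)
    for frame, t in zip(ldr_frames, exposure_times):
        linear = frame ** gamma                    # undo display gamma
        radiance = linear / t                      # normalize by exposure time
        weight = 1.0 - 2.0 * np.abs(frame - 0.5)   # hat weight: trust mid-tones
        acc += weight * radiance
        weight_sum += weight
    return acc / np.maximum(weight_sum, 1e-8)      # avoid division by zero

short, medium, long_exp = (np.random.rand(8, 8, 3) for _ in range(3))
hdr = naive_hdr_merge([short, medium, long_exp], exposure_times=[0.01, 0.04, 0.16])
```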

3D Object Datasets

Basic high-level computer vision problems, such as object detection or segmentation, benefit fully from the perfect labeling that synthetic data provides, and much effort has been devoted to making synthetic data work for these problems. Since producing synthetic data requires developing 3D models, these datasets usually also come with 3D-related labels such as depth maps, labeled 3D parts of a shape, volumetric 3D data, and so on; the toy sketch after the table below makes this "perfect labeling" concrete.

| Dataset | Year | Description | Paper | Citations at the time of writing | Additional Links |
|---|---|---|---|---|---|
| ADORESet | 2019 | Hybrid dataset for object recognition testing | A Hybrid Image Dataset Toward Bridging the Gap Between Real and Simulation Environments for Robotics | 13 | GitHub |
| Falling Things | 2018 | 61.5K images of YCB objects in virtual environments | Falling Things: A Synthetic Dataset for 3D Object Detection and Pose Estimation | 171 | Website |
| PartNet | 2018 | 26,671 models, 573,535 annotated part instances | PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding | 552 | Website |
| ShapeNetCore | 2017 | 51K manually verified models from 55 categories | Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55 | 71 | Website |
| VANDAL | 2017 | 4.1M depth images, >9K objects in 319 categories | A Deep Representation for Depth Images from Synthetic Data | 43 | N/A |
| UnrealCV | 2017 | Plugin for UE4 to generate synthetic data | UnrealCV: Virtual Worlds for Computer Vision | 95 | N/A |
| SceneNet RGB-D | 2017 | 5M RGB-D images from 16K 3D trajectories | SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation? | 309 | Website |
| DepthSynth | 2017 | Framework for realistic simulation of depth sensors | Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition | 84 | N/A |
| 3DScan | 2016 | A large dataset of object scans | A Large Dataset of Object Scans | 223 | Website |
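
The toy sketch below (purely didactic, unrelated to any dataset above) shows why synthetic data yields perfect labels: when the scene is generated programmatically, a pixel-accurate segmentation mask and depth map fall out of the geometry with no human annotation. Real pipelines use full renderers such as Blender or UE4, but the principle is the same.

```python
import numpy as np

# Analytically "render" a single sphere seen head-on by an orthographic camera.
H = W = 64
cx, cy, radius = W / 2, H / 2, 20.0
center_depth = 5.0  # distance from camera to sphere center, arbitrary units

ys, xs = np.mgrid[0:H, 0:W]
r2 = (xs - cx) ** 2 + (ys - cy) ** 2

# Pixel-perfect segmentation mask: inside the sphere's silhouette or not.
mask = r2 <= radius**2

# Exact per-pixel depth: the front surface bulges toward the camera.
depth = np.full((H, W), np.inf)                            # background: infinitely far
bulge = np.sqrt(np.maximum(radius**2 - r2, 0.0)) / radius  # 0..1 toward the center
depth[mask] = center_depth - bulge[mask]                   # closest at the sphere center

print(mask.sum(), "foreground pixels, each with an exact class label and depth")
```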

Conclusion

The development and utilization of synthetic datasets have been a game-changer in the field of computer vision. They not only offer a solution to the data scarcity problem but also ensure a level of accuracy and variability that’s hard to achieve with real-world data alone. As technology progresses, we can anticipate even more sophisticated and realistic datasets that will continue to push the boundaries of what’s possible in computer vision.
