Load image data
Image datasets are loaded from the image column, which contains a PIL object.
To work with image datasets, you need to have the vision dependency installed. Check out the installation guide to learn how to install it.
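For reference, the vision dependency is typically installed with pip (see the installation guide for other options):

pip install datasets[vision]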
When you load an image dataset and call the image column, the Image feature automatically decodes the image file into a PIL object:
>>> from datasets import load_dataset, Image
>>> dataset = load_dataset("beans", split="train")
>>> dataset[0]["image"]
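The returned value is a standard PIL image, so the usual PIL API applies to it. A minimal sketch (the exact size and mode depend on the dataset):

>>> img = dataset[0]["image"]
>>> img.size   # (width, height) of the decoded image
>>> img.mode   # e.g. "RGB"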
Index into an image dataset using the row index first and then the image column - dataset[0]["image"] - to avoid decoding and resampling all the image objects in the dataset. Otherwise, decoding every image can be slow on a large dataset.
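For example, the two access orders below differ only in which part is evaluated first, but the second one decodes every image in the column before indexing (a sketch of what to avoid on large datasets):

>>> dataset[0]["image"]     # decodes a single image
>>> dataset["image"][0]     # decodes the whole image column first - avoid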
For a guide on how to load any type of dataset, take a look at the general loading guide.
Local files
You can load a dataset from the image paths. Use the cast_column() function to accept a column of image file paths and decode them into PIL images with the Image feature:
>>> from datasets import Dataset, Image
>>> dataset = Dataset.from_dict({"image": ["path/to/image_1", "path/to/image_2", ..., "path/to/image_n"]}).cast_column("image", Image())
>>> dataset[0]["image"]
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1200x215 at 0x15E6D7160>
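In practice, the list of file paths is often built programmatically. A minimal sketch using glob (the directory and pattern are hypothetical):

>>> from glob import glob
>>> from datasets import Dataset, Image
>>> paths = sorted(glob("path/to/images/*.png"))   # collect local image paths
>>> dataset = Dataset.from_dict({"image": paths}).cast_column("image", Image())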
If you only want to load the underlying path to the image dataset without decoding the image object, set decode=False in the Image feature:
>>> dataset = load_dataset("beans", split="train").cast_column("image", Image(decode=False))
>>> dataset[0]["image"]
{'bytes': None,
'path': '/root/.cache/huggingface/datasets/downloads/extracted/b0a21163f78769a2cf11f58dfc767fb458fc7cea5c05dccc0144a2c0f0bc1292/train/bean_rust/bean_rust_train.29.jpg'}
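Because only the path (or raw bytes) is stored, you can decode a file yourself when you actually need it. A minimal sketch using PIL directly (assuming the stored path points to a readable local file):

>>> from PIL import Image as PILImage
>>> sample = dataset[0]["image"]
>>> img = PILImage.open(sample["path"])   # decode on demand from the stored path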
ImageFolder
You can also load a dataset with an ImageFolder dataset builder. It does not require writing a custom dataloader, making it useful for quickly loading a dataset for certain vision tasks. Your image dataset structure should look like this:
folder/train/dog/golden_retriever.png
folder/train/dog/german_shepherd.png
folder/train/dog/chihuahua.png
folder/train/cat/maine_coon.png
folder/train/cat/bengal.png
folder/train/cat/birman.png
Load your dataset by specifying imagefolder and the directory of your dataset in data_dir:
>>> from datasets import load_dataset
>>> dataset = load_dataset("imagefolder", data_dir="/path/to/folder")
>>> dataset["train"][0]
{"image": <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1200x215 at 0x15E6D7160>, "label": 0}
>>> dataset["train"][-1]
{"image": <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1200x215 at 0x15E8DAD30>, "label": 1}
Load remote datasets from their URLs with the data_files parameter:
>>> dataset = load_dataset("imagefolder", data_files="https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip", split="train")
ImageFolder creates a label column, and the label name is based on the directory name. To drop the label column, set drop_labels=True, as defined in ImageFolderConfig.
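For example, a sketch of dropping the inferred labels by passing the flag through load_dataset, which forwards it to the ImageFolder builder:

>>> dataset = load_dataset("imagefolder", data_dir="/path/to/folder", drop_labels=True)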
ImageFolder with metadata
Metadata associated with your dataset can also be loaded, extending the utility of ImageFolder to additional vision tasks like image captioning and object detection. Make sure your dataset has a metadata.jsonl file:
folder/train/metadata.jsonl
folder/train/0001.png
folder/train/0002.png
folder/train/0003.png
Your metadata.jsonl file must have a file_name column which links image files with their metadata:
{"file_name": "0001.png", "additional_feature": "This is a first value of a text feature you added to your images"}
{"file_name": "0002.png", "additional_feature": "This is a second value of a text feature you added to your images"}
{"file_name": "0003.png", "additional_feature": "This is a third value of a text feature you added to your images"}
If metadata files are present, the inferred labels based on the directory name are dropped by default. To include those labels, set drop_labels=False in load_dataset.
Image captioning
Image captioning datasets have text describing an image. An example metadata.jsonl may look like:
{"file_name": "0001.png", "text": "This is a golden retriever playing with a ball"}
{"file_name": "0002.png", "text": "A german shepherd"}
{"file_name": "0003.png", "text": "One chihuahua"}
Load the dataset with ImageFolder, and it will create a text column for the image captions:
>>> dataset = load_dataset("imagefolder", data_dir="/path/to/folder", split="train")
>>> dataset[0]["text"]
"This is a golden retriever playing with a ball"
Object detection
Object detection datasets have bounding boxes and categories identifying objects in an image. An example metadata.jsonl may look like:
{"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "categories": [0]}}
{"file_name": "0002.png", "objects": {"bbox": [[810.0, 100.0, 57.0, 28.0]], "categories": [1]}}
{"file_name": "0003.png", "objects": {"bbox": [[160.0, 31.0, 248.0, 616.0], [741.0, 68.0, 202.0, 401.0]], "categories": [2, 2]}}
Load the dataset with ImageFolder, and it will create an objects column with the bounding boxes and the categories:
>>> dataset = load_dataset("imagefolder", data_dir="/path/to/folder", split="train")
>>> dataset[0]["objects"]
{"bbox": [[302.0, 109.0, 73.0, 52.0]], "categories": [0]}