How to build an interactive HF Space to visualize an Image Dataset
The Hugging Face ecosystem provides a wide range of datasets, including unstructured data types such as images, videos, and audio. These datasets are widely used to train and validate many models both on and beyond the Hugging Face Hub.
Datasets with unstructured data can be overwhelming because of their sheer size, often containing far more images than anyone could review individually. Creating embeddings with foundation models brings structure to this data: by applying dimensionality reduction techniques like t-SNE or UMAP, you can generate similarity maps that make it easier to navigate the data.
This article is a tutorial on creating a Hugging Face Space with an interactive visualization of an image dataset using Renumics Spotlight. The visualization includes a similarity map, filters, and statistics for navigating the data, along with the ability to review each image in detail.
1 Load the dataset
First install the required dependencies:
!pip install renumics-spotlight datasets transformers torch
Now you can load the image dataset that you want to visualize. As an example, CIFAR-10 [1] is used here. CIFAR-10 is a benchmark dataset for image classification in computer vision. It contains 60,000 small color images of 32x32 pixels across 10 classes. For our analysis, we focus on the 10,000 test images. You can also use your own dataset or any other image classification dataset from the Hugging Face Hub.
import datasets
# load dataset containing raw data (images and labels)
ds = datasets.load_dataset("cifar10", split="test")
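A quick sanity check confirms the structure of the loaded split:
# inspect the loaded split: columns img (PIL image) and label (int)
print(ds)
print(ds.features["label"].names)  # the ten CIFAR-10 class names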
2 Create embeddings for the dataset
Embeddings created with foundation models bring structure to unstructured image data. They carry semantic information that is useful for tasks like data exploration, generating insights, and detecting outliers. Because they map images into a lower-dimensional space, embeddings let you explore similarities in the data through similarity maps created with techniques like t-SNE or UMAP.
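Spotlight computes such similarity maps for you, but to make the idea concrete, here is a minimal sketch of the reduction step using umap-learn (an extra dependency not installed above), with random vectors standing in for real embeddings:
# illustration only: how a 2D similarity map is derived from embeddings
# (Spotlight computes this internally; umap-learn is an extra dependency)
import numpy as np
import umap

embeddings = np.random.rand(1000, 768)  # stand-in for real ViT embeddings
reducer = umap.UMAP(n_components=2, metric="cosine")
points_2d = reducer.fit_transform(embeddings)  # one (x, y) point per image
print(points_2d.shape)  # (1000, 2)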
We recommend storing the embeddings in a second Hugging Face dataset, separate from the original image dataset. You can use the transformers library to compute them with, e.g., the google/vit-base-patch16-224-in21k [2] model. The following infer function performs the extraction:
# load the ViT feature-extraction model and define the inference function
import torch
import transformers

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "google/vit-base-patch16-224-in21k"
processor = transformers.ViTImageProcessor.from_pretrained(model_name)
fe_model = transformers.ViTModel.from_pretrained(model_name).to(device)

def infer(batch):
    # ensure three channels, preprocess, and move the batch to the model device
    images = [image.convert("RGB") for image in batch]
    inputs = processor(images=images, return_tensors="pt").to(device)
    with torch.no_grad():
        # use the [CLS] token of the last hidden state as the image embedding
        embeddings = fe_model(**inputs).last_hidden_state[:, 0].cpu().numpy()
    return {"embedding": embeddings}
The embeddings are stored in a new dataset ds_enrichments with a single column, embedding:
# enrich the dataset with embeddings and drop the raw image and label columns
ds_enrichments = ds.map(
    infer, input_columns="img", batched=True, batch_size=32
).remove_columns(["img", "label"])
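Each entry now holds one 768-dimensional vector, matching the hidden size of ViT-Base:
# each embedding is the 768-dimensional [CLS] vector from ViT-Base
print(len(ds_enrichments[0]["embedding"]))  # 768
print(ds_enrichments)  # a single column: embedding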
3 Try the visualization locally
Before publishing the embeddings, we can review the results locally in Spotlight:
from renumics import spotlight

# join the raw data and the embeddings column-wise
ds_enriched = datasets.concatenate_datasets([ds, ds_enrichments], axis=1)
spotlight.show(ds_enriched, dtype={"embedding": spotlight.Embedding})
This will open Spotlight in a new browser window. The top left displays a table with all fields present in the dataset. The top right shows a UMAP representation of the embeddings generated by the foundation model. At the bottom, the currently selected images are displayed.
4 Publish the embeddings on the Hugging Face Hub
When you are satisfied with the results, you can publish the embeddings as a new dataset on Hugging Face:
from huggingface_hub import login, create_repo

login()

USERNAME = "YOUR_ACCOUNT"
create_repo(f"{USERNAME}/cifar10-enrichments", repo_type="dataset")
ds_enrichments.push_to_hub(f"{USERNAME}/cifar10-enrichments")
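Anyone can then reload the published enrichments and join them with the raw images again; note that the split name typically carries over from the source split, here test:
# reload the published enrichments and join them with the raw images
enrichments = datasets.load_dataset(f"{USERNAME}/cifar10-enrichments", split="test")
ds_enriched = datasets.concatenate_datasets([ds, enrichments], axis=1)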
5 Create a Hugging Face Space
To showcase your dataset together with the embeddings on the Hugging Face Hub, you can use Hugging Face Spaces to launch a Spotlight visualization for it. Duplicate the prepared example Space for the MNIST image dataset on the Hub and specify your datasets in the HF_DATASET and HF_ENRICHMENT variables.
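If you prefer to script this step, the following sketch uses the huggingface_hub client to duplicate a Space and set its variables; the source Space id below is a placeholder that you should replace with the actual example Space:
# sketch: duplicate the example Space and point it at your datasets
from huggingface_hub import HfApi

api = HfApi()
USERNAME = "YOUR_ACCOUNT"

# placeholder id; replace with the actual example Space on the Hub
space = api.duplicate_space("renumics/spotlight-mnist", to_id=f"{USERNAME}/spotlight-cifar10")
api.add_space_variable(space.repo_id, "HF_DATASET", "cifar10")
api.add_space_variable(space.repo_id, "HF_ENRICHMENT", f"{USERNAME}/cifar10-enrichments")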
After a few minutes, the Space will be ready.
6 Summary
This article demonstrated how foundation models can bring structure to large, unstructured image datasets like CIFAR-10 through embeddings. Hosting Renumics Spotlight in a Hugging Face Space enables interactive visualization of image datasets, including similarity maps created with dimensionality reduction techniques like t-SNE or UMAP, which make the data easier to analyze and navigate.
Try this workflow on your own image dataset and explore the possibilities. After applying these techniques, feel free to share your experience and feedback with us.
References
[1] Alex Krizhevsky, Learning Multiple Layers of Features from Tiny Images (2009), University of Toronto
[2] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020), arXiv