---
title: Which cheese are we eating?
description: 'Did you ever wonder what kind of cheese you should buy? They all look the same. And then the embarrassment when you can only point and say: that one. Meet the cheese classifier.'
date: '2025-03-13'
categories:
- machine learning
- python
- computer vision
execute:
 message: false
 warning: false
editor_options:
 chunk_output_type: console
output-file: meet-the-cheese
---

## Let's start with the why
I love cheese, but telling the varieties apart can be quite difficult. Think of the embarrassment when you stand in front of a mountain of cheese and can only point with your finger.

Therefore, I decided to build an ML classifier to help me.

The particular difficulty here is that many cheeses look quite similar. Take, for example, the Swiss Gruyère and the French Comté.

They are twins.

## Let's continue with the data

First, we need some data. Fast.ai provides a convenient helper for downloading images from a list of URLs, and we can get those URLs from a DuckDuckGo image search.

As an alternative, we could use an existing dataset, if we have one. Let’s start by downloading the files and then create a dataset from them.

### Getting data from DuckDuckGo

Let’s start by defining what we want to download. We want cheese. In particular, French cheese. 

In [None]:
cheeses = [
 "Camembert",
 "Roquefort",
 "Comté",
 "Époisses de Bourgogne",
 "Tomme de Savoie",
 "Bleu d’Auvergne",
 "Brie de Meaux",
 "Mimolette",
 "Munster",
 "Livarot",
 "Pont-l’Évêque",
 "Reblochon",
 "Chabichou du Poitou",
 "Valençay",
 "Pélardon",
 "Fourme d’Ambert",
 "Selles-sur-Cher",
 "Cantal",
 "Neufchâtel",
 "Banon",
 "Gruyere"
]


To have a larger variety of images we define some extra search terms.

In [None]:
search_terms = [
 "cheese close-up texture",
 "cheese macro shot",
 "cheese cut section"
]
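
To see what these terms are for, here is a minimal sketch of how each cheese name can be paired with each search term to form a query. The `build_queries` helper and the shortened lists are illustrative assumptions, not part of the notebook; the query format mirrors the one used in the download function further down.

```python
from itertools import product

def build_queries(cheeses, search_terms):
    """Return one query string per (cheese, term) pair."""
    return [f"{cheese} {term}" for cheese, term in product(cheeses, search_terms)]

# Shortened lists, for illustration only
sample_cheeses = ["Camembert", "Comté"]
sample_terms = ["cheese close-up texture", "cheese macro shot"]

queries = build_queries(sample_cheeses, sample_terms)
for q in queries:
    print(q)
```

Every cheese is searched once per term, so the number of queries is the product of the two list lengths.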

As we work with Fast.ai, let's import the basics and define a small image search helper.

In [None]:
import time, json

from duckduckgo_search import DDGS
from fastcore.all import *
from fastai.vision.all import *

def search_images(keywords, max_images=20):
    # Return the image URLs of up to `max_images` DuckDuckGo results
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')

And then define our download function:

In [None]:
from fastdownload import download_url
from pathlib import Path
import time

# Set to True to (re-)download the images
data_acquisition = False

def download():
    # Loop through all combinations of cheeses and search terms
    for cheese in cheeses:
        dest = Path("which_cheese") / cheese  # Create subdirectory for each cheese
        dest.mkdir(exist_ok=True, parents=True)

        for term in search_terms:
            query = f"{cheese} {term}"
            download_images(dest, urls=search_images(f"{query} photo"))
            time.sleep(5)  # Pause between searches

        # Resize this cheese's images after downloading
        resize_images(dest, max_size=400, dest=dest)

# Run download only if data acquisition is enabled
if data_acquisition:
    download()
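
Before switching `data_acquisition` on, it can help to estimate how much work the loop will do. The following is a rough back-of-the-envelope sketch, assuming the 21 cheeses and 3 search terms defined above, the default `max_images=20` of `search_images`, and the 5-second pause. The `download_budget` helper is hypothetical and only does arithmetic.

```python
def download_budget(n_cheeses, n_terms, max_images=20, pause_seconds=5):
    """Estimate the number of queries, an upper bound on images, and the total pause time."""
    n_queries = n_cheeses * n_terms
    return {
        "queries": n_queries,
        # Upper bound: failed or duplicate downloads reduce the real number
        "max_images": n_queries * max_images,
        # Sleep time only; the downloads themselves take additional time
        "min_wait_seconds": n_queries * pause_seconds,
    }

budget = download_budget(n_cheeses=21, n_terms=3)
print(budget)
```

With these numbers the loop issues 63 searches, fetches at most 1,260 images (60 per cheese), and spends at least 315 seconds just pausing.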

We can verify the images now or later.

In [None]:
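if data_acquisition:
    # Check the images in the download folder and delete the ones that cannot be opened
    failed = verify_images(get_image_files(Path("which_cheese")))
    failed.map(Path.unlink)
    print(f"Removed {len(failed)} broken images")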
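
After cleaning up, it is worth checking how many images are left for each cheese, since a class with very few images will be hard to learn. This is a minimal sketch using only the standard library. The `count_images_per_class` helper and the list of file extensions are assumptions, and it expects the one-folder-per-cheese layout created by the download function.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to what the download actually produced
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp"}

def count_images_per_class(root):
    """Count image files in each immediate subfolder of `root`."""
    counts = Counter()
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1 for f in folder.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts

root = Path("which_cheese")
if root.exists():
    # Largest classes first, so sparse classes show up at the bottom
    for cheese, n in count_images_per_class(root).most_common():
        print(f"{cheese}: {n}")
```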

### Loading data from a Kaggle dataset
I created a dataset of these images to avoid having to download again when I start over.

Sadly, due to the uncertain copyright status of these images, my dataset needs to remain private. But you can easily create your own.
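
One way to do that is to zip the image folder so it can be uploaded as a private dataset. The sketch below uses only the standard library. The `package_dataset` helper is hypothetical, and the folder and archive names are assumptions chosen to match the names used elsewhere in this post. Uploading the archive to Kaggle is a separate step that is not shown here.

```python
import shutil
from pathlib import Path

def package_dataset(image_root, archive_name):
    """Zip the contents of `image_root` into `<archive_name>.zip` and return its path."""
    image_root = Path(image_root)
    if not image_root.is_dir():
        raise FileNotFoundError(f"{image_root} is not a directory")
    # root_dir makes the paths inside the archive relative to image_root,
    # so the per-cheese folders sit at the top level of the zip
    return Path(shutil.make_archive(archive_name, "zip", root_dir=image_root))

if Path("which_cheese").is_dir():
    print(package_dataset("which_cheese", "cheese"))
```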

As I run most of my code locally, I have some code to get it from Kaggle:

In [None]:
competition_name = None
dataset_name = 'cheese'

import os
from pathlib import Path

# Non-empty only when running inside a Kaggle notebook
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if competition_name:
    if iskaggle:
        comp_path = Path('../input/' + competition_name)
    else:
        comp_path = Path(competition_name)
        # Download and unpack the competition data once
        if not comp_path.exists():
            import zipfile, kaggle
            kaggle.api.competition_download_cli(str(comp_path))
            zipfile.ZipFile(f'{comp_path}.zip').extractall(comp_path)

if dataset_name:
    if iskaggle:
        path = Path(f'../input/{dataset_name}')
    else:
        path = Path(dataset_name)
        # Download and unpack the dataset once
        if not path.exists():
            import zipfile, kaggle
            kaggle.api.dataset_download_cli(dataset_name, path='.')
            zipfile.ZipFile(f'{dataset_name}.zip').extractall(path)
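
The local-versus-Kaggle decision above can also be wrapped in a small function, which makes it easier to reuse and to test. This is a sketch under assumptions: `resolve_data_path` is a hypothetical helper, it only decides where the data should live, and it leaves the actual download to the code above.

```python
import os
from pathlib import Path

def resolve_data_path(name, environ=None):
    """Return the expected data folder for `name` on Kaggle or locally."""
    environ = os.environ if environ is None else environ
    # Same check as in the cell above: the variable is non-empty on Kaggle
    on_kaggle = bool(environ.get("KAGGLE_KERNEL_RUN_TYPE", ""))
    return Path("../input") / name if on_kaggle else Path(name)

# Passing a dict instead of os.environ lets us simulate both situations
local_path = resolve_data_path("cheese", environ={})
kaggle_path = resolve_data_path("cheese", environ={"KAGGLE_KERNEL_RUN_TYPE": "Interactive"})
print(local_path)
print(kaggle_path)
```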


Now that we have downloaded the data, we can start using it.