INSTALLATION
To get started with the fine-tuned DETR model from Hugging Face, please follow the steps below:
- First, make sure you have an account on the Hugging Face Hub. You can sign up or log in via the Hugging Face Login page.
- Verify that you have a Hugging Face user access token. If not, create a new token from your Hugging Face account settings (see the User access tokens tutorial).
- Create the directory where you want to clone the finetuned_detr repository.
- Open a terminal and navigate to the directory you just created, or open the terminal directly from that directory.
- Run the following command to install (or upgrade) the huggingface_hub library:
pip3 install --upgrade huggingface_hub
- Run the following command and enter your token to log in to your Hugging Face account:
huggingface-cli login
- Run the following command to install Git Large File Storage (LFS) for handling large files:
git lfs install
- Clone the Hugging Face repository by running the following command (you may be asked to log in to your account again; a scripted alternative using the Python API is sketched after these steps):
git clone https://huggingface.co/coralavital/finetuned_detr
- Open your workspace from the finetuned_detr directory.
- In the workspace, open the terminal and execute the following commands:
- Create a new Python virtual environment:
python3.9 -m venv venv
- Activate the virtual environment:
On macOS/Linux: source venv/bin/activate
On Windows: venv\Scripts\activate.bat
- Install necessary dependencies:
pip3 install Cython numpy
pip3 install -r requirements.txt
- After completing the installation steps, select the venv kernel you just created as the Jupyter kernel in VS Code.
- After selecting the correct environment, you can run the Jupyter notebook named "finetune_detr.ipynb" in your workspace.
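If you prefer to script the authentication and download steps above instead of using the CLI, a minimal sketch with the huggingface_hub Python API (using the same repository id) is:
from huggingface_hub import login, snapshot_download

# Log in with your user access token (equivalent to `huggingface-cli login`).
login(token="hf_xxx")  # replace with your own token

# Download the repository contents to a local folder
# (an alternative to the `git clone` step above).
local_dir = snapshot_download(repo_id="coralavital/finetuned_detr")
print("Repository downloaded to:", local_dir)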
ABSTRACT
FridgeIT dataset
This is a dataset of products that we collected manually with an iPhone 12 Pro Max camera. The dataset contains 5 products: Butter, Cottage, Cream, Milk and Mustard.
Finetune DETR
The goal of this notebook is to fine-tune Facebook's DETR (DEtection TRansformer).
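As a hedged sketch of the starting point (not the full fine-tuning loop), the COCO-pretrained DETR-ResNet-50 checkpoint can be loaded through torch.hub from the paper's repository:
import torch

# Load the COCO-pretrained DETR-ResNet-50 model from facebookresearch/detr.
# For fine-tuning, the classification head is then adapted to the custom classes.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval()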
Data
DETR will be fine-tuned on a tiny dataset: the FridgeIT dataset. We refer to it as the custom dataset.
There are 2094 images in the training set and 526 images in the validation set.
We expect the directory structure to be the following:
path/to/coco/
β annotations/ # JSON annotations
β β annotations/custom_train.json
β β annotations/custom_val.json
β train2017/ # training images
β val2017/ # validation images
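As a quick sanity check of this layout, both splits can be read with torchvision's CocoDetection dataset. This is only a minimal sketch; the paths are the placeholders above and should be adjusted to your setup:
from pathlib import Path
import torchvision

data_root = Path("path/to/coco")

train_dataset = torchvision.datasets.CocoDetection(
    root=data_root / "train2017",
    annFile=data_root / "annotations" / "custom_train.json",
)
val_dataset = torchvision.datasets.CocoDetection(
    root=data_root / "val2017",
    annFile=data_root / "annotations" / "custom_val.json",
)

print(len(train_dataset), "training images,", len(val_dataset), "validation images")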
Metrics
Typical metrics to monitor, partially shown in [this notebook][metrics-notebook], include:
- the Average Precision (AP), which is the primary challenge metric for the COCO dataset,
- losses (total loss, classification loss, l1 bbox distance loss, GIoU loss),
- errors (cardinality error, class error).
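For reference, the COCO-style AP above is usually computed with pycocotools. In the following minimal sketch, predictions.json is a hypothetical file of detections in the COCO results format:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations for the validation split (layout shown above).
coco_gt = COCO("path/to/coco/annotations/custom_val.json")
# Hypothetical detections file with entries of image_id, category_id, bbox and score.
coco_dt = coco_gt.loadRes("predictions.json")

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP/AR, including the primary COCO AP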
As mentioned in the paper, there are 3 components to the matching cost and to the total loss (a toy sketch of all three terms follows the excerpts below):
- classification loss,
def loss_labels(self, outputs, targets, indices, num_boxes, log=True):
    """Classification loss (NLL)
    targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
    """
    [...]
    loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)
    losses = {'loss_ce': loss_ce}
- l1 bounding box distance loss,
def loss_boxes(self, outputs, targets, indices, num_boxes):
    """Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
    targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4]
    The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
    """
    [...]
    loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none')
    losses['loss_bbox'] = loss_bbox.sum() / num_boxes
- Generalized Intersection over Union (GIoU) loss, which is scale-invariant.
loss_giou = 1 - torch.diag(box_ops.generalized_box_iou(
    box_ops.box_cxcywh_to_xyxy(src_boxes),
    box_ops.box_cxcywh_to_xyxy(target_boxes)))
losses['loss_giou'] = loss_giou.sum() / num_boxes
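To make the three terms concrete, here is a small self-contained sketch on toy tensors. It assumes the 5 FridgeIT classes plus a trailing "no-object" class with DETR's default weight of 0.1, and uses torchvision's box_convert and generalized_box_iou as stand-ins for DETR's box_ops helpers:
import torch
import torch.nn.functional as F
from torchvision.ops import box_convert, generalized_box_iou

num_classes = 5                     # Butter, Cottage, Cream, Milk, Mustard
batch_size, num_queries = 2, 4

# Classification loss: unmatched queries are labelled with the last index ("no-object"),
# which is down-weighted so it does not dominate the loss.
src_logits = torch.randn(batch_size, num_queries, num_classes + 1)
target_classes = torch.randint(0, num_classes + 1, (batch_size, num_queries))
empty_weight = torch.ones(num_classes + 1)
empty_weight[-1] = 0.1
loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, empty_weight)

# Matched predicted and target boxes in (center_x, center_y, w, h), normalized to [0, 1].
num_boxes = 3
src_boxes = torch.tensor([[0.50, 0.50, 0.20, 0.20],
                          [0.30, 0.40, 0.10, 0.30],
                          [0.70, 0.60, 0.40, 0.20]])
target_boxes = torch.tensor([[0.50, 0.50, 0.25, 0.20],
                             [0.35, 0.40, 0.10, 0.30],
                             [0.60, 0.60, 0.40, 0.25]])

# L1 bounding box distance loss, averaged over the number of boxes.
loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none').sum() / num_boxes

# GIoU loss: convert to (x1, y1, x2, y2) and keep only the matched (diagonal) pairs.
giou = generalized_box_iou(box_convert(src_boxes, 'cxcywh', 'xyxy'),
                           box_convert(target_boxes, 'cxcywh', 'xyxy'))
loss_giou = (1 - torch.diag(giou)).sum() / num_boxes

print(loss_ce, loss_bbox, loss_giou)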
Moreover, there are two errors (a toy illustration of both follows the accuracy helper at the end of this section):
- cardinality error,
def loss_cardinality(self, outputs, targets, indices, num_boxes):
    """Compute the cardinality error, ie the absolute error in the number of predicted non-empty boxes.
    This is not really a loss, it is intended for logging purposes only. It doesn't propagate gradients.
    """
    [...]
    # Count the number of predictions that are NOT "no-object" (which is the last class)
    card_pred = (pred_logits.argmax(-1) != pred_logits.shape[-1] - 1).sum(1)
    card_err = F.l1_loss(card_pred.float(), tgt_lengths.float())
    losses = {'cardinality_error': card_err}
- class error,
# TODO this should probably be a separate loss, not hacked in this one here
losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
where accuracy is:
def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    [...]