INSTALLATION
To get started with the fine-tuned DETR model from Hugging Face, please follow the steps below:
- First, make sure you have an account on the Hugging Face Hub. You can sign up or log in via the Hugging Face Login page.
- Verify that you have a Hugging Face user access token. If not, create a new token from your Hugging Face account settings (see the User access tokens tutorial).
- Create the directory where you want to clone the finetuned_detr repository.
- Open a terminal and navigate to the directory you just created, or open the terminal directly from that directory.
- Run the following command to install (or upgrade) the huggingface_hub library:
pip3 install --upgrade huggingface_hub
- Run the following command and enter your token to log in to your Hugging Face account:
huggingface-cli login
- Run the following command to install Git Large File Storage (LFS) for handling large files:
git lfs install
- Clone the Hugging Face repository by running the following command (you may be asked to log in to your account again; a scripted alternative using the Python API is sketched after these steps):
git clone https://huggingface.co/coralavital/finetuned_detr
- Open your workspace from the finetuned_detr directory.
- In the workspace, open the terminal and execute the following commands:
- Create a new Python virtual environment:
python3.9 -m venv venv
- Activate the virtual environment:
On macOS/Linux: source venv/bin/activate
On Windows: venv\Scripts\activate.bat
- Install necessary dependencies:
pip3 install Cython numpy
pip3 install -r requirements.txt
- After completing the installation steps, select the venv kernel you just created as the Jupyter kernel in VS Code.
- After selecting the correct environment, you can run the Jupyter notebook named "finetune_detr.ipynb" in your workspace.
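If you prefer to script the authentication and download steps above instead of using the CLI, a minimal sketch with the huggingface_hub Python API (using the same repository id) is:
from huggingface_hub import login, snapshot_download

# Log in with your user access token (equivalent to `huggingface-cli login`).
login(token="hf_xxx")  # replace with your own token

# Download the repository contents to a local folder
# (an alternative to the `git clone` step above).
local_dir = snapshot_download(repo_id="coralavital/finetuned_detr")
print("Repository downloaded to:", local_dir)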
ABSTRACT
FridgeIT dataset
This is a dataset of products that we collected manually with an iPhone 12 Pro Max camera. The dataset contains 5 products: Butter, Cottage, Cream, Milk and Mustard.
Finetune DETR
The goal of this notebook is to fine-tune Facebook's DETR (DEtection TRansformer).
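As a hedged sketch of the starting point (not the full fine-tuning loop), the COCO-pretrained DETR-ResNet-50 checkpoint can be loaded through torch.hub from the paper's repository:
import torch

# Load the COCO-pretrained DETR-ResNet-50 model from facebookresearch/detr.
# For fine-tuning, the classification head is then adapted to the custom classes.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval()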
Data
DETR will be fine-tuned on a tiny dataset: the FridgeIT dataset. We refer to it as the custom dataset.
There are 2094 images in the training set and 526 images in the validation set.
We expect the directory structure to be the following:
path/to/coco/
β annotations/ # JSON annotations
β β annotations/custom_train.json
β β annotations/custom_val.json
β train2017/ # training images
β val2017/ # validation images
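As a quick sanity check of this layout, both splits can be read with torchvision's CocoDetection dataset. This is only a minimal sketch; the paths are the placeholders above and should be adjusted to your setup:
from pathlib import Path
import torchvision

data_root = Path("path/to/coco")

train_dataset = torchvision.datasets.CocoDetection(
    root=data_root / "train2017",
    annFile=data_root / "annotations" / "custom_train.json",
)
val_dataset = torchvision.datasets.CocoDetection(
    root=data_root / "val2017",
    annFile=data_root / "annotations" / "custom_val.json",
)

print(len(train_dataset), "training images,", len(val_dataset), "validation images")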
Metrics
Typical metrics to monitor, partially shown in [this notebook][metrics-notebook], include:
- the Average Precision (AP), which is the primary challenge metric for the COCO dataset,
- losses (total loss, classification loss, l1 bbox distance loss, GIoU loss),
- errors (cardinality error, class error).
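For reference, the COCO-style AP above is usually computed with pycocotools. In the following minimal sketch, predictions.json is a hypothetical file of detections in the COCO results format:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations for the validation split (layout shown above).
coco_gt = COCO("path/to/coco/annotations/custom_val.json")
# Hypothetical detections file with entries of image_id, category_id, bbox and score.
coco_dt = coco_gt.loadRes("predictions.json")

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP/AR, including the primary COCO AP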
As mentioned in the paper, there are 3 components to the matching cost and to the total loss (a toy sketch of all three terms follows the excerpts below):
- classification loss,
def loss_labels(self, outputs, targets, indices, num_boxes, log=True):
    """Classification loss (NLL)
    targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
    """
    [...]
    loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)
    losses = {'loss_ce': loss_ce}
- l1 bounding box distance loss,
def loss_boxes(self, outputs, targets, indices, num_boxes):
    """Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
    targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4]
    The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
    """
    [...]
    loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none')
    losses['loss_bbox'] = loss_bbox.sum() / num_boxes
- Generalized Intersection over Union (GIoU) loss, which is scale-invariant.
loss_giou = 1 - torch.diag(box_ops.generalized_box_iou(
    box_ops.box_cxcywh_to_xyxy(src_boxes),
    box_ops.box_cxcywh_to_xyxy(target_boxes)))
losses['loss_giou'] = loss_giou.sum() / num_boxes
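To make the three terms concrete, here is a small self-contained sketch on toy tensors. It assumes the 5 FridgeIT classes plus a trailing "no-object" class with DETR's default weight of 0.1, and uses torchvision's box_convert and generalized_box_iou as stand-ins for DETR's box_ops helpers:
import torch
import torch.nn.functional as F
from torchvision.ops import box_convert, generalized_box_iou

num_classes = 5                     # Butter, Cottage, Cream, Milk, Mustard
batch_size, num_queries = 2, 4

# Classification loss: unmatched queries are labelled with the last index ("no-object"),
# which is down-weighted so it does not dominate the loss.
src_logits = torch.randn(batch_size, num_queries, num_classes + 1)
target_classes = torch.randint(0, num_classes + 1, (batch_size, num_queries))
empty_weight = torch.ones(num_classes + 1)
empty_weight[-1] = 0.1
loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, empty_weight)

# Matched predicted and target boxes in (center_x, center_y, w, h), normalized to [0, 1].
num_boxes = 3
src_boxes = torch.tensor([[0.50, 0.50, 0.20, 0.20],
                          [0.30, 0.40, 0.10, 0.30],
                          [0.70, 0.60, 0.40, 0.20]])
target_boxes = torch.tensor([[0.50, 0.50, 0.25, 0.20],
                             [0.35, 0.40, 0.10, 0.30],
                             [0.60, 0.60, 0.40, 0.25]])

# L1 bounding box distance loss, averaged over the number of boxes.
loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none').sum() / num_boxes

# GIoU loss: convert to (x1, y1, x2, y2) and keep only the matched (diagonal) pairs.
giou = generalized_box_iou(box_convert(src_boxes, 'cxcywh', 'xyxy'),
                           box_convert(target_boxes, 'cxcywh', 'xyxy'))
loss_giou = (1 - torch.diag(giou)).sum() / num_boxes

print(loss_ce, loss_bbox, loss_giou)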
Moreover, there are two errors (a toy illustration of both follows the accuracy helper at the end of this section):
- cardinality error,
def loss_cardinality(self, outputs, targets, indices, num_boxes):
    """Compute the cardinality error, ie the absolute error in the number of predicted non-empty boxes.
    This is not really a loss, it is intended for logging purposes only. It doesn't propagate gradients.
    """
    [...]
    # Count the number of predictions that are NOT "no-object" (which is the last class)
    card_pred = (pred_logits.argmax(-1) != pred_logits.shape[-1] - 1).sum(1)
    card_err = F.l1_loss(card_pred.float(), tgt_lengths.float())
    losses = {'cardinality_error': card_err}
- class error,
# TODO this should probably be a separate loss, not hacked in this one here
losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
where accuracy is:
def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    [...]