Training an Object Detection Model with AutoTrain
Object detection is an essential task in computer vision, enabling models to identify, localize, and classify objects within images. AutoTrain simplifies this process by allowing users to train a state-of-the-art object detection model with ease. In this blog post, we'll walk you through the steps to prepare your data, configure your training parameters, and use the command-line interface (CLI) and the user interface (UI) to train an effective object detection model, both locally and on Hugging Face Spaces.
Preparing Your Data
Before training your model, you need to organize your images and create a metadata file. Follow these guidelines:
Data preparation for UI
Create a Zip Archive: Gather your images and a `metadata.jsonl` file into a single zip file. Your file structure should look like this:

```
Archive.zip
├── 0001.png
├── 0002.png
├── 0003.png
├── ...
└── metadata.jsonl
```
Prepare the Metadata: The `metadata.jsonl` file contains information about each image, including the bounding boxes and categories of objects. Here's an example:

```
{"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "category": [0]}}
{"file_name": "0002.png", "objects": {"bbox": [[810.0, 100.0, 57.0, 28.0]], "category": [1]}}
{"file_name": "0003.png", "objects": {"bbox": [[160.0, 31.0, 248.0, 616.0], [741.0, 68.0, 202.0, 401.0]], "category": [2, 2]}}
```

Ensure the bounding boxes are in COCO format: `[x, y, width, height]`.
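If your annotations already live in Python objects, writing a valid `metadata.jsonl` takes only a few lines. Below is a minimal sketch: the annotation values are copied from the example above, and the `voc_to_coco` helper is a hypothetical convenience for boxes that start out in Pascal VOC `[xmin, ymin, xmax, ymax]` format rather than COCO format.

```python
import json

# Example annotations (values copied from the sample above).
# Boxes are COCO format: [x, y, width, height], (x, y) = top-left corner.
annotations = [
    {"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "category": [0]}},
    {"file_name": "0002.png", "objects": {"bbox": [[810.0, 100.0, 57.0, 28.0]], "category": [1]}},
]

def voc_to_coco(box):
    """Convert a Pascal VOC box [xmin, ymin, xmax, ymax] to COCO [x, y, w, h]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

# JSON Lines: one JSON object per line, no enclosing list and no commas.
with open("metadata.jsonl", "w") as f:
    for entry in annotations:
        f.write(json.dumps(entry) + "\n")
```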
Data preparation for CLI
Alternatively, if you're not using the UI, you can organize your data in folders:
Create Training and Validation Folders: Organize your images and `metadata.jsonl` into separate folders for training and validation:

```
training/
├── 0001.png
├── 0002.png
├── 0003.png
├── ...
└── metadata.jsonl

validation/
├── 0004.png
├── 0005.png
├── ...
└── metadata.jsonl
```
Prepare the Metadata: Similar to the UI method, the `metadata.jsonl` file should contain bounding box and category information.
Image Requirements
- Format: All images must be in JPEG, JPG, or PNG format.
- Quantity: Include at least 5 images to provide sufficient examples for learning.
- Exclusivity: The zip file should only contain images and the `metadata.jsonl` file. No additional files or nested folders should be included.
When `train.zip` is decompressed, it should create no folders: only images and `metadata.jsonl`.
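A convenient way to satisfy this flat-archive requirement is to build the zip programmatically, writing every file at the root of the archive. Here's a minimal sketch using Python's standard library (the folder name `my_dataset` is a placeholder):

```python
import zipfile
from pathlib import Path

data_dir = Path("my_dataset")  # placeholder: folder with images + metadata.jsonl

with zipfile.ZipFile("train.zip", "w") as zf:
    for path in sorted(data_dir.iterdir()):
        # arcname=path.name places each file at the archive root,
        # so decompressing creates no folders.
        zf.write(path, arcname=path.name)
```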
NOTE: You can also use a dataset from the Hugging Face Hub. This is discussed later in this blog post.
Configuring Training Parameters
AutoTrain offers various parameters to customize your training process. Here are the key parameters you can configure:
Basic Parameters
- --image-square-size: Resize input images to a square shape with the specified size (default is 600).
- --batch-size: Set the training batch size.
- --seed: Random seed for reproducibility.
- --epochs: Number of training epochs.
- --gradient_accumulation: Number of gradient accumulation steps.
- --disable_gradient_checkpointing: Disable gradient checkpointing.
- --lr: Learning rate.
- --log: Experiment tracking options (`none`, `wandb`, `tensorboard`).
Advanced Parameters
- --image-column: Specify the image column to use.
- --target-column: Specify the target column to use.
- --warmup-ratio: Proportion of training for a linear warmup (default is 0.1).
- --optimizer: Choose the optimizer algorithm (`adamw_torch` by default).
- --scheduler: Select the learning rate scheduler (`linear` by default; `cosine` is another option).
- --weight-decay: Set the weight decay rate (default is 0.0).
- --max-grad-norm: Maximum norm of the gradients for gradient clipping (default is 1.0).
- --logging-steps: Determine the frequency of logging training progress (default is -1 for automatic determination).
- --evaluation-strategy: Specify the evaluation frequency (`no`, `steps`, `epoch`).
- --save-total-limit: Limit the number of model checkpoints to save.
- --auto-find-batch-size: Automatically determine the batch size based on hardware capabilities.
- --mixed-precision: Choose the precision mode (`fp16`, `bf16`, or None).
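Two of these parameters interact in ways worth spelling out: gradient accumulation multiplies the effective batch size, and the warmup ratio is a fraction of total optimizer steps (assuming AutoTrain follows the usual Hugging Face Trainer semantics here). A quick illustrative calculation, with made-up numbers:

```python
# Illustrative values only, not defaults.
batch_size = 8
gradient_accumulation = 4
epochs = 100
train_examples = 1000
warmup_ratio = 0.1

# Gradients are accumulated over this many examples per optimizer step:
effective_batch_size = batch_size * gradient_accumulation  # 32

# Optimizer steps over the whole run:
steps_per_epoch = train_examples // effective_batch_size   # 31
total_steps = steps_per_epoch * epochs                     # 3100

# Steps spent linearly warming the learning rate up to --lr:
warmup_steps = int(total_steps * warmup_ratio)             # 310
```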
Using the CLI for Training
To train your object detection model using the CLI, you can create a configuration file and run the `autotrain` command. Below is an example configuration file for training on the CPPE-5 dataset from the Hugging Face Hub.
Sample Configuration File
```yaml
task: object_detection
base_model: facebook/detr-resnet-50
project_name: autotrain-obj-det-cppe5-2
log: tensorboard
backend: local

data:
  path: cppe-5
  train_split: train
  valid_split: test
  column_mapping:
    image_column: image
    objects_column: objects

params:
  image_square_size: 600
  epochs: 100
  batch_size: 8
  lr: 5e-5
  weight_decay: 1e-4
  optimizer: adamw_torch
  scheduler: linear
  gradient_accumulation: 1
  mixed_precision: fp16
  early_stopping_patience: 50
  early_stopping_threshold: 0.001

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true
```
Running the Training
To start training, use the following command:
```bash
$ export HF_USERNAME=your_hugging_face_username
$ export HF_TOKEN=your_hugging_face_write_token
$ autotrain --config configfile.yml
```
This command will use the configuration specified in `configfile.yml` to train your object detection model.

Note: you only need to export your username and token if you have set `push_to_hub` to `true`.
Some Hugging Face Hub datasets contain multiple configs. In those cases, you can use the `dataset_config:split_name` format for `train_split` and `valid_split`. For example, the `keremberke/license-plate-object-detection` dataset has two configs: `full` and `mini`.
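If you're not sure which configs and splits a Hub dataset exposes, the `datasets` library can list them for you (this is a general inspection trick, not part of AutoTrain itself):

```python
from datasets import get_dataset_config_names, get_dataset_split_names

dataset_id = "keremberke/license-plate-object-detection"

print(get_dataset_config_names(dataset_id))         # e.g. ['full', 'mini']
print(get_dataset_split_names(dataset_id, "full"))  # e.g. ['train', 'validation', ...]
```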
For this dataset, the `data` section of the config file becomes:
```yaml
data:
  path: keremberke/license-plate-object-detection
  train_split: full:train
  valid_split: full:validation
  column_mapping:
    image_column: image
    objects_column: objects
```
If your dataset is stored locally, you need to update the `data` section of the config YAML as follows:
```yaml
data:
  path: /path/to/data/folder/
  train_split: train # this folder contains images and metadata.jsonl
  valid_split: val # this folder contains images and metadata.jsonl; optional, can be set to null
  column_mapping:
    image_column: image
    objects_column: objects
```
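Before launching a run on a local folder, it can be worth sanity-checking that every image referenced in `metadata.jsonl` actually exists and that each box has four values. A minimal sketch, with a placeholder path:

```python
import json
from pathlib import Path

def check_split(folder: str) -> None:
    """Sanity-check a split folder containing images and metadata.jsonl."""
    root = Path(folder)
    with open(root / "metadata.jsonl") as f:
        for line in f:
            entry = json.loads(line)
            image = root / entry["file_name"]
            assert image.exists(), f"missing image: {image}"
            for bbox in entry["objects"]["bbox"]:
                # Each box should be COCO-format [x, y, width, height].
                assert len(bbox) == 4, f"bad bbox in {entry['file_name']}: {bbox}"

check_split("/path/to/data/folder/train")  # placeholder path
```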
Using the UI for training
Locally, you can start AutoTrain UI by running:
```bash
$ pip install -U autotrain-advanced
$ autotrain app --host 127.0.0.1 --port 8000
```

The app will start at http://127.0.0.1:8000.
The data format for uploading in the UI is the same as described above for zip files.
If you don't have suitable hardware, you can also start the UI on Hugging Face Spaces by clicking here. Read more in the docs.
When using a dataset from the Hub, you must map the columns correctly. When using a local dataset (folder or zip), the column mapping should be left as-is.
Conclusion
AutoTrain simplifies the complex task of training object detection models, enabling you to focus on fine-tuning your model for optimal performance. By following these guidelines and utilizing the available parameters, you can create an effective object detection model tailored to your specific needs. Whether using the UI or CLI, AutoTrain provides a streamlined process for building powerful object detection models.
P.S.: All models trained using AutoTrain are ready for deployment using the Inference API and Inference Endpoints.

In case of any issues or feature requests, check out the GitHub Repository.
Happy Training! :)