Object Detection
Object detection is a form of supervised learning where a model is trained to locate objects within images, using bounding boxes, and to assign each one a category. AutoTrain simplifies the process, enabling you to train a state-of-the-art object detection model by uploading labeled example images.
Preparing your data
To ensure your object detection model trains effectively, follow these guidelines for preparing your data:
Organizing Images
Prepare a zip file containing your images and metadata.jsonl.
Archive.zip
├── 0001.png
├── 0002.png
├── 0003.png
├── .
├── .
├── .
└── metadata.jsonl
Example for metadata.jsonl:
{"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "category": [0]}}
{"file_name": "0002.png", "objects": {"bbox": [[810.0, 100.0, 57.0, 28.0]], "category": [1]}}
{"file_name": "0003.png", "objects": {"bbox": [[160.0, 31.0, 248.0, 616.0], [741.0, 68.0, 202.0, 401.0]], "category": [2, 2]}}
Please note that bboxes need to be in COCO format, [x, y, width, height], where (x, y) is the top-left corner of the box in pixels.
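Annotation tools often export boxes as corner coordinates rather than COCO-style boxes. The following is a minimal sketch, assuming the input boxes are given as (x_min, y_min, x_max, y_max); the helper names corners_to_coco and metadata_line are hypothetical and not part of AutoTrain. It converts each box to [x, y, width, height] and builds one metadata.jsonl line.

```python
import json


def corners_to_coco(x_min, y_min, x_max, y_max):
    """Convert corner coordinates to a COCO-style [x, y, width, height] box."""
    return [float(x_min), float(y_min), float(x_max - x_min), float(y_max - y_min)]


def metadata_line(file_name, corner_boxes, categories):
    """Build one metadata.jsonl line for a single image."""
    if len(corner_boxes) != len(categories):
        raise ValueError("each box needs exactly one category id")
    record = {
        "file_name": file_name,
        "objects": {
            "bbox": [corners_to_coco(*box) for box in corner_boxes],
            "category": list(categories),
        },
    }
    return json.dumps(record)


# One box with corners (302, 109) and (375, 161), category 0.
line = metadata_line("0001.png", [(302, 109, 375, 161)], [0])
print(line)
# {"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "category": [0]}}
```

Writing one such line per image to metadata.jsonl yields a file in the same shape as the example above.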
Image Requirements
Format: Ensure all images are in JPEG, JPG, or PNG format.
Quantity: Include at least 5 images per split so the model has enough examples to learn from.
Exclusivity: The zip file should exclusively contain images and metadata.jsonl. No additional files or nested folders should be included.
Some points to keep in mind:
- The images must be jpeg, jpg or png.
- There should be at least 5 images per split.
- There must not be any other files in the zip file.
- There must not be any other folders inside the zip folder.
When the zip file (for example, train.zip) is decompressed, it must create no folders: only the images and metadata.jsonl.
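The requirements above can be checked locally before uploading. This is a rough sketch using only the Python standard library; the function name check_archive is hypothetical, and matching file extensions case-insensitively is an assumption, not documented AutoTrain behavior.

```python
import zipfile

IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")
MIN_IMAGES = 5  # minimum number of images per split, per the guidelines above


def check_archive(zip_path):
    """Return a list of problems found in the archive (empty if none)."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()

    images = []
    for name in names:
        if "/" in name:
            # Any path separator means a folder or a file nested inside one.
            problems.append(f"nested path or folder: {name}")
        elif name == "metadata.jsonl":
            continue
        elif name.lower().endswith(IMAGE_EXTENSIONS):
            images.append(name)
        else:
            problems.append(f"unexpected file: {name}")

    if "metadata.jsonl" not in names:
        problems.append("metadata.jsonl is missing")
    if len(images) < MIN_IMAGES:
        problems.append(f"only {len(images)} images, need at least {MIN_IMAGES}")
    return problems
```

Calling check_archive("train.zip") returns an empty list when the archive contains only top-level images and metadata.jsonl; otherwise each entry describes one issue to fix.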
Parameters
class autotrain.trainers.object_detection.params.ObjectDetectionParams
(
    data_path: str = None,
    model: str = 'google/vit-base-patch16-224',
    username: Optional = None,
    lr: float = 5e-05,
    epochs: int = 3,
    batch_size: int = 8,
    warmup_ratio: float = 0.1,
    gradient_accumulation: int = 1,
    optimizer: str = 'adamw_torch',
    scheduler: str = 'linear',
    weight_decay: float = 0.0,
    max_grad_norm: float = 1.0,
    seed: int = 42,
    train_split: str = 'train',
    valid_split: Optional = None,
    logging_steps: int = -1,
    project_name: str = 'project-name',
    auto_find_batch_size: bool = False,
    mixed_precision: Optional = None,
    save_total_limit: int = 1,
    token: Optional = None,
    push_to_hub: bool = False,
    eval_strategy: str = 'epoch',
    image_column: str = 'image',
    objects_column: str = 'objects',
    log: str = 'none',
    image_square_size: Optional = 600,
    early_stopping_patience: int = 5,
    early_stopping_threshold: float = 0.01
)
Parameters
- data_path (str) — Path to the dataset.
- model (str) — Name of the model to be used. Default is “google/vit-base-patch16-224”.
- username (Optional[str]) — Hugging Face Username.
- lr (float) — Learning rate. Default is 5e-5.
- epochs (int) — Number of training epochs. Default is 3.
- batch_size (int) — Training batch size. Default is 8.
- warmup_ratio (float) — Warmup proportion. Default is 0.1.
- gradient_accumulation (int) — Gradient accumulation steps. Default is 1.
- optimizer (str) — Optimizer to be used. Default is “adamw_torch”.
- scheduler (str) — Scheduler to be used. Default is “linear”.
- weight_decay (float) — Weight decay. Default is 0.0.
- max_grad_norm (float) — Max gradient norm. Default is 1.0.
- seed (int) — Random seed. Default is 42.
- train_split (str) — Name of the training data split. Default is “train”.
- valid_split (Optional[str]) — Name of the validation data split.
- logging_steps (int) — Number of steps between logging. Default is -1.
- project_name (str) — Name of the project for output directory. Default is “project-name”.
- auto_find_batch_size (bool) — Whether to automatically find batch size. Default is False.
- mixed_precision (Optional[str]) — Mixed precision type (fp16, bf16, or None).
- save_total_limit (int) — Total number of checkpoints to save. Default is 1.
- token (Optional[str]) — Hub Token for authentication.
- push_to_hub (bool) — Whether to push the model to the Hugging Face Hub. Default is False.
- eval_strategy (str) — Evaluation strategy. Default is “epoch”.
- image_column (str) — Name of the image column in the dataset. Default is “image”.
- objects_column (str) — Name of the target column in the dataset. Default is “objects”.
- log (str) — Logging method for experiment tracking. Default is “none”.
- image_square_size (Optional[int]) — Longest size to which the image will be resized, then padded to square. Default is 600.
- early_stopping_patience (int) — Number of epochs with no improvement after which training will be stopped. Default is 5.
- early_stopping_threshold (float) — Minimum change to qualify as an improvement. Default is 0.01.
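To illustrate what image_square_size implies geometrically, here is a small sketch of "resize the longest side, then pad to square". The function name resize_then_pad is hypothetical, and the rounding and the choice to report padding as a total per axis are assumptions; AutoTrain's actual preprocessing may differ in these details.

```python
def resize_then_pad(width, height, square_size=600):
    """Return the resized (width, height) and the total padding per axis
    needed to reach a square_size x square_size canvas."""
    # Scale so that the longest side becomes exactly square_size.
    scale = square_size / max(width, height)
    new_width = round(width * scale)
    new_height = round(height * scale)
    # Whatever is left on the shorter axis is filled with padding.
    pad_width = square_size - new_width
    pad_height = square_size - new_height
    return (new_width, new_height), (pad_width, pad_height)


print(resize_then_pad(1200, 800))
# ((600, 400), (0, 200))
```

In this example a 1200x800 image is scaled by 0.5 to 600x400, and 200 pixels of vertical padding bring it to 600x600. The aspect ratio of the image content is preserved.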
ObjectDetectionParams is a configuration class for object detection training parameters.
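As a minimal sketch of how these parameters might be set from Python, the example below collects a few overrides and passes them to ObjectDetectionParams when the autotrain package is available. The import path and field names come from the signature above; the data path and project name are hypothetical placeholders, and reading values back as attributes is an assumption about the class.

```python
# Overrides for a few of the documented fields; everything else keeps its default.
overrides = {
    "data_path": "path/to/dataset",         # hypothetical placeholder
    "project_name": "my-object-detector",   # hypothetical placeholder
    "epochs": 5,
    "batch_size": 4,
    "gradient_accumulation": 2,
    "image_square_size": 600,
    "early_stopping_patience": 5,
    "early_stopping_threshold": 0.01,
}

try:
    from autotrain.trainers.object_detection.params import ObjectDetectionParams
except ImportError:
    ObjectDetectionParams = None  # autotrain is not installed in this environment

if ObjectDetectionParams is not None:
    params = ObjectDetectionParams(**overrides)
    # Assumes the fields are readable as attributes on the params object.
    print(params.epochs, params.batch_size)
else:
    print("autotrain is not installed; overrides prepared:", sorted(overrides))
```

With gradient_accumulation set to 2 and batch_size set to 4, gradients from two batches of 4 images are accumulated before each optimizer step.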