Text detection
The sample training script trains a text detection model with docTR.
Setup
First, you need to install doctr (with pip, for instance):
pip install -e . --upgrade
pip install -r references/requirements.txt
Usage
You can start your training in PyTorch:
python references/detection/train.py db_resnet50 --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
Multi-GPU support
We now use the built-in torchrun launcher to spawn your DDP workers. torchrun sets all the necessary environment variables (LOCAL_RANK, RANK, etc.) for you. Arguments are the same as for single-GPU training, except:
--backend: you can specify another backend for DistributedDataParallel if the default one is not available on your operating system. The fastest one is nccl, according to the PyTorch documentation.
Key torchrun parameters
--nproc_per_node=<N>: Spawn <N> processes on the local machine (typically equal to the number of GPUs you want to use).
--nnodes=<M>: (Optional) Total number of nodes in your job. Default is 1.
--rdzv_backend, --rdzv_endpoint, --rdzv_id: (Optional) Rendezvous settings for multi-node jobs. See the torchrun docs for details.
GPU selection
By default all visible GPUs will be used. To limit which GPUs participate, set the CUDA_VISIBLE_DEVICES environment variable before running torchrun. For example, to use only CUDA devices 0 and 2:
CUDA_VISIBLE_DEVICES=0,2 \
torchrun --nproc_per_node=2 references/detection/train.py \
db_resnet50 \
--train_path path/to/train \
--val_path path/to/val \
--epochs 5 \
--backend nccl
Data format
You need to provide both train_path and val_path arguments to start training.
Each path must lead to a folder with 1 subfolder and 1 file:
├── images
│   ├── sample_img_01.png
│   ├── sample_img_02.png
│   ├── sample_img_03.png
│   └── ...
└── labels.json
Each JSON file must be a dictionary where the keys are the image file names and each value is a dictionary with 3 entries: img_dimensions (the spatial shape of the image), img_hash (the SHA256 hash of the image file), and polygons (the sets of 2D points forming the localization polygons).
The order of the points inside a polygon does not matter. Points are absolute (x, y) coordinates.
labels.json
{
    "sample_img_01.png": {
        "img_dimensions": [900, 600],
        "img_hash": "theimagedumpmyhash",
        "polygons": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]
    },
    "sample_img_02.png": {
        "img_dimensions": [900, 600],
        "img_hash": "thisisahash",
        "polygons": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]
    },
    ...
}
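As a sketch of how such a file could be generated, the snippet below builds one entry, hashing the raw image bytes with SHA256. It is a minimal example, not part of docTR: the helper name make_label_entry is hypothetical, and a throwaway file stands in for a real image.

```python
import hashlib
import json
import os
import tempfile

def make_label_entry(image_path, img_dimensions, polygons):
    """Build one labels.json entry: the image's spatial shape, the SHA256
    hash of the file bytes, and the polygons in absolute (x, y) coordinates."""
    with open(image_path, "rb") as f:
        img_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "img_dimensions": list(img_dimensions),
        "img_hash": img_hash,
        "polygons": polygons,
    }

# demo with a throwaway file standing in for a real image
tmp_dir = tempfile.mkdtemp()
img_path = os.path.join(tmp_dir, "sample_img_01.png")
with open(img_path, "wb") as f:
    f.write(b"fake image bytes")

labels = {
    "sample_img_01.png": make_label_entry(
        img_path,
        (900, 600),
        [[[10, 10], [200, 10], [200, 50], [10, 50]]],
    )
}
with open(os.path.join(tmp_dir, "labels.json"), "w") as f:
    json.dump(labels, f, indent=4)
```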
If you want to train a model with multiple classes, you can use the following format, where polygons is a dictionary in which each key is a class name mapping to all the polygons of that class.
labels.json
{
    "sample_img_01.png": {
        "img_dimensions": [900, 600],
        "img_hash": "theimagedumpmyhash",
        "polygons": {
            "class_name_1": [[[x10, y10], [x20, y20], [x30, y30], [x40, y40]], ...],
            "class_name_2": [[[x11, y11], [x21, y21], [x31, y31], [x41, y41]], ...]
        }
    },
    "sample_img_02.png": {
        "img_dimensions": [900, 600],
        "img_hash": "thisisahash",
        "polygons": {
            "class_name_1": [[[x12, y12], [x22, y22], [x32, y32], [x42, y42]], ...],
            "class_name_2": [[[x13, y13], [x23, y23], [x33, y33], [x43, y43]], ...]
        }
    },
    ...
}
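A quick sanity check of entries in either format can be sketched as follows. validate_entry is a hypothetical helper, not a docTR API: it only verifies that the stored hash matches the image bytes and that every polygon point is an (x, y) pair, handling both the flat-list and the per-class dictionary layouts.

```python
import hashlib

def validate_entry(name, entry, img_bytes):
    """Sanity-check one labels.json entry against the raw image bytes.
    `polygons` may be a flat list (single class) or a dict mapping class
    names to polygon lists (multi-class)."""
    if entry["img_hash"] != hashlib.sha256(img_bytes).hexdigest():
        raise ValueError(f"{name}: img_hash does not match the file contents")
    polys = entry["polygons"]
    groups = polys.values() if isinstance(polys, dict) else [polys]
    for group in groups:
        for polygon in group:
            if any(len(point) != 2 for point in polygon):
                raise ValueError(f"{name}: polygon points must be (x, y) pairs")

# multi-class example entry, hashed from stand-in bytes
img_bytes = b"fake image bytes"
entry = {
    "img_dimensions": [900, 600],
    "img_hash": hashlib.sha256(img_bytes).hexdigest(),
    "polygons": {
        "class_name_1": [[[10, 10], [200, 10], [200, 50], [10, 50]]],
        "class_name_2": [[[20, 60], [180, 60], [180, 90], [20, 90]]],
    },
}
validate_entry("sample_img_01.png", entry, img_bytes)
```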
Slack Logging with tqdm
To enable Slack logging using tqdm, you need to set the following environment variables:
TQDM_SLACK_TOKEN: the Slack Bot Token.
TQDM_SLACK_CHANNEL: you can retrieve it using Right Click on Channel > Copy > Copy link. You should get something like https://xxxxxx.slack.com/archives/yyyyyyyy. Keep only the yyyyyyyy part.
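For instance, the variables can be exported before launching training. The token and channel values below are placeholders, not real credentials:

```shell
# hypothetical placeholder values — substitute your own bot token and channel ID
export TQDM_SLACK_TOKEN="xoxb-0000000000-placeholder"
export TQDM_SLACK_CHANNEL="yyyyyyyy"

# then launch training as usual; tqdm will post progress to that channel
# python references/detection/train.py db_resnet50 --train_path path/to/train --val_path path/to/val --epochs 5
```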
You can follow the Slack API documentation on how to create a Slack App.
Advanced options
Feel free to inspect the script's many options to customize training to your own needs!
python references/detection/train.py --help