Commit 9bf4bd7 by Mountchicken
Parent(s): 6163561

Upload 704 files
This view is limited to 50 files because it contains too many changes.
- CITATION.cff +9 -0
- configs/backbone/oclip/README.md +41 -0
- configs/backbone/oclip/metafile.yml +13 -0
- configs/kie/_base_/datasets/wildreceipt-openset.py +26 -0
- configs/kie/_base_/datasets/wildreceipt.py +16 -0
- configs/kie/_base_/default_runtime.py +33 -0
- configs/kie/_base_/schedules/schedule_adam_60e.py +10 -0
- configs/kie/sdmgr/README.md +41 -0
- configs/kie/sdmgr/_base_sdmgr_novisual.py +35 -0
- configs/kie/sdmgr/_base_sdmgr_unet16.py +28 -0
- configs/kie/sdmgr/metafile.yml +52 -0
- configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py +71 -0
- configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py +28 -0
- configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py +29 -0
- configs/textdet/_base_/datasets/ctw1500.py +15 -0
- configs/textdet/_base_/datasets/icdar2015.py +15 -0
- configs/textdet/_base_/datasets/icdar2017.py +17 -0
- configs/textdet/_base_/datasets/synthtext.py +8 -0
- configs/textdet/_base_/datasets/totaltext.py +15 -0
- configs/textdet/_base_/datasets/toy_data.py +17 -0
- configs/textdet/_base_/default_runtime.py +41 -0
- configs/textdet/_base_/pretrain_runtime.py +14 -0
- configs/textdet/_base_/schedules/schedule_adam_600e.py +9 -0
- configs/textdet/_base_/schedules/schedule_sgd_100k.py +12 -0
- configs/textdet/_base_/schedules/schedule_sgd_1200e.py +11 -0
- configs/textdet/_base_/schedules/schedule_sgd_base.py +15 -0
- configs/textdet/dbnet/README.md +47 -0
- configs/textdet/dbnet/_base_dbnet_resnet18_fpnc.py +64 -0
- configs/textdet/dbnet/_base_dbnet_resnet50-dcnv2_fpnc.py +66 -0
- configs/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext.py +45 -0
- configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py +30 -0
- configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py +73 -0
- configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_100k_synthtext.py +30 -0
- configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py +33 -0
- configs/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015.py +20 -0
- configs/textdet/dbnet/dbnet_resnet50_1200e_icdar2015.py +24 -0
- configs/textdet/dbnet/metafile.yml +80 -0
- configs/textdet/dbnetpp/README.md +41 -0
- configs/textdet/dbnetpp/_base_dbnetpp_resnet50-dcnv2_fpnc.py +72 -0
- configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py +44 -0
- configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py +36 -0
- configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py +20 -0
- configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py +24 -0
- configs/textdet/dbnetpp/metafile.yml +56 -0
- configs/textdet/drrg/README.md +34 -0
- configs/textdet/drrg/_base_drrg_resnet50_fpn-unet.py +92 -0
- configs/textdet/drrg/drrg_resnet50-oclip_fpn-unet_1200e_ctw1500.py +17 -0
- configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py +30 -0
- configs/textdet/drrg/metafile.yml +28 -0
- configs/textdet/fcenet/README.md +46 -0
CITATION.cff
ADDED
@@ -0,0 +1,9 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+title: "OpenMMLab Text Detection, Recognition and Understanding Toolbox"
+authors:
+  - name: "MMOCR Contributors"
+version: 0.3.0
+date-released: 2020-08-15
+repository-code: "https://github.com/open-mmlab/mmocr"
+license: Apache-2.0
configs/backbone/oclip/README.md
ADDED
@@ -0,0 +1,41 @@
+# oCLIP
+
+> [Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880282.pdf)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-language tasks by jointly learning visual and textual representations, which intuitively helps in Optical Character Recognition (OCR) tasks due to the rich visual and textual information in scene text images. However, these methods cannot well cope with OCR tasks because of the difficulty in both instance-level text encoding and image-text pair acquisition (i.e. images and captured texts in them). This paper presents a weakly supervised pre-training method, oCLIP, which can acquire effective scene text representations by jointly learning and aligning visual and textual information. Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction among textual and visual features for learning effective scene text representations. With the learning of textual features, the pre-trained model can attend texts in images well with character awareness. Besides, these designs enable the learning from weakly annotated texts (i.e. partial texts in images without text bounding boxes) which mitigates the data annotation constraint greatly. Experiments over the weakly annotated images in ICDAR2019-LSVT show that our pre-trained model improves F-score by +2.5% and +4.8% while transferring its weights to other text detection and spotting networks, respectively. In addition, the proposed method outperforms existing pre-training techniques consistently across multiple public datasets (e.g., +3.2% and +1.3% for Total-Text and CTW1500).
+
+<div align=center>
+    <img src="https://user-images.githubusercontent.com/24622904/199475057-aa688422-518d-4d7a-86fc-1be0cc1b5dc6.png"/>
+</div>
+
+## Models
+
+| Backbone | Pre-train Data | Model |
+| :-------: | :------------: | :-------------------------------------------------------------------------------: |
+| ResNet-50 | SynthText | [Link](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) |
+
+```{note}
+The model is converted from the official [oCLIP](https://github.com/bytedance/oclip.git).
+```
+
+## Supported Text Detection Models
+
+| | [DBNet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#dbnet) | [DBNet++](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#dbnetpp) | [FCENet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fcenet) | [TextSnake](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fcenet) | [PSENet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#psenet) | [DRRG](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#drrg) | [Mask R-CNN](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#mask-r-cnn) |
+| :-------: | :------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :--------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------: | :----------------------------------------------------------------------: | :----------------------------------------------------------------------------------: |
+| ICDAR2015 | ✓ | ✓ | ✓ | | ✓ | | ✓ |
+| CTW1500 | | | ✓ | ✓ | ✓ | ✓ | ✓ |
+
+## Citation
+
+```bibtex
+@article{xue2022language,
+  title={Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting},
+  author={Xue, Chuhui and Zhang, Wenqing and Hao, Yu and Lu, Shijian and Torr, Philip and Bai, Song},
+  journal={Proceedings of the European Conference on Computer Vision (ECCV)},
+  year={2022}
+}
+```
configs/backbone/oclip/metafile.yml
ADDED
@@ -0,0 +1,13 @@
+Collections:
+- Name: oCLIP
+  Metadata:
+    Training Data: SynthText
+    Architecture:
+      - CLIPResNet
+  Paper:
+    URL: https://arxiv.org/abs/2203.03911
+    Title: 'Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting'
+  README: configs/backbone/oclip/README.md
+
+Models:
+  Weights: https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth
configs/kie/_base_/datasets/wildreceipt-openset.py
ADDED
@@ -0,0 +1,26 @@
+wildreceipt_openset_data_root = 'data/wildreceipt/'
+
+wildreceipt_openset_train = dict(
+    type='WildReceiptDataset',
+    data_root=wildreceipt_openset_data_root,
+    metainfo=dict(category=[
+        dict(id=0, name='bg'),
+        dict(id=1, name='key'),
+        dict(id=2, name='value'),
+        dict(id=3, name='other')
+    ]),
+    ann_file='openset_train.txt',
+    pipeline=None)
+
+wildreceipt_openset_test = dict(
+    type='WildReceiptDataset',
+    data_root=wildreceipt_openset_data_root,
+    metainfo=dict(category=[
+        dict(id=0, name='bg'),
+        dict(id=1, name='key'),
+        dict(id=2, name='value'),
+        dict(id=3, name='other')
+    ]),
+    ann_file='openset_test.txt',
+    test_mode=True,
+    pipeline=None)
configs/kie/_base_/datasets/wildreceipt.py
ADDED
@@ -0,0 +1,16 @@
+wildreceipt_data_root = 'data/wildreceipt/'
+
+wildreceipt_train = dict(
+    type='WildReceiptDataset',
+    data_root=wildreceipt_data_root,
+    metainfo=wildreceipt_data_root + 'class_list.txt',
+    ann_file='train.txt',
+    pipeline=None)
+
+wildreceipt_test = dict(
+    type='WildReceiptDataset',
+    data_root=wildreceipt_data_root,
+    metainfo=wildreceipt_data_root + 'class_list.txt',
+    ann_file='test.txt',
+    test_mode=True,
+    pipeline=None)
configs/kie/_base_/default_runtime.py
ADDED
@@ -0,0 +1,33 @@
+default_scope = 'mmocr'
+env_cfg = dict(
+    cudnn_benchmark=False,
+    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+    dist_cfg=dict(backend='nccl'),
+)
+randomness = dict(seed=None)
+
+default_hooks = dict(
+    timer=dict(type='IterTimerHook'),
+    logger=dict(type='LoggerHook', interval=100),
+    param_scheduler=dict(type='ParamSchedulerHook'),
+    checkpoint=dict(type='CheckpointHook', interval=1),
+    sampler_seed=dict(type='DistSamplerSeedHook'),
+    sync_buffer=dict(type='SyncBuffersHook'),
+    visualization=dict(
+        type='VisualizationHook',
+        interval=1,
+        enable=False,
+        show=False,
+        draw_gt=False,
+        draw_pred=False),
+)
+
+# Logging
+log_level = 'INFO'
+log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
+
+load_from = None
+resume = False
+
+visualizer = dict(
+    type='KIELocalVisualizer', name='visualizer', is_openset=False)
configs/kie/_base_/schedules/schedule_adam_60e.py
ADDED
@@ -0,0 +1,10 @@
+# optimizer
+optim_wrapper = dict(
+    type='OptimWrapper', optimizer=dict(type='Adam', weight_decay=0.0001))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=60, val_interval=1)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning rate
+param_scheduler = [
+    dict(type='MultiStepLR', milestones=[40, 50], end=60),
+]
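Reviewer note on `schedule_adam_60e.py`: the schedule sets no learning rate and no decay factor, so both fall back to library defaults. The sketch below shows the step pattern implied by `milestones=[40, 50]` over 60 epochs. It assumes a base rate of `1e-3` (PyTorch's default for Adam) and a decay factor of `0.1` (the usual `MultiStepLR` default); `multistep_lr` is a hypothetical helper for illustration, not MMEngine code.

```python
def multistep_lr(epoch, base_lr=1e-3, milestones=(40, 50), gamma=0.1):
    """Learning rate at a 0-based epoch under a multi-step schedule.

    base_lr and gamma are assumed defaults; the config above sets neither.
    """
    # Count how many milestones have already been reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    for epoch in (0, 39, 40, 49, 50, 59):
        print(epoch, f"{multistep_lr(epoch):.0e}")
```

Under these assumptions the rate stays constant for epochs 0 to 39, drops by a factor of ten at epoch 40, and drops again at epoch 50.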
configs/kie/sdmgr/README.md
ADDED
@@ -0,0 +1,41 @@
+# SDMGR
+
+> [Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+Key information extraction from document images is of paramount importance in office automation. Conventional template matching based approaches fail to generalize well to document images of unseen templates, and are not robust against text recognition errors. In this paper, we propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured document images. We model document images as dual-modality graphs, nodes of which encode both the visual and textual features of detected text regions, and edges of which represent the spatial relations between neighboring text regions. The key information extraction is solved by iteratively propagating messages along graph edges and reasoning the categories of graph nodes. In order to roundly evaluate our proposed method as well as boost the future research, we release a new dataset named WildReceipt, which is collected and annotated tailored for the evaluation of key information extraction from document images of unseen templates in the wild. It contains 25 key information categories, a total of about 69000 text boxes, and is about 2 times larger than the existing public datasets. Extensive experiments validate that all information including visual features, textual features and spatial relations can benefit key information extraction. It has been shown that SDMG-R can effectively extract key information from document images of unseen templates, and obtain new state-of-the-art results on the recent popular benchmark SROIE and our WildReceipt. Our code and dataset will be publicly released.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/22607038/142580689-18edb4d7-f716-475c-b1c1-e2b934658cee.png"/>
+</div>
+
+## Results and models
+
+### WildReceipt
+
+| Method | Modality | Macro F1-Score | Download |
+| :--------------------------------------------------------------------: | :--------------: | :------------: | :--------------------------------------------------------------------------------------------------: |
+| [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.890 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/20220825_151648.log) |
+| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.873 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/20220831_193317.log) |
+
+### WildReceiptOpenset
+
+| Method | Modality | Edge F1-Score | Node Macro F1-Score | Node Micro F1-Score | Download |
+| :-------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------: |
+| [sdmgr_novisual_openset](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py) | Textual | 0.792 | 0.931 | 0.940 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/20220831_200807.log) |
+
+## Citation
+
+```bibtex
+@misc{sun2021spatial,
+      title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
+      author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
+      year={2021},
+      eprint={2103.14470},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
configs/kie/sdmgr/_base_sdmgr_novisual.py
ADDED
@@ -0,0 +1,35 @@
+num_classes = 26
+
+model = dict(
+    type='SDMGR',
+    kie_head=dict(
+        type='SDMGRHead',
+        visual_dim=16,
+        num_classes=num_classes,
+        module_loss=dict(type='SDMGRModuleLoss'),
+        postprocessor=dict(type='SDMGRPostProcessor')),
+    dictionary=dict(
+        type='Dictionary',
+        dict_file='{{ fileDirname }}/../../../dicts/sdmgr_dict.txt',
+        with_padding=True,
+        with_unknown=True,
+        unknown_token=None),
+)
+
+train_pipeline = [
+    dict(type='LoadKIEAnnotations'),
+    dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+    dict(type='PackKIEInputs')
+]
+test_pipeline = [
+    dict(type='LoadKIEAnnotations'),
+    dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+    dict(type='PackKIEInputs'),
+]
+
+val_evaluator = dict(
+    type='F1Metric',
+    mode='macro',
+    num_classes=num_classes,
+    ignored_classes=[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25])
+test_evaluator = val_evaluator
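Reviewer note on the evaluator in `_base_sdmgr_novisual.py`: with `num_classes = 26` and 14 ids in `ignored_classes`, the macro F1 is averaged over the 12 remaining classes (the odd ids 1 to 23). The sketch below illustrates that averaging in plain Python. `macro_f1` is a hypothetical helper, not MMOCR's `F1Metric`, and scoring a class with no predictions and no ground truth as 0 is an assumption.

```python
def macro_f1(preds, labels, num_classes, ignored_classes=()):
    """Unweighted mean of per-class F1 over the classes that are not ignored."""
    ignored = set(ignored_classes)
    cared = [c for c in range(num_classes) if c not in ignored]
    scores = []
    for c in cared:
        tp = sum(1 for p, g in zip(preds, labels) if p == c and g == c)
        fp = sum(1 for p, g in zip(preds, labels) if p == c and g != c)
        fn = sum(1 for p, g in zip(preds, labels) if p != c and g == c)
        denom = 2 * tp + fp + fn
        # Assumption: an empty class contributes 0 rather than being skipped.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example with 4 classes; class 0 is ignored.
    print(macro_f1([1, 1, 3, 0], [1, 3, 3, 0], num_classes=4, ignored_classes=[0]))
```

Because every cared class carries equal weight, a rare class with poor recall pulls the score down as much as a frequent one.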
configs/kie/sdmgr/_base_sdmgr_unet16.py
ADDED
@@ -0,0 +1,28 @@
+_base_ = '_base_sdmgr_novisual.py'
+
+model = dict(
+    backbone=dict(type='UNet', base_channels=16),
+    roi_extractor=dict(
+        type='mmdet.SingleRoIExtractor',
+        roi_layer=dict(type='RoIAlign', output_size=7),
+        featmap_strides=[1]),
+    data_preprocessor=dict(
+        type='ImgDataPreprocessor',
+        mean=[123.675, 116.28, 103.53],
+        std=[58.395, 57.12, 57.375],
+        bgr_to_rgb=True,
+        pad_size_divisor=32),
+)
+
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadKIEAnnotations'),
+    dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+    dict(type='PackKIEInputs')
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadKIEAnnotations'),
+    dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+    dict(type='PackKIEInputs', meta_keys=('img_path', )),
+]
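Reviewer note on `_base_sdmgr_unet16.py`: `Resize` with `scale=(1024, 512), keep_ratio=True` and `pad_size_divisor=32` together determine the tensor size the UNet backbone sees. The sketch below works through that arithmetic. Both helpers are hypothetical, and the keep-ratio rule (longer side capped at the larger value, shorter side at the smaller value) is an assumption based on the usual MMCV behaviour; the library's rounding may differ.

```python
import math


def rescale_size(width, height, scale=(1024, 512)):
    """Largest size that fits the scale while keeping the aspect ratio (assumed rule)."""
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(width, height), max_short / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


def padded_size(width, height, divisor=32):
    """Round each side up to the next multiple of the divisor."""
    return (math.ceil(width / divisor) * divisor,
            math.ceil(height / divisor) * divisor)


if __name__ == "__main__":
    resized = rescale_size(800, 1000)
    print(resized, padded_size(*resized))
    print(padded_size(500, 333))
```

Padding to a multiple of 32 keeps the feature-map sizes integral across the backbone's downsampling stages.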
configs/kie/sdmgr/metafile.yml
ADDED
@@ -0,0 +1,52 @@
+Collections:
+- Name: SDMGR
+  Metadata:
+    Training Data: KIEDataset
+    Training Techniques:
+      - Adam
+    Training Resources: 1x NVIDIA A100-SXM4-80GB
+    Architecture:
+      - UNet
+      - SDMGRHead
+  Paper:
+    URL: https://arxiv.org/abs/2103.14470.pdf
+    Title: 'Spatial Dual-Modality Graph Reasoning for Key Information Extraction'
+  README: configs/kie/sdmgr/README.md
+
+Models:
+  - Name: sdmgr_unet16_60e_wildreceipt
+    Alias: SDMGR
+    In Collection: SDMGR
+    Config: configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py
+    Metadata:
+      Training Data: wildreceipt
+    Results:
+      - Task: Key Information Extraction
+        Dataset: wildreceipt
+        Metrics:
+          macro_f1: 0.890
+    Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth
+  - Name: sdmgr_novisual_60e_wildreceipt
+    In Collection: SDMGR
+    Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py
+    Metadata:
+      Training Data: wildreceipt
+    Results:
+      - Task: Key Information Extraction
+        Dataset: wildreceipt
+        Metrics:
+          macro_f1: 0.873
+    Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth
+  - Name: sdmgr_novisual_60e_wildreceipt_openset
+    In Collection: SDMGR
+    Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py
+    Metadata:
+      Training Data: wildreceipt-openset
+    Results:
+      - Task: Key Information Extraction
+        Dataset: wildreceipt
+        Metrics:
+          macro_f1: 0.931
+          micro_f1: 0.940
+          edge_micro_f1: 0.792
+    Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth
configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py
ADDED
@@ -0,0 +1,71 @@
+_base_ = [
+    '../_base_/default_runtime.py',
+    '../_base_/datasets/wildreceipt-openset.py',
+    '../_base_/schedules/schedule_adam_60e.py',
+    '_base_sdmgr_novisual.py',
+]
+
+node_num_classes = 4  # 4 classes: bg, key, value and other
+edge_num_classes = 2  # edge connectivity
+key_node_idx = 1
+value_node_idx = 2
+
+model = dict(
+    type='SDMGR',
+    kie_head=dict(
+        num_classes=node_num_classes,
+        postprocessor=dict(
+            link_type='one-to-many',
+            key_node_idx=key_node_idx,
+            value_node_idx=value_node_idx)),
+)
+
+test_pipeline = [
+    dict(
+        type='LoadKIEAnnotations',
+        key_node_idx=key_node_idx,
+        value_node_idx=value_node_idx),  # Keep key->value edges for evaluation
+    dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+    dict(type='PackKIEInputs'),
+]
+
+wildreceipt_openset_train = _base_.wildreceipt_openset_train
+wildreceipt_openset_train.pipeline = _base_.train_pipeline
+wildreceipt_openset_test = _base_.wildreceipt_openset_test
+wildreceipt_openset_test.pipeline = test_pipeline
+
+train_dataloader = dict(
+    batch_size=4,
+    num_workers=1,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=wildreceipt_openset_train)
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=wildreceipt_openset_test)
+test_dataloader = val_dataloader
+
+val_evaluator = [
+    dict(
+        type='F1Metric',
+        prefix='node',
+        key='labels',
+        mode=['micro', 'macro'],
+        num_classes=node_num_classes,
+        cared_classes=[key_node_idx, value_node_idx]),
+    dict(
+        type='F1Metric',
+        prefix='edge',
+        mode='micro',
+        key='edge_labels',
+        cared_classes=[1],  # Collapse to binary F1 score
+        num_classes=edge_num_classes)
+]
+test_evaluator = val_evaluator
+
+visualizer = dict(
+    type='KIELocalVisualizer', name='visualizer', is_openset=True)
+auto_scale_lr = dict(base_batch_size=4)
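Reviewer note on `sdmgr_novisual_60e_wildreceipt-openset.py`: the `model` dict here lists only the keys that differ from `_base_sdmgr_novisual.py` and relies on config inheritance to fill in the rest. The sketch below illustrates that behaviour with a plain recursive merge. `merge_config` is a hypothetical stand-in for illustration, not MMEngine's implementation, and the two dicts are abbreviated copies of the heads in the base and child files.

```python
def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value (including lists)
    replaces the base value outright.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Abbreviated head from the base file.
base_head = dict(
    type='SDMGRHead',
    visual_dim=16,
    num_classes=26,
    postprocessor=dict(type='SDMGRPostProcessor'))

# Overrides from the open-set config.
child_head = dict(
    num_classes=4,
    postprocessor=dict(link_type='one-to-many', key_node_idx=1, value_node_idx=2))

kie_head = merge_config(base_head, child_head)

if __name__ == "__main__":
    print(kie_head)
```

Under this reading the open-set head keeps `type='SDMGRHead'` and `visual_dim=16` from the base while `num_classes` becomes 4, and list-valued settings such as `test_pipeline` and `val_evaluator` are replaced whole rather than merged.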
configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py
ADDED
@@ -0,0 +1,28 @@
+_base_ = [
+    '../_base_/default_runtime.py',
+    '../_base_/datasets/wildreceipt.py',
+    '../_base_/schedules/schedule_adam_60e.py',
+    '_base_sdmgr_novisual.py',
+]
+
+wildreceipt_train = _base_.wildreceipt_train
+wildreceipt_train.pipeline = _base_.train_pipeline
+wildreceipt_test = _base_.wildreceipt_test
+wildreceipt_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+    batch_size=4,
+    num_workers=1,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=wildreceipt_train)
+
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=wildreceipt_test)
+test_dataloader = val_dataloader
+
+auto_scale_lr = dict(base_batch_size=4)
configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py
ADDED
@@ -0,0 +1,29 @@
+_base_ = [
+    '../_base_/default_runtime.py',
+    '../_base_/datasets/wildreceipt.py',
+    '../_base_/schedules/schedule_adam_60e.py',
+    '_base_sdmgr_unet16.py',
+]
+
+wildreceipt_train = _base_.wildreceipt_train
+wildreceipt_train.pipeline = _base_.train_pipeline
+wildreceipt_test = _base_.wildreceipt_test
+wildreceipt_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+    batch_size=4,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=wildreceipt_train)
+
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=wildreceipt_test)
+
+test_dataloader = val_dataloader
+
+auto_scale_lr = dict(base_batch_size=4)
configs/textdet/_base_/datasets/ctw1500.py
ADDED
@@ -0,0 +1,15 @@
+ctw1500_textdet_data_root = 'data/ctw1500'
+
+ctw1500_textdet_train = dict(
+    type='OCRDataset',
+    data_root=ctw1500_textdet_data_root,
+    ann_file='textdet_train.json',
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=None)
+
+ctw1500_textdet_test = dict(
+    type='OCRDataset',
+    data_root=ctw1500_textdet_data_root,
+    ann_file='textdet_test.json',
+    test_mode=True,
+    pipeline=None)
configs/textdet/_base_/datasets/icdar2015.py
ADDED
@@ -0,0 +1,15 @@
+icdar2015_textdet_data_root = 'data/icdar2015'
+
+icdar2015_textdet_train = dict(
+    type='OCRDataset',
+    data_root=icdar2015_textdet_data_root,
+    ann_file='textdet_train.json',
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=None)
+
+icdar2015_textdet_test = dict(
+    type='OCRDataset',
+    data_root=icdar2015_textdet_data_root,
+    ann_file='textdet_test.json',
+    test_mode=True,
+    pipeline=None)
configs/textdet/_base_/datasets/icdar2017.py
ADDED
@@ -0,0 +1,17 @@
+icdar2017_textdet_data_root = 'data/det/icdar_2017'
+
+icdar2017_textdet_train = dict(
+    type='OCRDataset',
+    data_root=icdar2017_textdet_data_root,
+    ann_file='instances_training.json',
+    data_prefix=dict(img_path='imgs/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=None)
+
+icdar2017_textdet_test = dict(
+    type='OCRDataset',
+    data_root=icdar2017_textdet_data_root,
+    ann_file='instances_test.json',
+    data_prefix=dict(img_path='imgs/'),
+    test_mode=True,
+    pipeline=None)
configs/textdet/_base_/datasets/synthtext.py
ADDED
@@ -0,0 +1,8 @@
+synthtext_textdet_data_root = 'data/synthtext'
+
+synthtext_textdet_train = dict(
+    type='OCRDataset',
+    data_root=synthtext_textdet_data_root,
+    ann_file='textdet_train.json',
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=None)
configs/textdet/_base_/datasets/totaltext.py
ADDED
@@ -0,0 +1,15 @@
+totaltext_textdet_data_root = 'data/totaltext'
+
+totaltext_textdet_train = dict(
+    type='OCRDataset',
+    data_root=totaltext_textdet_data_root,
+    ann_file='textdet_train.json',
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=None)
+
+totaltext_textdet_test = dict(
+    type='OCRDataset',
+    data_root=totaltext_textdet_data_root,
+    ann_file='textdet_test.json',
+    test_mode=True,
+    pipeline=None)
configs/textdet/_base_/datasets/toy_data.py
ADDED
@@ -0,0 +1,17 @@
+toy_det_data_root = 'tests/data/det_toy_dataset'
+
+toy_det_train = dict(
+    type='OCRDataset',
+    data_root=toy_det_data_root,
+    ann_file='instances_training.json',
+    data_prefix=dict(img_path='imgs/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=None)
+
+toy_det_test = dict(
+    type='OCRDataset',
+    data_root=toy_det_data_root,
+    ann_file='instances_test.json',
+    data_prefix=dict(img_path='imgs/'),
+    test_mode=True,
+    pipeline=None)
configs/textdet/_base_/default_runtime.py
ADDED
@@ -0,0 +1,41 @@
+default_scope = 'mmocr'
+env_cfg = dict(
+    cudnn_benchmark=False,
+    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+    dist_cfg=dict(backend='nccl'),
+)
+randomness = dict(seed=None)
+
+default_hooks = dict(
+    timer=dict(type='IterTimerHook'),
+    logger=dict(type='LoggerHook', interval=5),
+    param_scheduler=dict(type='ParamSchedulerHook'),
+    checkpoint=dict(type='CheckpointHook', interval=20),
+    sampler_seed=dict(type='DistSamplerSeedHook'),
+    sync_buffer=dict(type='SyncBuffersHook'),
+    visualization=dict(
+        type='VisualizationHook',
+        interval=1,
+        enable=False,
+        show=False,
+        draw_gt=False,
+        draw_pred=False),
+)
+
+# Logging
+log_level = 'INFO'
+log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
+
+load_from = None
+resume = False
+
+# Evaluation
+val_evaluator = dict(type='HmeanIOUMetric')
+test_evaluator = val_evaluator
+
+# Visualization
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+    type='TextDetLocalVisualizer',
+    name='visualizer',
+    vis_backends=vis_backends)
configs/textdet/_base_/pretrain_runtime.py
ADDED
@@ -0,0 +1,14 @@
+_base_ = 'default_runtime.py'
+
+default_hooks = dict(
+    logger=dict(type='LoggerHook', interval=1000),
+    checkpoint=dict(
+        type='CheckpointHook',
+        interval=10000,
+        by_epoch=False,
+        max_keep_ckpts=1),
+)
+
+# Evaluation
+val_evaluator = None
+test_evaluator = None
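`pretrain_runtime.py` overrides only the `logger` and `checkpoint` entries of `default_hooks`. Assuming the usual MMEngine behaviour, in which a child config's dict is merged key by key into the inherited dict rather than replacing it, the remaining hooks from `default_runtime.py` stay active. The helper `merge_cfg` below is a hypothetical plain-Python stand-in for that merge, written only to illustrate the effect. It is not MMEngine's implementation.

```python
import copy


def merge_cfg(base: dict, child: dict) -> dict:
    """Recursively merge ``child`` into a copy of ``base`` (illustrative only)."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dicts are merged key by key instead of being replaced.
            merged[key] = merge_cfg(merged[key], value)
        else:
            # Scalars, lists and None simply overwrite the inherited value.
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed-down versions of the two runtime configs.
base_runtime = dict(
    default_hooks=dict(
        timer=dict(type='IterTimerHook'),
        logger=dict(type='LoggerHook', interval=5),
        checkpoint=dict(type='CheckpointHook', interval=20),
    ),
    val_evaluator=dict(type='HmeanIOUMetric'),
)
pretrain_override = dict(
    default_hooks=dict(
        logger=dict(type='LoggerHook', interval=1000),
        checkpoint=dict(
            type='CheckpointHook',
            interval=10000,
            by_epoch=False,
            max_keep_ckpts=1),
    ),
    val_evaluator=None,
)

pretrain_runtime = merge_cfg(base_runtime, pretrain_override)
print(sorted(pretrain_runtime['default_hooks']))
```

Under this assumption the `timer` hook survives the override, while `val_evaluator` is reset to `None` because a non-dict value replaces the inherited one.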
configs/textdet/_base_/schedules/schedule_adam_600e.py
ADDED
@@ -0,0 +1,9 @@
+# optimizer
+optim_wrapper = dict(type='OptimWrapper', optimizer=dict(type='Adam', lr=1e-3))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=600, val_interval=20)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning rate
+param_scheduler = [
+    dict(type='PolyLR', power=0.9, end=600),
+]
configs/textdet/_base_/schedules/schedule_sgd_100k.py
ADDED
@@ -0,0 +1,12 @@
+# optimizer
+optim_wrapper = dict(
+    type='OptimWrapper',
+    optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+
+train_cfg = dict(type='IterBasedTrainLoop', max_iters=100000)
+test_cfg = None
+val_cfg = None
+# learning policy
+param_scheduler = [
+    dict(type='PolyLR', power=0.9, eta_min=1e-7, by_epoch=False, end=100000),
+]
configs/textdet/_base_/schedules/schedule_sgd_1200e.py
ADDED
@@ -0,0 +1,11 @@
+# optimizer
+optim_wrapper = dict(
+    type='OptimWrapper',
+    optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=1200, val_interval=20)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning policy
+param_scheduler = [
+    dict(type='PolyLR', power=0.9, eta_min=1e-7, end=1200),
+]
configs/textdet/_base_/schedules/schedule_sgd_base.py
ADDED
@@ -0,0 +1,15 @@
+# Note: This schedule config serves as a base config for other schedules.
+# Users would have to at least fill in "max_epochs" and "val_interval"
+# in order to use this config in their experiments.
+
+# optimizer
+optim_wrapper = dict(
+    type='OptimWrapper',
+    optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=None, val_interval=20)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning policy
+param_scheduler = [
+    dict(type='ConstantLR', factor=1.0),
+]
configs/textdet/dbnet/README.md
ADDED
@@ -0,0 +1,47 @@
+# DBNet
+
+> [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/22607038/142791306-0da6db2a-20a6-4a68-b228-64ff275f67b3.png"/>
+</div>
+
+## Results and models
+
+### SynthText
+
+| Method | Backbone | Training set | #iters | Download |
+| :-----------------------------------------------------------------------: | :------: | :----------: | :-----: | :--------------------------------------------------------------------------------------------------: |
+| [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext.py) | ResNet18 | SynthText | 100,000 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext/dbnet_resnet18_fpnc_100k_synthtext-2e9bf392.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext/20221214_150351.log) |
+
+### ICDAR2015
+
+| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
+| :----------------------------: | :------------------------------: | :--------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------: |
+| [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py) | ResNet18 | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.8853 | 0.7583 | 0.8169 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/20220825_221614.log) |
+| [DBNet_r50](/configs/textdet/dbnet/dbnet_resnet50_1200e_icdar2015.py) | ResNet50 | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.8744 | 0.8276 | 0.8504 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50_1200e_icdar2015/dbnet_resnet50_1200e_icdar2015_20221102_115917-54f50589.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50_1200e_icdar2015/20221102_115917.log) |
+| [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | ResNet50-DCN | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/tmp_1.0_pretrain/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-ed322016.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.8784 | 0.8315 | 0.8543 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/20220828_124917.log) |
+| [DBNet_r50-oclip](/configs/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015.py) | [ResNet50-oCLIP](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9052 | 0.8272 | 0.8644 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015/dbnet_resnet50-oclip_1200e_icdar2015_20221102_115917-bde8c87a.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015/20221102_115917.log) |
+
+### Total Text
+
+| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
+| :----------------------------------------------------: | :------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------: |
+| [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py) | ResNet18 | - | Totaltext Train | Totaltext Test | 1200 | 736 | 0.8640 | 0.7770 | 0.8182 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext/dbnet_resnet18_fpnc_1200e_totaltext-3ed3233c.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext/20221219_201038.log) |
+
+## Citation
+
+```bibtex
+@article{Liao_Wan_Yao_Chen_Bai_2020,
+    title={Real-Time Scene Text Detection with Differentiable Binarization},
+    journal={Proceedings of the AAAI Conference on Artificial Intelligence},
+    author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
+    year={2020},
+    pages={11474-11481}}
+```
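The DBNet abstract says that binarization is performed inside the segmentation network. In the cited paper this is done by replacing the hard threshold with a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), where P is the probability map, T the learned threshold map and k an amplifying factor (set to 50 in the paper). The sketch below evaluates that function for single pixel values. The function names are illustrative and this is not MMOCR's `DBHead` code.

```python
import math


def differentiable_binarize(prob: float, thresh: float, k: float = 50.0) -> float:
    """Approximate step function: 1 / (1 + exp(-k * (prob - thresh)))."""
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def hard_binarize(prob: float, thresh: float) -> int:
    """Standard (non-differentiable) binarization, for comparison."""
    return 1 if prob >= thresh else 0


for p in (0.2, 0.3, 0.4):
    print(p, hard_binarize(p, 0.3), round(differentiable_binarize(p, 0.3), 4))
```

Away from the threshold the soft value is close to the hard 0/1 decision, yet it has a usable gradient with respect to both the probability and the threshold, which is what allows the threshold map to be trained.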
configs/textdet/dbnet/_base_dbnet_resnet18_fpnc.py
ADDED
@@ -0,0 +1,64 @@
+model = dict(
+    type='DBNet',
+    backbone=dict(
+        type='mmdet.ResNet',
+        depth=18,
+        num_stages=4,
+        out_indices=(0, 1, 2, 3),
+        frozen_stages=-1,
+        norm_cfg=dict(type='BN', requires_grad=True),
+        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18'),
+        norm_eval=False,
+        style='caffe'),
+    neck=dict(
+        type='FPNC', in_channels=[64, 128, 256, 512], lateral_channels=256),
+    det_head=dict(
+        type='DBHead',
+        in_channels=256,
+        module_loss=dict(type='DBModuleLoss'),
+        postprocessor=dict(type='DBPostprocessor', text_repr_type='quad')),
+    data_preprocessor=dict(
+        type='TextDetDataPreprocessor',
+        mean=[123.675, 116.28, 103.53],
+        std=[58.395, 57.12, 57.375],
+        bgr_to_rgb=True,
+        pad_size_divisor=32))
+
+train_pipeline = [
+    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+    dict(
+        type='LoadOCRAnnotations',
+        with_polygon=True,
+        with_bbox=True,
+        with_label=True,
+    ),
+    dict(
+        type='TorchVisionWrapper',
+        op='ColorJitter',
+        brightness=32.0 / 255,
+        saturation=0.5),
+    dict(
+        type='ImgAugWrapper',
+        args=[['Fliplr', 0.5],
+              dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+    dict(type='RandomCrop', min_side_ratio=0.1),
+    dict(type='Resize', scale=(640, 640), keep_ratio=True),
+    dict(type='Pad', size=(640, 640)),
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+
+test_pipeline = [
+    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+    dict(type='Resize', scale=(1333, 736), keep_ratio=True),
+    dict(
+        type='LoadOCRAnnotations',
+        with_polygon=True,
+        with_bbox=True,
+        with_label=True,
+    ),
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+]
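The test pipeline resizes with `scale=(1333, 736)` and `keep_ratio=True`. A rough way to see which image size that yields, assuming the usual OpenMMLab keep-ratio rule (the larger number bounds the long edge, the smaller number bounds the short edge, and the tighter of the two wins), is sketched below. `keep_ratio_size` is a hypothetical helper, and the rounding may differ slightly from the actual `Resize` transform.

```python
def keep_ratio_size(width: int, height: int, scale: tuple) -> tuple:
    """Return the (width, height) after an aspect-preserving resize.

    ``scale`` is read as (max long edge, max short edge), which is an
    assumption about how the ``Resize`` transform interprets it.
    """
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(width, height), max_short / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


# A 1280x720 input under the ResNet-18 and ResNet-50 test scales.
print(keep_ratio_size(1280, 720, (1333, 736)))
print(keep_ratio_size(1280, 720, (4068, 1024)))
```

For a 1280x720 input the short edge is the binding constraint in both cases, which is consistent with the "Test size" column (736 and 1024) in the README tables.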
configs/textdet/dbnet/_base_dbnet_resnet50-dcnv2_fpnc.py
ADDED
@@ -0,0 +1,66 @@
+model = dict(
+    type='DBNet',
+    backbone=dict(
+        type='mmdet.ResNet',
+        depth=50,
+        num_stages=4,
+        out_indices=(0, 1, 2, 3),
+        frozen_stages=-1,
+        norm_cfg=dict(type='BN', requires_grad=True),
+        norm_eval=False,
+        style='pytorch',
+        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
+        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
+        stage_with_dcn=(False, True, True, True)),
+    neck=dict(
+        type='FPNC', in_channels=[256, 512, 1024, 2048], lateral_channels=256),
+    det_head=dict(
+        type='DBHead',
+        in_channels=256,
+        module_loss=dict(type='DBModuleLoss'),
+        postprocessor=dict(type='DBPostprocessor', text_repr_type='quad')),
+    data_preprocessor=dict(
+        type='TextDetDataPreprocessor',
+        mean=[123.675, 116.28, 103.53],
+        std=[58.395, 57.12, 57.375],
+        bgr_to_rgb=True,
+        pad_size_divisor=32))
+
+train_pipeline = [
+    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+    dict(
+        type='LoadOCRAnnotations',
+        with_bbox=True,
+        with_polygon=True,
+        with_label=True,
+    ),
+    dict(
+        type='TorchVisionWrapper',
+        op='ColorJitter',
+        brightness=32.0 / 255,
+        saturation=0.5),
+    dict(
+        type='ImgAugWrapper',
+        args=[['Fliplr', 0.5],
+              dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+    dict(type='RandomCrop', min_side_ratio=0.1),
+    dict(type='Resize', scale=(640, 640), keep_ratio=True),
+    dict(type='Pad', size=(640, 640)),
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+
+test_pipeline = [
+    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+    dict(type='Resize', scale=(4068, 1024), keep_ratio=True),
+    dict(
+        type='LoadOCRAnnotations',
+        with_polygon=True,
+        with_bbox=True,
+        with_label=True,
+    ),
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+]
configs/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext.py
ADDED
@@ -0,0 +1,45 @@
+_base_ = [
+    '_base_dbnet_resnet18_fpnc.py',
+    '../_base_/datasets/synthtext.py',
+    '../_base_/pretrain_runtime.py',
+    '../_base_/schedules/schedule_sgd_100k.py',
+]
+
+train_pipeline = [
+    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+    dict(
+        type='LoadOCRAnnotations',
+        with_polygon=True,
+        with_bbox=True,
+        with_label=True,
+    ),
+    dict(type='FixInvalidPolygon'),
+    dict(
+        type='TorchVisionWrapper',
+        op='ColorJitter',
+        brightness=32.0 / 255,
+        saturation=0.5),
+    dict(
+        type='ImgAugWrapper',
+        args=[['Fliplr', 0.5],
+              dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+    dict(type='RandomCrop', min_side_ratio=0.1),
+    dict(type='Resize', scale=(640, 640), keep_ratio=True),
+    dict(type='Pad', size=(640, 640)),
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+
+# dataset settings
+synthtext_textdet_train = _base_.synthtext_textdet_train
+synthtext_textdet_train.pipeline = train_pipeline
+
+train_dataloader = dict(
+    batch_size=16,
+    num_workers=8,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=synthtext_textdet_train)
+
+auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py
ADDED
@@ -0,0 +1,30 @@
+_base_ = [
+    '_base_dbnet_resnet18_fpnc.py',
+    '../_base_/datasets/icdar2015.py',
+    '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_sgd_1200e.py',
+]
+
+# dataset settings
+icdar2015_textdet_train = _base_.icdar2015_textdet_train
+icdar2015_textdet_train.pipeline = _base_.train_pipeline
+icdar2015_textdet_test = _base_.icdar2015_textdet_test
+icdar2015_textdet_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+    batch_size=16,
+    num_workers=8,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=icdar2015_textdet_train)
+
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=icdar2015_textdet_test)
+
+test_dataloader = val_dataloader
+
+auto_scale_lr = dict(base_batch_size=16)
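`auto_scale_lr = dict(base_batch_size=16)` records the total batch size that the configured learning rate was tuned for. Assuming the linear scaling rule is applied when automatic scaling is switched on (the configs here only declare the base value and do not enable it themselves), the effective learning rate would follow the sketch below. `scaled_lr` is a hypothetical helper, not an MMEngine function.

```python
def scaled_lr(base_lr: float, batch_size_per_gpu: int, num_gpus: int,
              base_batch_size: int = 16) -> float:
    """Linear scaling rule: lr grows in proportion to the total batch size."""
    total_batch_size = batch_size_per_gpu * num_gpus
    return base_lr * total_batch_size / base_batch_size


# One GPU with batch_size=16 matches the base setting, so lr is unchanged.
print(scaled_lr(0.007, batch_size_per_gpu=16, num_gpus=1))
# Four GPUs give a 4x larger total batch and therefore a 4x larger lr.
print(scaled_lr(0.007, batch_size_per_gpu=16, num_gpus=4))
```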
configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py
ADDED
@@ -0,0 +1,73 @@
+_base_ = [
+    '_base_dbnet_resnet18_fpnc.py',
+    '../_base_/datasets/totaltext.py',
+    '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_sgd_1200e.py',
+]
+
+train_pipeline = [
+    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+    dict(
+        type='LoadOCRAnnotations',
+        with_polygon=True,
+        with_bbox=True,
+        with_label=True,
+    ),
+    dict(type='FixInvalidPolygon', min_poly_points=4),
+    dict(
+        type='TorchVisionWrapper',
+        op='ColorJitter',
+        brightness=32.0 / 255,
+        saturation=0.5),
+    dict(
+        type='ImgAugWrapper',
+        args=[['Fliplr', 0.5],
+              dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+    dict(type='RandomCrop', min_side_ratio=0.1),
+    dict(type='Resize', scale=(640, 640), keep_ratio=True),
+    dict(type='Pad', size=(640, 640)),
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+
+test_pipeline = [
+    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+    dict(type='Resize', scale=(1333, 736), keep_ratio=True),
+    dict(
+        type='LoadOCRAnnotations',
+        with_polygon=True,
+        with_bbox=True,
+        with_label=True,
+    ),
+    dict(type='FixInvalidPolygon', min_poly_points=4),
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+]
+
+# dataset settings
+totaltext_textdet_train = _base_.totaltext_textdet_train
+totaltext_textdet_test = _base_.totaltext_textdet_test
+totaltext_textdet_train.pipeline = train_pipeline
+totaltext_textdet_test.pipeline = test_pipeline
+
+train_dataloader = dict(
+    batch_size=16,
+    num_workers=16,
+    pin_memory=True,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=totaltext_textdet_train)
+
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=1,
+    pin_memory=True,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=totaltext_textdet_test)
+
+test_dataloader = val_dataloader
+
+auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_100k_synthtext.py
ADDED
@@ -0,0 +1,30 @@
+_base_ = [
+    '_base_dbnet_resnet50-dcnv2_fpnc.py',
+    '../_base_/default_runtime.py',
+    '../_base_/datasets/synthtext.py',
+    '../_base_/schedules/schedule_sgd_100k.py',
+]
+
+# dataset settings
+synthtext_textdet_train = _base_.synthtext_textdet_train
+synthtext_textdet_train.pipeline = _base_.train_pipeline
+synthtext_textdet_test = _base_.synthtext_textdet_test
+synthtext_textdet_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+    batch_size=16,
+    num_workers=8,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=synthtext_textdet_train)
+
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=synthtext_textdet_test)
+
+test_dataloader = val_dataloader
+
+auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py
ADDED
@@ -0,0 +1,33 @@
+_base_ = [
+    '_base_dbnet_resnet50-dcnv2_fpnc.py',
+    '../_base_/datasets/icdar2015.py',
+    '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_sgd_1200e.py',
+]
+
+# TODO: Replace the link
+load_from = 'https://download.openmmlab.com/mmocr/textdet/dbnet/tmp_1.0_pretrain/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-ed322016.pth'  # noqa
+
+# dataset settings
+icdar2015_textdet_train = _base_.icdar2015_textdet_train
+icdar2015_textdet_train.pipeline = _base_.train_pipeline
+icdar2015_textdet_test = _base_.icdar2015_textdet_test
+icdar2015_textdet_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+    batch_size=16,
+    num_workers=8,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=icdar2015_textdet_train)
+
+val_dataloader = dict(
+    batch_size=1,
+    num_workers=4,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=False),
+    dataset=icdar2015_textdet_test)
+
+test_dataloader = val_dataloader
+
+auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015.py
ADDED
@@ -0,0 +1,20 @@
+_base_ = [
+    'dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py',
+]
+
+load_from = None
+
+_base_.model.backbone = dict(
+    type='CLIPResNet',
+    init_cfg=dict(
+        type='Pretrained',
+        checkpoint='https://download.openmmlab.com/'
+        'mmocr/backbone/resnet50-oclip-7ba0c533.pth'))
+
+_base_.train_dataloader.num_workers = 24
+_base_.optim_wrapper.optimizer.lr = 0.002
+
+param_scheduler = [
+    dict(type='LinearLR', end=100, start_factor=0.001),
+    dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=100, end=1200),
+]
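The oCLIP config lowers the learning rate to 0.002 and chains a 100-epoch `LinearLR` warm-up with a `PolyLR` decay that runs from epoch 100 to 1200. The sketch below approximates the resulting curve in closed form. It assumes that the warm-up factor rises linearly from `start_factor` to 1 and that the polynomial phase decays over its own 1100-epoch window. `lr_at` is a hypothetical helper, and MMEngine's step-by-step schedulers may differ from it by an epoch at the boundaries.

```python
def lr_at(epoch: int, base_lr: float = 0.002, warmup_end: int = 100,
          start_factor: float = 0.001, power: float = 0.9,
          eta_min: float = 1e-7, max_epochs: int = 1200) -> float:
    """Piecewise schedule: linear warm-up, then polynomial decay."""
    if epoch < warmup_end:
        # Factor climbs linearly from start_factor to 1.0.
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_end
        return base_lr * factor
    # Polynomial decay measured from the end of the warm-up.
    progress = min(epoch - warmup_end, max_epochs - warmup_end)
    remaining = 1.0 - progress / (max_epochs - warmup_end)
    return (base_lr - eta_min) * remaining ** power + eta_min


for epoch in (0, 50, 100, 650, 1200):
    print(epoch, lr_at(epoch))
```

The warm-up starts at a thousandth of the target rate, which is a common precaution when fine-tuning from a pretrained backbone such as the oCLIP checkpoint.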
configs/textdet/dbnet/dbnet_resnet50_1200e_icdar2015.py
ADDED
@@ -0,0 +1,24 @@
+_base_ = [
+    'dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py',
+]
+
+load_from = None
+
+_base_.model.backbone = dict(
+    type='mmdet.ResNet',
+    depth=50,
+    num_stages=4,
+    out_indices=(0, 1, 2, 3),
+    frozen_stages=-1,
+    norm_cfg=dict(type='BN', requires_grad=True),
+    norm_eval=True,
+    style='pytorch',
+    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'))
+
+_base_.train_dataloader.num_workers = 24
+_base_.optim_wrapper.optimizer.lr = 0.002
+
+param_scheduler = [
+    dict(type='LinearLR', end=100, start_factor=0.001),
+    dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=100, end=1200),
+]
configs/textdet/dbnet/metafile.yml
ADDED
@@ -0,0 +1,80 @@
+Collections:
+- Name: DBNet
+  Metadata:
+    Training Data: ICDAR2015
+    Training Techniques:
+      - SGD with Momentum
+      - Weight Decay
+    Training Resources: 1x NVIDIA A100-SXM4-80GB
+    Architecture:
+      - ResNet
+      - FPNC
+  Paper:
+    URL: https://arxiv.org/pdf/1911.08947.pdf
+    Title: 'Real-time Scene Text Detection with Differentiable Binarization'
+  README: configs/textdet/dbnet/README.md
+
+Models:
+  - Name: dbnet_resnet18_fpnc_1200e_icdar2015
+    Alias: DB_r18
+    In Collection: DBNet
+    Config: configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py
+    Metadata:
+      Training Data: ICDAR2015
+    Results:
+      - Task: Text Detection
+        Dataset: ICDAR2015
+        Metrics:
+          hmean-iou: 0.8169
+    Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth
+
+  - Name: dbnet_resnet50_fpnc_1200e_icdar2015
+    In Collection: DBNet
+    Config: configs/textdet/dbnet/dbnet_resnet50_1200e_icdar2015.py
+    Metadata:
+      Training Data: ICDAR2015
+    Results:
+      - Task: Text Detection
+        Dataset: ICDAR2015
+        Metrics:
+          hmean-iou: 0.8504
+    Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50_1200e_icdar2015/dbnet_resnet50_1200e_icdar2015_20221102_115917-54f50589.pth
+
+  - Name: dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015
+    In Collection: DBNet
+    Config: configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py
+    Metadata:
+      Training Data: ICDAR2015
+    Results:
+      - Task: Text Detection
+        Dataset: ICDAR2015
+        Metrics:
+          hmean-iou: 0.8543
+    Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth
+
+  - Name: dbnet_resnet50-oclip_fpnc_1200e_icdar2015
+    In Collection: DBNet
+    Alias:
+      - DB_r50
+      - DBNet
+    Config: configs/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015.py
+    Metadata:
+      Training Data: ICDAR2015
+    Results:
+      - Task: Text Detection
+        Dataset: ICDAR2015
+        Metrics:
+          hmean-iou: 0.8644
+    Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015/dbnet_resnet50-oclip_1200e_icdar2015_20221102_115917-bde8c87a.pth
+
+  - Name: dbnet_resnet18_fpnc_1200e_totaltext
+    In Collection: DBNet
+    Config: configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py
+    Metadata:
+      Training Data: Totaltext
+    Results:
+      - Task: Text Detection
+        Dataset: Totaltext
+        Metrics:
+          hmean-iou: 0.8182
+    Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext/dbnet_resnet18_fpnc_1200e_totaltext-3ed3233c.pth
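The `hmean-iou` values in the metafile are the same numbers as the Hmean column of the DBNet README tables. Hmean is the harmonic mean of precision and recall, so the reported rows can be cross-checked with a few lines of Python. The `hmean` helper is illustrative only. MMOCR's `HmeanIOUMetric` first derives precision and recall from IoU matching of predicted and ground-truth polygons, and that step is not reproduced here.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Hmean) copied from the DBNet README tables.
rows = [
    (0.8853, 0.7583, 0.8169),  # DBNet_r18, ICDAR2015
    (0.8744, 0.8276, 0.8504),  # DBNet_r50, ICDAR2015
    (0.8784, 0.8315, 0.8543),  # DBNet_r50dcn, ICDAR2015
    (0.9052, 0.8272, 0.8644),  # DBNet_r50-oclip, ICDAR2015
    (0.8640, 0.7770, 0.8182),  # DBNet_r18, Total Text
]
for p, r, reported in rows:
    print(f'{hmean(p, r):.4f} vs reported {reported:.4f}')
```

Each recomputed value agrees with the reported one to four decimal places, up to the rounding already applied to the precision and recall inputs.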
configs/textdet/dbnetpp/README.md
ADDED
@@ -0,0 +1,41 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# DBNetpp
|
2 |
+
|
3 |
+
> [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
|
4 |
+
|
5 |
+
<!-- [ALGORITHM] -->
|
6 |
+
|
7 |
+
## Abstract
|
8 |
+
|
9 |
+
Recently, segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field, because of their superiority in detecting the text instances of arbitrary shapes and extreme aspect ratios, profiting from the pixel-level descriptions. However, the vast majority of the existing segmentation-based approaches are limited to their complex post-processing algorithms and the scale robustness of their segmentation models, where the post-processing algorithms are not only isolated to the model optimization but also time-consuming and the scale robustness is usually strengthened by fusing multi-scale feature maps directly. In this paper, we propose a Differentiable Binarization (DB) module that integrates the binarization process, one of the most important steps in the post-processing procedure, into a segmentation network. Optimized along with the proposed DB module, the segmentation network can produce more accurate results, which enhances the accuracy of text detection with a simple pipeline. Furthermore, an efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively. By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.

<div align=center>
<img src="https://user-images.githubusercontent.com/45810070/166850828-f1e48c25-4a0f-429d-ae54-6997ed25c062.png"/>
</div>

## Results and models

### SynthText

| Method | Backbone | Training set | #iters | Download |
| :---: | :---: | :---: | :---: | :---: |
| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) | ResNet50-dcnv2 | SynthText | 100,000 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext-00f0a80b.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext/20221215_013531.log) |

### ICDAR2015

| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [DBNetpp_r50](/configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py) | ResNet50 | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9079 | 0.8209 | 0.8622 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015/dbnetpp_resnet50_fpnc_1200e_icdar2015_20221025_185550-013730aa.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015/20221025_185550.log) |
| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | ResNet50-dcnv2 | [SynthText](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/tmp_1.0_pretrain/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-352fec8a.pth)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9116 | 0.8291 | 0.8684 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/20220829_230108.log) |
| [DBNetpp_r50-oclip](/configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py) | [ResNet50-oCLIP](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9174 | 0.8609 | 0.8882 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015_20221101_124139-4ecb39ac.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015/20221101_124139.log) |
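
The Hmean column is the harmonic mean of precision and recall. As a quick sanity check on the ICDAR2015 rows, the sketch below recomputes it from the table values. Because the inputs are already rounded to four decimals, the recomputed figure can differ from the reported one in the last digit. The helper name `hmean` is chosen for this example only.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Hmean) copied from the ICDAR2015 table.
rows = {
    'DBNetpp_r50': (0.9079, 0.8209, 0.8622),
    'DBNetpp_r50dcn': (0.9116, 0.8291, 0.8684),
    'DBNetpp_r50-oclip': (0.9174, 0.8609, 0.8882),
}

for name, (p, r, reported) in rows.items():
    print(f'{name}: computed={hmean(p, r):.4f} reported={reported:.4f}')
```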

## Citation

```bibtex
@article{liao2022real,
  title={Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion},
  author={Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022},
  publisher={IEEE}
}
```
configs/textdet/dbnetpp/_base_dbnetpp_resnet50-dcnv2_fpnc.py
ADDED
@@ -0,0 +1,72 @@
model = dict(
    type='DBNet',
    backbone=dict(
        type='mmdet.ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
        stage_with_dcn=(False, True, True, True)),
    neck=dict(
        type='FPNC',
        in_channels=[256, 512, 1024, 2048],
        lateral_channels=256,
        asf_cfg=dict(attention_type='ScaleChannelSpatial')),
    det_head=dict(
        type='DBHead',
        in_channels=256,
        module_loss=dict(type='DBModuleLoss'),
        postprocessor=dict(
            type='DBPostprocessor', text_repr_type='quad',
            epsilon_ratio=0.002)),
    data_preprocessor=dict(
        type='TextDetDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_size_divisor=32))

train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_polygon=True,
        with_label=True,
    ),
    dict(
        type='TorchVisionWrapper',
        op='ColorJitter',
        brightness=32.0 / 255,
        saturation=0.5),
    dict(
        type='ImgAugWrapper',
        args=[['Fliplr', 0.5],
              dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
    dict(type='RandomCrop', min_side_ratio=0.1),
    dict(type='Resize', scale=(640, 640), keep_ratio=True),
    dict(type='Pad', size=(640, 640)),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape'))
]

test_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(type='Resize', scale=(4068, 1024), keep_ratio=True),
    dict(
        type='LoadOCRAnnotations',
        with_polygon=True,
        with_bbox=True,
        with_label=True,
    ),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor',
                   'instances'))
]
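
The `Test size` of 1024 reported in the README follows from the test-time `Resize(scale=(4068, 1024), keep_ratio=True)`. With `keep_ratio=True`, the image is scaled by the largest factor that keeps its long edge within 4068 and its short edge within 1024; the data preprocessor then pads each side up to a multiple of `pad_size_divisor=32`. The sketch below reproduces that arithmetic in plain Python. It assumes the rescaling rule used by MMCV-style `Resize` transforms (round to nearest pixel); the helper names are hypothetical and this is not MMOCR's code.

```python
import math


def rescale_keep_ratio(width, height, scale):
    """Scale (width, height) so long edge <= max(scale) and short edge <= min(scale)."""
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(width, height), max_short / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


def pad_to_divisor(width, height, divisor=32):
    """Round each side up to the next multiple of `divisor`."""
    return (math.ceil(width / divisor) * divisor,
            math.ceil(height / divisor) * divisor)


# ICDAR2015 images are 1280x720.
resized = rescale_keep_ratio(1280, 720, (4068, 1024))
padded = pad_to_divisor(*resized)
print(resized, padded)
```

For a 1280x720 input the short edge is the binding constraint, so the image ends up 1024 pixels high and the width is padded slightly to stay divisible by 32.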
configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py
ADDED
@@ -0,0 +1,44 @@
_base_ = [
    '_base_dbnetpp_resnet50-dcnv2_fpnc.py',
    '../_base_/pretrain_runtime.py',
    '../_base_/datasets/synthtext.py',
    '../_base_/schedules/schedule_sgd_100k.py',
]

train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_polygon=True,
        with_label=True,
    ),
    dict(type='FixInvalidPolygon'),
    dict(
        type='TorchVisionWrapper',
        op='ColorJitter',
        brightness=32.0 / 255,
        saturation=0.5),
    dict(
        type='ImgAugWrapper',
        args=[['Fliplr', 0.5],
              dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
    dict(type='RandomCrop', min_side_ratio=0.1),
    dict(type='Resize', scale=(640, 640), keep_ratio=True),
    dict(type='Pad', size=(640, 640)),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape'))
]

synthtext_textdet_train = _base_.synthtext_textdet_train
synthtext_textdet_train.pipeline = train_pipeline

train_dataloader = dict(
    batch_size=16,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=synthtext_textdet_train)

auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py
ADDED
@@ -0,0 +1,36 @@
_base_ = [
    '_base_dbnetpp_resnet50-dcnv2_fpnc.py',
    '../_base_/default_runtime.py',
    '../_base_/datasets/icdar2015.py',
    '../_base_/schedules/schedule_sgd_1200e.py',
]

load_from = 'https://download.openmmlab.com/mmocr/textdet/dbnetpp/tmp_1.0_pretrain/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-352fec8a.pth'  # noqa

# dataset settings
train_list = [_base_.icdar2015_textdet_train]
test_list = [_base_.icdar2015_textdet_test]

train_dataloader = dict(
    batch_size=16,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='ConcatDataset',
        datasets=train_list,
        pipeline=_base_.train_pipeline))

val_dataloader = dict(
    batch_size=16,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='ConcatDataset',
        datasets=test_list,
        pipeline=_base_.test_pipeline))

test_dataloader = val_dataloader

auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py
ADDED
@@ -0,0 +1,20 @@
_base_ = [
    'dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py',
]

load_from = None

_base_.model.backbone = dict(
    type='CLIPResNet',
    init_cfg=dict(
        type='Pretrained',
        checkpoint='https://download.openmmlab.com/'
        'mmocr/backbone/resnet50-oclip-7ba0c533.pth'))

_base_.train_dataloader.num_workers = 24
_base_.optim_wrapper.optimizer.lr = 0.002

param_scheduler = [
    dict(type='LinearLR', end=200, start_factor=0.001),
    dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=200, end=1200),
]
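
The `param_scheduler` above chains a linear warm-up over the first 200 epochs with a polynomial decay from epoch 200 to 1200. The sketch below evaluates an idealized closed form of that schedule for the base rate `lr = 0.002` set in this config. It is an approximation written for illustration: MMEngine's `LinearLR` and `PolyLR` are implemented as step-wise updates and may differ by one step at the boundaries, and `lr_at_epoch` is a hypothetical helper, not an MMEngine function.

```python
def lr_at_epoch(epoch, base_lr=0.002, warmup_end=200, total=1200,
                start_factor=0.001, power=0.9, eta_min=1e-7):
    """Idealized learning rate: linear warm-up, then polynomial decay."""
    if epoch < warmup_end:
        # Factor rises linearly from start_factor to 1.0 across the warm-up.
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_end
        return base_lr * factor
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (epoch - warmup_end) / (total - warmup_end)
    return (base_lr - eta_min) * (1.0 - progress) ** power + eta_min


for epoch in (0, 100, 200, 700, 1200):
    print(epoch, f'{lr_at_epoch(epoch):.3e}')
```

The very small `start_factor` means training starts at roughly a thousandth of the target rate, which is a common precaution when fine-tuning from a differently pre-trained backbone.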
configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py
ADDED
@@ -0,0 +1,24 @@
_base_ = [
    'dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py',
]

load_from = None

_base_.model.backbone = dict(
    type='mmdet.ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=-1,
    norm_cfg=dict(type='BN', requires_grad=True),
    norm_eval=True,
    style='pytorch',
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'))

_base_.train_dataloader.num_workers = 24
_base_.optim_wrapper.optimizer.lr = 0.003

param_scheduler = [
    dict(type='LinearLR', end=200, start_factor=0.001),
    dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=200, end=1200),
]
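
This config assigns a whole new dict to `_base_.model.backbone` instead of declaring a partial `model = dict(backbone=dict(...))`. A likely reason is that a plain override is merged key by key with the inherited backbone, so the `dcn` and `stage_with_dcn` entries from the DCNv2 base would survive into the plain ResNet50 variant. The sketch below illustrates the difference with ordinary dictionaries; `deep_merge` is a hypothetical stand-in written for this example, not MMEngine's actual merging routine.

```python
import copy


def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`; override keys win."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed-down version of the inherited DCNv2 backbone.
base_backbone = dict(
    type='mmdet.ResNet',
    depth=50,
    norm_eval=False,
    dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
    stage_with_dcn=(False, True, True, True))

# Trimmed-down version of the backbone this config wants.
plain_backbone = dict(type='mmdet.ResNet', depth=50, norm_eval=True)

merged = deep_merge(base_backbone, plain_backbone)  # keeps the DCN settings
replaced = copy.deepcopy(plain_backbone)            # what direct assignment yields

print(sorted(merged))
print(sorted(replaced))
```

Merging leaves `dcn` in place, while replacement drops it, which matches the intent of a backbone without deformable convolutions.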
configs/textdet/dbnetpp/metafile.yml
ADDED
@@ -0,0 +1,56 @@
Collections:
- Name: DBNetpp
  Metadata:
    Training Data: ICDAR2015
    Training Techniques:
      - SGD with Momentum
      - Weight Decay
    Training Resources: 1x NVIDIA A100-SXM4-80GB
    Architecture:
      - ResNet
      - FPNC
  Paper:
    URL: https://arxiv.org/abs/2202.10304
    Title: 'Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion'
  README: configs/textdet/dbnetpp/README.md

Models:
  - Name: dbnetpp_resnet50_fpnc_1200e_icdar2015
    In Collection: DBNetpp
    Alias:
      - DBPP_r50
    Config: configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py
    Metadata:
      Training Data: ICDAR2015
    Results:
      - Task: Text Detection
        Dataset: ICDAR2015
        Metrics:
          hmean-iou: 0.8622
    Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015/dbnetpp_resnet50_fpnc_1200e_icdar2015_20221025_185550-013730aa.pth

  - Name: dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015
    In Collection: DBNetpp
    Config: configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py
    Metadata:
      Training Data: ICDAR2015
    Results:
      - Task: Text Detection
        Dataset: ICDAR2015
        Metrics:
          hmean-iou: 0.8684
    Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth

  - Name: dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015
    Alias:
      - DBNetpp
    In Collection: DBNetpp
    Config: configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py
    Metadata:
      Training Data: ICDAR2015
    Results:
      - Task: Text Detection
        Dataset: ICDAR2015
        Metrics:
          hmean-iou: 0.8882
    Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015_20221101_124139-4ecb39ac.pth
configs/textdet/drrg/README.md
ADDED
@@ -0,0 +1,34 @@
# DRRG

> [Deep relational reasoning graph network for arbitrary shape text detection](https://arxiv.org/abs/2003.07493)

<!-- [ALGORITHM] -->

## Abstract

Arbitrary shape text detection is a challenging task due to the high variety and complexity of scene texts. In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection. In our method, an innovative local graph bridges a text proposal model via Convolutional Neural Network (CNN) and a deep relational reasoning network via Graph Convolutional Network (GCN), making our network end-to-end trainable. To be concrete, every text instance will be divided into a series of small rectangular components, and the geometry attributes (e.g., height, width, and orientation) of the small components will be estimated by our text proposal model. Given the geometry attributes, the local graph construction model can roughly establish linkages between different text components. For further reasoning and deducing the likelihood of linkages between the component and its neighbors, we adopt a graph-based network to perform deep relational reasoning on local graphs. Experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.
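
The final stage the abstract describes, turning pairwise linkage likelihoods into text instances, amounts to grouping components whose predicted link score clears a threshold (the base config in this commit sets `link_thr=0.80` on `DRRGPostprocessor`). A minimal sketch with a union-find structure follows. The component indices and scores are invented for illustration, the `>=` comparison is an assumption, and this is a simplification rather than MMOCR's post-processor.

```python
def group_components(num_components, links, link_thr=0.8):
    """Group component indices connected by links scoring at least link_thr.

    `links` is an iterable of (index_a, index_b, score) tuples.
    """
    parent = list(range(num_components))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, score in links:
        if score >= link_thr:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for i in range(num_components):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Five components; the weak 2-3 link separates two text instances.
links = [(0, 1, 0.95), (1, 2, 0.90), (2, 3, 0.30), (3, 4, 0.85)]
print(group_components(5, links))
```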

<div align=center>
<img src="https://user-images.githubusercontent.com/22607038/142791777-f282300a-fb83-4b5a-a7d4-29f308949f11.png"/>
</div>

## Results and models

### CTW1500

| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [DRRG](/configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.8775 | 0.8179 | 0.8467 | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/20220827_105233.log) |
| [DRRG_r50-oclip](/configs/textdet/drrg/drrg_resnet50-oclip_fpn-unet_1200e_ctw1500.py) | [ResNet50-oCLIP](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) | - | CTW1500 Train | CTW1500 Test | 1200 | | | | | [model](<>) \| [log](<>) |

## Citation

```bibtex
@article{zhang2020drrg,
  title={Deep relational reasoning graph network for arbitrary shape text detection},
  author={Zhang, Shi-Xue and Zhu, Xiaobin and Hou, Jie-Bo and Liu, Chang and Yang, Chun and Wang, Hongfa and Yin, Xu-Cheng},
  booktitle={CVPR},
  pages={9699-9708},
  year={2020}
}
```
configs/textdet/drrg/_base_drrg_resnet50_fpn-unet.py
ADDED
@@ -0,0 +1,92 @@
model = dict(
    type='DRRG',
    backbone=dict(
        type='mmdet.ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN_UNet', in_channels=[256, 512, 1024, 2048], out_channels=32),
    det_head=dict(
        type='DRRGHead',
        in_channels=32,
        text_region_thr=0.3,
        center_region_thr=0.4,
        module_loss=dict(type='DRRGModuleLoss'),
        postprocessor=dict(type='DRRGPostprocessor', link_thr=0.80)),
    data_preprocessor=dict(
        type='TextDetDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_size_divisor=32))

train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_polygon=True,
        with_label=True),
    dict(
        type='TorchVisionWrapper',
        op='ColorJitter',
        brightness=32.0 / 255,
        saturation=0.5),
    dict(
        type='RandomResize',
        scale=(800, 800),
        ratio_range=(0.75, 2.5),
        keep_ratio=True),
    dict(
        type='TextDetRandomCropFlip',
        crop_ratio=0.5,
        iter_num=1,
        min_area_ratio=0.2),
    dict(
        type='RandomApply',
        transforms=[dict(type='RandomCrop', min_side_ratio=0.3)],
        prob=0.8),
    dict(
        type='RandomApply',
        transforms=[
            dict(
                type='RandomRotate',
                max_angle=60,
                use_canvas=True,
                pad_with_fixed_color=False)
        ],
        prob=0.5),
    dict(
        type='RandomChoice',
        transforms=[[
            dict(type='Resize', scale=800, keep_ratio=True),
            dict(type='SourceImagePad', target_scale=800)
        ],
                    dict(type='Resize', scale=800, keep_ratio=False)],
        prob=[0.4, 0.6]),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape'))
]

test_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(type='Resize', scale=(1024, 640), keep_ratio=True),
    # Load annotations after ``Resize`` because the ground truth
    # does not need to be resized.
    dict(
        type='LoadOCRAnnotations',
        with_polygon=True,
        with_bbox=True,
        with_label=True),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
]
configs/textdet/drrg/drrg_resnet50-oclip_fpn-unet_1200e_ctw1500.py
ADDED
@@ -0,0 +1,17 @@
_base_ = [
    'drrg_resnet50_fpn-unet_1200e_ctw1500.py',
]

load_from = None

_base_.model.backbone = dict(
    type='CLIPResNet',
    init_cfg=dict(
        type='Pretrained',
        checkpoint='https://download.openmmlab.com/'
        'mmocr/backbone/resnet50-oclip-7ba0c533.pth'))

param_scheduler = [
    dict(type='LinearLR', end=100, start_factor=0.001),
    dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=100, end=1200),
]
configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py
ADDED
@@ -0,0 +1,30 @@
_base_ = [
    '_base_drrg_resnet50_fpn-unet.py',
    '../_base_/datasets/ctw1500.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_sgd_1200e.py',
]

# dataset settings
ctw1500_textdet_train = _base_.ctw1500_textdet_train
ctw1500_textdet_train.pipeline = _base_.train_pipeline
ctw1500_textdet_test = _base_.ctw1500_textdet_test
ctw1500_textdet_test.pipeline = _base_.test_pipeline

train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=ctw1500_textdet_train)

val_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=ctw1500_textdet_test)

test_dataloader = val_dataloader

auto_scale_lr = dict(base_batch_size=16)
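
`auto_scale_lr = dict(base_batch_size=16)` records the total batch size the learning rate was tuned for. When automatic scaling is enabled in MMEngine-based training, the learning rate is multiplied by the ratio of the actual total batch size (per-GPU `batch_size` times the number of GPUs) to `base_batch_size`. With `batch_size=4` in this config, four GPUs give exactly 16 and leave the rate unchanged. A minimal sketch of that linear scaling rule follows; `scale_lr` is a hypothetical helper and the base rate of 0.01 is an example value, not one taken from this commit.

```python
def scale_lr(base_lr, batch_size_per_gpu, num_gpus, base_batch_size=16):
    """Linear scaling rule: lr grows in proportion to the total batch size."""
    if base_batch_size <= 0:
        raise ValueError('base_batch_size must be positive')
    ratio = (batch_size_per_gpu * num_gpus) / base_batch_size
    return base_lr * ratio


for gpus in (1, 2, 4, 8):
    print(gpus, scale_lr(0.01, batch_size_per_gpu=4, num_gpus=gpus))
```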
configs/textdet/drrg/metafile.yml
ADDED
@@ -0,0 +1,28 @@
Collections:
- Name: DRRG
  Metadata:
    Training Data: SCUT-CTW1500
    Training Techniques:
      - SGD with Momentum
    Training Resources: 4x NVIDIA A100-SXM4-80GB
    Architecture:
      - ResNet
      - FPN_UNet
  Paper:
    URL: https://arxiv.org/abs/2003.07493
    Title: 'Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection'
  README: configs/textdet/drrg/README.md

Models:
  - Name: drrg_resnet50_fpn-unet_1200e_ctw1500
    Alias: DRRG
    In Collection: DRRG
    Config: configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py
    Metadata:
      Training Data: CTW1500
    Results:
      - Task: Text Detection
        Dataset: CTW1500
        Metrics:
          hmean-iou: 0.8467
    Weights: https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth
configs/textdet/fcenet/README.md
ADDED
@@ -0,0 +1,46 @@
# FCENet

> [Fourier Contour Embedding for Arbitrary-Shaped Text Detection](https://arxiv.org/abs/2104.10442)

<!-- [ALGORITHM] -->

## Abstract

One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances. Most existing methods model text instances in the image spatial domain via masks or contour point sequences in the Cartesian or the polar coordinate system. However, the mask representation might lead to expensive post-processing, while the point sequence one may have limited capability to model texts with highly-curved shapes. To tackle these problems, we model text instances in the Fourier domain and propose one novel Fourier Contour Embedding (FCE) method to represent arbitrary shaped text contours as compact signatures. We further construct FCENet with a backbone, feature pyramid networks (FPN) and a simple post-processing with the Inverse Fourier Transformation (IFT) and Non-Maximum Suppression (NMS). Different from previous methods, FCENet first predicts compact Fourier signatures of text instances, and then reconstructs text contours via IFT and NMS during testing. Extensive experiments demonstrate that FCE is accurate and robust to fit contours of scene texts even with highly-curved shapes, and also validate the effectiveness and the good generalization of FCENet for arbitrary-shaped text detection. Furthermore, experimental results show that our FCENet is superior to the state-of-the-art (SOTA) methods on CTW1500 and Total-Text, especially on the challenging highly-curved text subset.
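
The core idea, representing a closed contour by a handful of complex Fourier coefficients and recovering it with the inverse transform, can be tried on a toy shape. The sketch below treats contour points as complex numbers `x + iy`, computes coefficients `c_k` for `k = -K..K` with a discrete Fourier transform, and rebuilds the contour from them. The function names are invented for this example, and it illustrates only the representation, not FCENet's network, contour resampling, or NMS.

```python
import cmath
import math


def fourier_signature(points, degree):
    """Return {k: c_k} for k in [-degree, degree] from evenly sampled contour points."""
    n = len(points)
    zs = [complex(x, y) for x, y in points]
    return {
        k: sum(z * cmath.exp(-2j * math.pi * k * i / n)
               for i, z in enumerate(zs)) / n
        for k in range(-degree, degree + 1)
    }


def reconstruct(signature, num_points):
    """Inverse Fourier transform: sample the contour at num_points positions."""
    contour = []
    for i in range(num_points):
        t = i / num_points
        z = sum(c * cmath.exp(2j * math.pi * k * t)
                for k, c in signature.items())
        contour.append((z.real, z.imag))
    return contour


# An ellipse with semi-axes 3 and 2 needs only the k = -1 and k = 1 terms.
ellipse = [(3 * math.cos(2 * math.pi * i / 64),
            2 * math.sin(2 * math.pi * i / 64)) for i in range(64)]
signature = fourier_signature(ellipse, degree=1)
restored = reconstruct(signature, 64)
max_error = max(math.dist(p, q) for p, q in zip(ellipse, restored))
print(len(signature), max_error)
```

Three complex numbers reproduce all 64 sampled points here; more intricate contours need a higher degree, which is the trade-off between compactness and fidelity that the signature length controls.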

<div align=center>
<img src="https://user-images.githubusercontent.com/22607038/142791859-1b0ebde4-b151-4c25-ba1b-f354bd8ddc8c.png"/>
</div>

## Results and models

### CTW1500

| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [FCENet_r50dcn](/configs/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | - | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8689 | 0.8296 | 0.8488 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500_20220825_221510-4d705392.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/20220825_221510.log) |
| [FCENet_r50-oclip](/configs/textdet/fcenet/fcenet_resnet50-oclip-dcnv2_fpn_1500e_ctw1500.py) | [ResNet50-oCLIP](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) | - | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8383 | 0.801 | 0.8192 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_ctw1500/fcenet_resnet50-oclip_fpn_1500e_ctw1500_20221102_121909-101df7e6.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_ctw1500/20221102_121909.log) |

### ICDAR2015

| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [FCENet_r50](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.8243 | 0.8834 | 0.8528 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/20220826_140941.log) |
| [FCENet_r50-oclip](/configs/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_icdar2015.py) | ResNet50-oCLIP | - | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.9176 | 0.8098 | 0.8604 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_icdar2015/fcenet_resnet50-oclip_fpn_1500e_icdar2015_20221101_150145-5a6fc412.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_icdar2015/20221101_150145.log) |

### Total Text

| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [FCENet_r50](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_totaltext.py) | ResNet50 | - | Totaltext Train | Totaltext Test | 1500 | (1280, 960) | 0.8485 | 0.7810 | 0.8134 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_totaltext/fcenet_resnet50_fpn_1500e_totaltext-91bd37af.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_totaltext/20221219_201107.log) |

## Citation

```bibtex
@InProceedings{zhu2021fourier,
  title={Fourier Contour Embedding for Arbitrary-Shaped Text Detection},
  author={Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhanghui Kuang and Lianwen Jin and Wayne Zhang},
  year={2021},
  booktitle = {CVPR}
}
```