Mountchicken committed on
Commit
9bf4bd7
1 Parent(s): 6163561

Upload 704 files

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. CITATION.cff +9 -0
  2. configs/backbone/oclip/README.md +41 -0
  3. configs/backbone/oclip/metafile.yml +13 -0
  4. configs/kie/_base_/datasets/wildreceipt-openset.py +26 -0
  5. configs/kie/_base_/datasets/wildreceipt.py +16 -0
  6. configs/kie/_base_/default_runtime.py +33 -0
  7. configs/kie/_base_/schedules/schedule_adam_60e.py +10 -0
  8. configs/kie/sdmgr/README.md +41 -0
  9. configs/kie/sdmgr/_base_sdmgr_novisual.py +35 -0
  10. configs/kie/sdmgr/_base_sdmgr_unet16.py +28 -0
  11. configs/kie/sdmgr/metafile.yml +52 -0
  12. configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py +71 -0
  13. configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py +28 -0
  14. configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py +29 -0
  15. configs/textdet/_base_/datasets/ctw1500.py +15 -0
  16. configs/textdet/_base_/datasets/icdar2015.py +15 -0
  17. configs/textdet/_base_/datasets/icdar2017.py +17 -0
  18. configs/textdet/_base_/datasets/synthtext.py +8 -0
  19. configs/textdet/_base_/datasets/totaltext.py +15 -0
  20. configs/textdet/_base_/datasets/toy_data.py +17 -0
  21. configs/textdet/_base_/default_runtime.py +41 -0
  22. configs/textdet/_base_/pretrain_runtime.py +14 -0
  23. configs/textdet/_base_/schedules/schedule_adam_600e.py +9 -0
  24. configs/textdet/_base_/schedules/schedule_sgd_100k.py +12 -0
  25. configs/textdet/_base_/schedules/schedule_sgd_1200e.py +11 -0
  26. configs/textdet/_base_/schedules/schedule_sgd_base.py +15 -0
  27. configs/textdet/dbnet/README.md +47 -0
  28. configs/textdet/dbnet/_base_dbnet_resnet18_fpnc.py +64 -0
  29. configs/textdet/dbnet/_base_dbnet_resnet50-dcnv2_fpnc.py +66 -0
  30. configs/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext.py +45 -0
  31. configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py +30 -0
  32. configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py +73 -0
  33. configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_100k_synthtext.py +30 -0
  34. configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py +33 -0
  35. configs/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015.py +20 -0
  36. configs/textdet/dbnet/dbnet_resnet50_1200e_icdar2015.py +24 -0
  37. configs/textdet/dbnet/metafile.yml +80 -0
  38. configs/textdet/dbnetpp/README.md +41 -0
  39. configs/textdet/dbnetpp/_base_dbnetpp_resnet50-dcnv2_fpnc.py +72 -0
  40. configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py +44 -0
  41. configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py +36 -0
  42. configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py +20 -0
  43. configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py +24 -0
  44. configs/textdet/dbnetpp/metafile.yml +56 -0
  45. configs/textdet/drrg/README.md +34 -0
  46. configs/textdet/drrg/_base_drrg_resnet50_fpn-unet.py +92 -0
  47. configs/textdet/drrg/drrg_resnet50-oclip_fpn-unet_1200e_ctw1500.py +17 -0
  48. configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py +30 -0
  49. configs/textdet/drrg/metafile.yml +28 -0
  50. configs/textdet/fcenet/README.md +46 -0
CITATION.cff ADDED
@@ -0,0 +1,9 @@
+ cff-version: 1.2.0
+ message: "If you use this software, please cite it as below."
+ title: "OpenMMLab Text Detection, Recognition and Understanding Toolbox"
+ authors:
+   - name: "MMOCR Contributors"
+ version: 0.3.0
+ date-released: 2020-08-15
+ repository-code: "https://github.com/open-mmlab/mmocr"
+ license: Apache-2.0
configs/backbone/oclip/README.md ADDED
@@ -0,0 +1,41 @@
+ # oCLIP
+
+ > [Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880282.pdf)
+
+ <!-- [ALGORITHM] -->
+
+ ## Abstract
+
+ Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-language tasks by jointly learning visual and textual representations, which intuitively helps in Optical Character Recognition (OCR) tasks due to the rich visual and textual information in scene text images. However, these methods cannot well cope with OCR tasks because of the difficulty in both instance-level text encoding and image-text pair acquisition (i.e. images and captured texts in them). This paper presents a weakly supervised pre-training method, oCLIP, which can acquire effective scene text representations by jointly learning and aligning visual and textual information. Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction among textual and visual features for learning effective scene text representations. With the learning of textual features, the pre-trained model can attend texts in images well with character awareness. Besides, these designs enable the learning from weakly annotated texts (i.e. partial texts in images without text bounding boxes) which mitigates the data annotation constraint greatly. Experiments over the weakly annotated images in ICDAR2019-LSVT show that our pre-trained model improves F-score by +2.5% and +4.8% while transferring its weights to other text detection and spotting networks, respectively. In addition, the proposed method outperforms existing pre-training techniques consistently across multiple public datasets (e.g., +3.2% and +1.3% for Total-Text and CTW1500).
+
+ <div align=center>
+ <img src="https://user-images.githubusercontent.com/24622904/199475057-aa688422-518d-4d7a-86fc-1be0cc1b5dc6.png"/>
+ </div>
+
+ ## Models
+
+ | Backbone | Pre-train Data | Model |
+ | :-------: | :------------: | :-------------------------------------------------------------------------------: |
+ | ResNet-50 | SynthText | [Link](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) |
+
+ ```{note}
+ The model is converted from the official [oCLIP](https://github.com/bytedance/oclip.git).
+ ```
+
+ ## Supported Text Detection Models
+
+ | | [DBNet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#dbnet) | [DBNet++](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#dbnetpp) | [FCENet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fcenet) | [TextSnake](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fcenet) | [PSENet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#psenet) | [DRRG](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#drrg) | [Mask R-CNN](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#mask-r-cnn) |
+ | :-------: | :------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :--------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------: | :----------------------------------------------------------------------: | :----------------------------------------------------------------------------------: |
+ | ICDAR2015 | ✓ | ✓ | ✓ | | ✓ | | ✓ |
+ | CTW1500 | | | ✓ | ✓ | ✓ | ✓ | ✓ |
+
+ ## Citation
+
+ ```bibtex
+ @article{xue2022language,
+   title={Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting},
+   author={Xue, Chuhui and Zhang, Wenqing and Hao, Yu and Lu, Shijian and Torr, Philip and Bai, Song},
+   journal={Proceedings of the European Conference on Computer Vision (ECCV)},
+   year={2022}
+ }
+ ```
configs/backbone/oclip/metafile.yml ADDED
@@ -0,0 +1,13 @@
+ Collections:
+ - Name: oCLIP
+   Metadata:
+     Training Data: SynthText
+     Architecture:
+       - CLIPResNet
+   Paper:
+     URL: https://arxiv.org/abs/2203.03911
+     Title: 'Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting'
+   README: configs/backbone/oclip/README.md
+
+ Models:
+   Weights: https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth
configs/kie/_base_/datasets/wildreceipt-openset.py ADDED
@@ -0,0 +1,26 @@
+ wildreceipt_openset_data_root = 'data/wildreceipt/'
+
+ wildreceipt_openset_train = dict(
+     type='WildReceiptDataset',
+     data_root=wildreceipt_openset_data_root,
+     metainfo=dict(category=[
+         dict(id=0, name='bg'),
+         dict(id=1, name='key'),
+         dict(id=2, name='value'),
+         dict(id=3, name='other')
+     ]),
+     ann_file='openset_train.txt',
+     pipeline=None)
+
+ wildreceipt_openset_test = dict(
+     type='WildReceiptDataset',
+     data_root=wildreceipt_openset_data_root,
+     metainfo=dict(category=[
+         dict(id=0, name='bg'),
+         dict(id=1, name='key'),
+         dict(id=2, name='value'),
+         dict(id=3, name='other')
+     ]),
+     ann_file='openset_test.txt',
+     test_mode=True,
+     pipeline=None)
configs/kie/_base_/datasets/wildreceipt.py ADDED
@@ -0,0 +1,16 @@
+ wildreceipt_data_root = 'data/wildreceipt/'
+
+ wildreceipt_train = dict(
+     type='WildReceiptDataset',
+     data_root=wildreceipt_data_root,
+     metainfo=wildreceipt_data_root + 'class_list.txt',
+     ann_file='train.txt',
+     pipeline=None)
+
+ wildreceipt_test = dict(
+     type='WildReceiptDataset',
+     data_root=wildreceipt_data_root,
+     metainfo=wildreceipt_data_root + 'class_list.txt',
+     ann_file='test.txt',
+     test_mode=True,
+     pipeline=None)
configs/kie/_base_/default_runtime.py ADDED
@@ -0,0 +1,33 @@
+ default_scope = 'mmocr'
+ env_cfg = dict(
+     cudnn_benchmark=False,
+     mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+     dist_cfg=dict(backend='nccl'),
+ )
+ randomness = dict(seed=None)
+
+ default_hooks = dict(
+     timer=dict(type='IterTimerHook'),
+     logger=dict(type='LoggerHook', interval=100),
+     param_scheduler=dict(type='ParamSchedulerHook'),
+     checkpoint=dict(type='CheckpointHook', interval=1),
+     sampler_seed=dict(type='DistSamplerSeedHook'),
+     sync_buffer=dict(type='SyncBuffersHook'),
+     visualization=dict(
+         type='VisualizationHook',
+         interval=1,
+         enable=False,
+         show=False,
+         draw_gt=False,
+         draw_pred=False),
+ )
+
+ # Logging
+ log_level = 'INFO'
+ log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
+
+ load_from = None
+ resume = False
+
+ visualizer = dict(
+     type='KIELocalVisualizer', name='visualizer', is_openset=False)
configs/kie/_base_/schedules/schedule_adam_60e.py ADDED
@@ -0,0 +1,10 @@
+ # optimizer
+ optim_wrapper = dict(
+     type='OptimWrapper', optimizer=dict(type='Adam', weight_decay=0.0001))
+ train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=60, val_interval=1)
+ val_cfg = dict(type='ValLoop')
+ test_cfg = dict(type='TestLoop')
+ # learning rate
+ param_scheduler = [
+     dict(type='MultiStepLR', milestones=[40, 50], end=60),
+ ]
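The schedule above steps the learning rate down at epochs 40 and 50 of a 60-epoch run. The following is a minimal sketch of the resulting multiplier, not the scheduler's actual implementation. It assumes a decay factor `gamma` of 0.1, which the config does not set explicitly, and `multistep_factor` is a hypothetical helper name used only for illustration.

```python
from bisect import bisect_right


def multistep_factor(epoch, milestones=(40, 50), gamma=0.1):
    """Return the LR multiplier in effect at a 0-based epoch index.

    gamma=0.1 is an assumption; the config above does not specify it.
    """
    # Count the milestones already reached and decay once per milestone.
    return gamma ** bisect_right(list(milestones), epoch)


# Multiplier at a few epochs of the 60-epoch schedule.
factors = {e: multistep_factor(e) for e in (0, 39, 40, 49, 50, 59)}
```

Under these assumptions the multiplier stays at 1 until epoch 39, drops by one factor of `gamma` at epoch 40, and by a second factor at epoch 50.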
configs/kie/sdmgr/README.md ADDED
@@ -0,0 +1,41 @@
+ # SDMGR
+
+ > [Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)
+
+ <!-- [ALGORITHM] -->
+
+ ## Abstract
+
+ Key information extraction from document images is of paramount importance in office automation. Conventional template matching based approaches fail to generalize well to document images of unseen templates, and are not robust against text recognition errors. In this paper, we propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured document images. We model document images as dual-modality graphs, nodes of which encode both the visual and textual features of detected text regions, and edges of which represent the spatial relations between neighboring text regions. The key information extraction is solved by iteratively propagating messages along graph edges and reasoning the categories of graph nodes. In order to roundly evaluate our proposed method as well as boost the future research, we release a new dataset named WildReceipt, which is collected and annotated tailored for the evaluation of key information extraction from document images of unseen templates in the wild. It contains 25 key information categories, a total of about 69000 text boxes, and is about 2 times larger than the existing public datasets. Extensive experiments validate that all information including visual features, textual features and spatial relations can benefit key information extraction. It has been shown that SDMG-R can effectively extract key information from document images of unseen templates, and obtain new state-of-the-art results on the recent popular benchmark SROIE and our WildReceipt. Our code and dataset will be publicly released.
+
+ <div align=center>
+ <img src="https://user-images.githubusercontent.com/22607038/142580689-18edb4d7-f716-475c-b1c1-e2b934658cee.png"/>
+ </div>
+
+ ## Results and models
+
+ ### WildReceipt
+
+ | Method | Modality | Macro F1-Score | Download |
+ | :--------------------------------------------------------------------: | :--------------: | :------------: | :--------------------------------------------------------------------------------------------------: |
+ | [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.890 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/20220825_151648.log) |
+ | [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.873 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/20220831_193317.log) |
+
+ ### WildReceiptOpenset
+
+ | Method | Modality | Edge F1-Score | Node Macro F1-Score | Node Micro F1-Score | Download |
+ | :-------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------: |
+ | [sdmgr_novisual_openset](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py) | Textual | 0.792 | 0.931 | 0.940 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/20220831_200807.log) |
+
+ ## Citation
+
+ ```bibtex
+ @misc{sun2021spatial,
+       title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
+       author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
+       year={2021},
+       eprint={2103.14470},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
+ }
+ ```
configs/kie/sdmgr/_base_sdmgr_novisual.py ADDED
@@ -0,0 +1,35 @@
+ num_classes = 26
+
+ model = dict(
+     type='SDMGR',
+     kie_head=dict(
+         type='SDMGRHead',
+         visual_dim=16,
+         num_classes=num_classes,
+         module_loss=dict(type='SDMGRModuleLoss'),
+         postprocessor=dict(type='SDMGRPostProcessor')),
+     dictionary=dict(
+         type='Dictionary',
+         dict_file='{{ fileDirname }}/../../../dicts/sdmgr_dict.txt',
+         with_padding=True,
+         with_unknown=True,
+         unknown_token=None),
+ )
+
+ train_pipeline = [
+     dict(type='LoadKIEAnnotations'),
+     dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+     dict(type='PackKIEInputs')
+ ]
+ test_pipeline = [
+     dict(type='LoadKIEAnnotations'),
+     dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+     dict(type='PackKIEInputs'),
+ ]
+
+ val_evaluator = dict(
+     type='F1Metric',
+     mode='macro',
+     num_classes=num_classes,
+     ignored_classes=[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25])
+ test_evaluator = val_evaluator
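The evaluator above requests a macro F1 over 26 classes while ignoring 14 of them, so 12 classes contribute to the score. A rough sketch of that computation follows, assuming plain per-class F1 averaged over the non-ignored classes and a score of 0 for a class with no predictions and no ground truth. `macro_f1` is a hypothetical helper written for illustration and is not the `F1Metric` implementation; the toy labels are invented.

```python
def macro_f1(preds, labels, num_classes, ignored_classes=()):
    """Average per-class F1 over the classes that are not ignored."""
    ignored = set(ignored_classes)
    cared = [c for c in range(num_classes) if c not in ignored]
    scores = []
    for c in cared:
        pairs = list(zip(preds, labels))
        tp = sum(p == c and g == c for p, g in pairs)
        fp = sum(p == c and g != c for p, g in pairs)
        fn = sum(p != c and g == c for p, g in pairs)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no support and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Classes kept by the config above: 26 classes minus the 14 ignored ones.
ignored_in_config = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25]
cared_in_config = [c for c in range(26) if c not in ignored_in_config]

# Toy example (made-up data) with 4 classes, ignoring class 0.
toy_labels = [0, 1, 2, 3, 1, 2]
toy_preds = [0, 1, 2, 1, 1, 3]
toy_score = macro_f1(toy_preds, toy_labels, num_classes=4, ignored_classes=[0])
```

The classes that remain are the odd-numbered ones from 1 to 23, which is why the ignored list enumerates every even index plus the last class.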
configs/kie/sdmgr/_base_sdmgr_unet16.py ADDED
@@ -0,0 +1,28 @@
+ _base_ = '_base_sdmgr_novisual.py'
+
+ model = dict(
+     backbone=dict(type='UNet', base_channels=16),
+     roi_extractor=dict(
+         type='mmdet.SingleRoIExtractor',
+         roi_layer=dict(type='RoIAlign', output_size=7),
+         featmap_strides=[1]),
+     data_preprocessor=dict(
+         type='ImgDataPreprocessor',
+         mean=[123.675, 116.28, 103.53],
+         std=[58.395, 57.12, 57.375],
+         bgr_to_rgb=True,
+         pad_size_divisor=32),
+ )
+
+ train_pipeline = [
+     dict(type='LoadImageFromFile'),
+     dict(type='LoadKIEAnnotations'),
+     dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+     dict(type='PackKIEInputs')
+ ]
+ test_pipeline = [
+     dict(type='LoadImageFromFile'),
+     dict(type='LoadKIEAnnotations'),
+     dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+     dict(type='PackKIEInputs', meta_keys=('img_path', )),
+ ]
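The `data_preprocessor` settings above describe per-channel normalization and padding to a multiple of 32. The sketch below illustrates the arithmetic only, assuming padding rounds each spatial dimension up to the next multiple of the divisor and normalization is `(value - mean) / std` per channel. Both helper names are hypothetical and the input sizes are invented.

```python
import math

MEAN = (123.675, 116.28, 103.53)
STD = (58.395, 57.12, 57.375)


def padded_size(height, width, divisor=32):
    """Round each spatial dimension up to the next multiple of `divisor`."""
    return (math.ceil(height / divisor) * divisor,
            math.ceil(width / divisor) * divisor)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to one RGB pixel."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


size = padded_size(500, 1000)       # hypothetical resized image of 500 x 1000
centered = normalize_pixel(MEAN)    # a pixel equal to the mean maps to zeros
```

Padding to a multiple of 32 is a common requirement for encoder-decoder backbones that downsample several times, since it keeps feature map sizes integral at every stage.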
configs/kie/sdmgr/metafile.yml ADDED
@@ -0,0 +1,52 @@
+ Collections:
+ - Name: SDMGR
+   Metadata:
+     Training Data: KIEDataset
+     Training Techniques:
+       - Adam
+     Training Resources: 1x NVIDIA A100-SXM4-80GB
+     Architecture:
+       - UNet
+       - SDMGRHead
+   Paper:
+     URL: https://arxiv.org/abs/2103.14470.pdf
+     Title: 'Spatial Dual-Modality Graph Reasoning for Key Information Extraction'
+   README: configs/kie/sdmgr/README.md
+
+ Models:
+   - Name: sdmgr_unet16_60e_wildreceipt
+     Alias: SDMGR
+     In Collection: SDMGR
+     Config: configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py
+     Metadata:
+       Training Data: wildreceipt
+     Results:
+       - Task: Key Information Extraction
+         Dataset: wildreceipt
+         Metrics:
+           macro_f1: 0.890
+     Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth
+   - Name: sdmgr_novisual_60e_wildreceipt
+     In Collection: SDMGR
+     Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py
+     Metadata:
+       Training Data: wildreceipt
+     Results:
+       - Task: Key Information Extraction
+         Dataset: wildreceipt
+         Metrics:
+           macro_f1: 0.873
+     Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth
+   - Name: sdmgr_novisual_60e_wildreceipt_openset
+     In Collection: SDMGR
+     Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py
+     Metadata:
+       Training Data: wildreceipt-openset
+     Results:
+       - Task: Key Information Extraction
+         Dataset: wildreceipt
+         Metrics:
+           macro_f1: 0.931
+           micro_f1: 0.940
+           edge_micro_f1: 0.792
+     Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth
configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py ADDED
@@ -0,0 +1,71 @@
+ _base_ = [
+     '../_base_/default_runtime.py',
+     '../_base_/datasets/wildreceipt-openset.py',
+     '../_base_/schedules/schedule_adam_60e.py',
+     '_base_sdmgr_novisual.py',
+ ]
+
+ node_num_classes = 4  # 4 classes: bg, key, value and other
+ edge_num_classes = 2  # edge connectivity
+ key_node_idx = 1
+ value_node_idx = 2
+
+ model = dict(
+     type='SDMGR',
+     kie_head=dict(
+         num_classes=node_num_classes,
+         postprocessor=dict(
+             link_type='one-to-many',
+             key_node_idx=key_node_idx,
+             value_node_idx=value_node_idx)),
+ )
+
+ test_pipeline = [
+     dict(
+         type='LoadKIEAnnotations',
+         key_node_idx=key_node_idx,
+         value_node_idx=value_node_idx),  # Keep key->value edges for evaluation
+     dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+     dict(type='PackKIEInputs'),
+ ]
+
+ wildreceipt_openset_train = _base_.wildreceipt_openset_train
+ wildreceipt_openset_train.pipeline = _base_.train_pipeline
+ wildreceipt_openset_test = _base_.wildreceipt_openset_test
+ wildreceipt_openset_test.pipeline = test_pipeline
+
+ train_dataloader = dict(
+     batch_size=4,
+     num_workers=1,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=wildreceipt_openset_train)
+ val_dataloader = dict(
+     batch_size=1,
+     num_workers=1,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=False),
+     dataset=wildreceipt_openset_test)
+ test_dataloader = val_dataloader
+
+ val_evaluator = [
+     dict(
+         type='F1Metric',
+         prefix='node',
+         key='labels',
+         mode=['micro', 'macro'],
+         num_classes=node_num_classes,
+         cared_classes=[key_node_idx, value_node_idx]),
+     dict(
+         type='F1Metric',
+         prefix='edge',
+         mode='micro',
+         key='edge_labels',
+         cared_classes=[1],  # Collapse to binary F1 score
+         num_classes=edge_num_classes)
+ ]
+ test_evaluator = val_evaluator
+
+ visualizer = dict(
+     type='KIELocalVisualizer', name='visualizer', is_openset=True)
+ auto_scale_lr = dict(base_batch_size=4)
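The `auto_scale_lr` entry above records the total batch size the learning rate was tuned for. A minimal sketch of how such a setting is typically applied follows, assuming the linear scaling rule and assuming that scaling has been switched on at launch time, neither of which this config shows. `scaled_lr` is a hypothetical helper, and the base learning rate of 1e-3 is an illustrative value because the schedule in this commit does not set one.

```python
def scaled_lr(base_lr, base_batch_size, num_gpus, batch_size_per_gpu):
    """Linear scaling rule: LR grows in proportion to the total batch size."""
    real_batch_size = num_gpus * batch_size_per_gpu
    return base_lr * real_batch_size / base_batch_size


# base_lr=1e-3 is an assumed value, not taken from the configs above.
lr_1gpu = scaled_lr(1e-3, base_batch_size=4, num_gpus=1, batch_size_per_gpu=4)
lr_4gpu = scaled_lr(1e-3, base_batch_size=4, num_gpus=4, batch_size_per_gpu=4)
```

With one GPU and `batch_size=4` the total batch size equals `base_batch_size`, so the learning rate is left unchanged; with four such GPUs it would be multiplied by four under this rule.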
configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py ADDED
@@ -0,0 +1,28 @@
+ _base_ = [
+     '../_base_/default_runtime.py',
+     '../_base_/datasets/wildreceipt.py',
+     '../_base_/schedules/schedule_adam_60e.py',
+     '_base_sdmgr_novisual.py',
+ ]
+
+ wildreceipt_train = _base_.wildreceipt_train
+ wildreceipt_train.pipeline = _base_.train_pipeline
+ wildreceipt_test = _base_.wildreceipt_test
+ wildreceipt_test.pipeline = _base_.test_pipeline
+
+ train_dataloader = dict(
+     batch_size=4,
+     num_workers=1,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=wildreceipt_train)
+
+ val_dataloader = dict(
+     batch_size=1,
+     num_workers=1,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=False),
+     dataset=wildreceipt_test)
+ test_dataloader = val_dataloader
+
+ auto_scale_lr = dict(base_batch_size=4)
configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py ADDED
@@ -0,0 +1,29 @@
+ _base_ = [
+     '../_base_/default_runtime.py',
+     '../_base_/datasets/wildreceipt.py',
+     '../_base_/schedules/schedule_adam_60e.py',
+     '_base_sdmgr_unet16.py',
+ ]
+
+ wildreceipt_train = _base_.wildreceipt_train
+ wildreceipt_train.pipeline = _base_.train_pipeline
+ wildreceipt_test = _base_.wildreceipt_test
+ wildreceipt_test.pipeline = _base_.test_pipeline
+
+ train_dataloader = dict(
+     batch_size=4,
+     num_workers=4,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=wildreceipt_train)
+
+ val_dataloader = dict(
+     batch_size=1,
+     num_workers=1,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=False),
+     dataset=wildreceipt_test)
+
+ test_dataloader = val_dataloader
+
+ auto_scale_lr = dict(base_batch_size=4)
configs/textdet/_base_/datasets/ctw1500.py ADDED
@@ -0,0 +1,15 @@
+ ctw1500_textdet_data_root = 'data/ctw1500'
+
+ ctw1500_textdet_train = dict(
+     type='OCRDataset',
+     data_root=ctw1500_textdet_data_root,
+     ann_file='textdet_train.json',
+     filter_cfg=dict(filter_empty_gt=True, min_size=32),
+     pipeline=None)
+
+ ctw1500_textdet_test = dict(
+     type='OCRDataset',
+     data_root=ctw1500_textdet_data_root,
+     ann_file='textdet_test.json',
+     test_mode=True,
+     pipeline=None)
configs/textdet/_base_/datasets/icdar2015.py ADDED
@@ -0,0 +1,15 @@
+ icdar2015_textdet_data_root = 'data/icdar2015'
+
+ icdar2015_textdet_train = dict(
+     type='OCRDataset',
+     data_root=icdar2015_textdet_data_root,
+     ann_file='textdet_train.json',
+     filter_cfg=dict(filter_empty_gt=True, min_size=32),
+     pipeline=None)
+
+ icdar2015_textdet_test = dict(
+     type='OCRDataset',
+     data_root=icdar2015_textdet_data_root,
+     ann_file='textdet_test.json',
+     test_mode=True,
+     pipeline=None)
configs/textdet/_base_/datasets/icdar2017.py ADDED
@@ -0,0 +1,17 @@
+ icdar2017_textdet_data_root = 'data/det/icdar_2017'
+
+ icdar2017_textdet_train = dict(
+     type='OCRDataset',
+     data_root=icdar2017_textdet_data_root,
+     ann_file='instances_training.json',
+     data_prefix=dict(img_path='imgs/'),
+     filter_cfg=dict(filter_empty_gt=True, min_size=32),
+     pipeline=None)
+
+ icdar2017_textdet_test = dict(
+     type='OCRDataset',
+     data_root=icdar2017_textdet_data_root,
+     ann_file='instances_test.json',
+     data_prefix=dict(img_path='imgs/'),
+     test_mode=True,
+     pipeline=None)
configs/textdet/_base_/datasets/synthtext.py ADDED
@@ -0,0 +1,8 @@
+ synthtext_textdet_data_root = 'data/synthtext'
+
+ synthtext_textdet_train = dict(
+     type='OCRDataset',
+     data_root=synthtext_textdet_data_root,
+     ann_file='textdet_train.json',
+     filter_cfg=dict(filter_empty_gt=True, min_size=32),
+     pipeline=None)
configs/textdet/_base_/datasets/totaltext.py ADDED
@@ -0,0 +1,15 @@
+ totaltext_textdet_data_root = 'data/totaltext'
+
+ totaltext_textdet_train = dict(
+     type='OCRDataset',
+     data_root=totaltext_textdet_data_root,
+     ann_file='textdet_train.json',
+     filter_cfg=dict(filter_empty_gt=True, min_size=32),
+     pipeline=None)
+
+ totaltext_textdet_test = dict(
+     type='OCRDataset',
+     data_root=totaltext_textdet_data_root,
+     ann_file='textdet_test.json',
+     test_mode=True,
+     pipeline=None)
configs/textdet/_base_/datasets/toy_data.py ADDED
@@ -0,0 +1,17 @@
+ toy_det_data_root = 'tests/data/det_toy_dataset'
+
+ toy_det_train = dict(
+     type='OCRDataset',
+     data_root=toy_det_data_root,
+     ann_file='instances_training.json',
+     data_prefix=dict(img_path='imgs/'),
+     filter_cfg=dict(filter_empty_gt=True, min_size=32),
+     pipeline=None)
+
+ toy_det_test = dict(
+     type='OCRDataset',
+     data_root=toy_det_data_root,
+     ann_file='instances_test.json',
+     data_prefix=dict(img_path='imgs/'),
+     test_mode=True,
+     pipeline=None)
configs/textdet/_base_/default_runtime.py ADDED
@@ -0,0 +1,41 @@
+ default_scope = 'mmocr'
+ env_cfg = dict(
+     cudnn_benchmark=False,
+     mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+     dist_cfg=dict(backend='nccl'),
+ )
+ randomness = dict(seed=None)
+
+ default_hooks = dict(
+     timer=dict(type='IterTimerHook'),
+     logger=dict(type='LoggerHook', interval=5),
+     param_scheduler=dict(type='ParamSchedulerHook'),
+     checkpoint=dict(type='CheckpointHook', interval=20),
+     sampler_seed=dict(type='DistSamplerSeedHook'),
+     sync_buffer=dict(type='SyncBuffersHook'),
+     visualization=dict(
+         type='VisualizationHook',
+         interval=1,
+         enable=False,
+         show=False,
+         draw_gt=False,
+         draw_pred=False),
+ )
+
+ # Logging
+ log_level = 'INFO'
+ log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
+
+ load_from = None
+ resume = False
+
+ # Evaluation
+ val_evaluator = dict(type='HmeanIOUMetric')
+ test_evaluator = val_evaluator
+
+ # Visualization
+ vis_backends = [dict(type='LocalVisBackend')]
+ visualizer = dict(
+     type='TextDetLocalVisualizer',
+     name='visualizer',
+     vis_backends=vis_backends)
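The evaluator configured above reports an Hmean score, which the DBNet README table in this commit lists alongside precision and recall. The sketch below shows only the final step, the harmonic mean of precision and recall, and leaves out the IoU-based matching of predicted and ground-truth polygons that produces those two numbers. `hmean` is a hypothetical helper; the inputs are the DBNet_r18 and DBNet_r50 ICDAR2015 values from that table.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall, 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the DBNet README table in this commit.
dbnet_r18 = hmean(0.8853, 0.7583)
dbnet_r50 = hmean(0.8744, 0.8276)
```

Rounded to four decimals, both results agree with the Hmean column of that table (0.8169 and 0.8504).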
configs/textdet/_base_/pretrain_runtime.py ADDED
@@ -0,0 +1,14 @@
+ _base_ = 'default_runtime.py'
+
+ default_hooks = dict(
+     logger=dict(type='LoggerHook', interval=1000),
+     checkpoint=dict(
+         type='CheckpointHook',
+         interval=10000,
+         by_epoch=False,
+         max_keep_ckpts=1),
+ )
+
+ # Evaluation
+ val_evaluator = None
+ test_evaluator = None
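This file overrides only two hooks and the evaluators, and inherits everything else from `default_runtime.py` through `_base_`. A simplified model of that inheritance is a recursive dictionary merge, sketched below. It is an assumption-laden illustration rather than the config system's actual logic, which has additional rules not covered here, and `merge_config` is a hypothetical helper.

```python
def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value replaces the base one.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# A subset of the hooks from default_runtime.py.
base_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=5),
    checkpoint=dict(type='CheckpointHook', interval=20),
)
# The overrides from pretrain_runtime.py.
override_hooks = dict(
    logger=dict(type='LoggerHook', interval=1000),
    checkpoint=dict(
        type='CheckpointHook',
        interval=10000,
        by_epoch=False,
        max_keep_ckpts=1),
)
hooks = merge_config(base_hooks, override_hooks)
```

In this model the `timer` hook survives untouched, the `logger` interval changes from 5 to 1000, and the `checkpoint` hook gains the iteration-based settings used for pre-training.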
configs/textdet/_base_/schedules/schedule_adam_600e.py ADDED
@@ -0,0 +1,9 @@
+ # optimizer
+ optim_wrapper = dict(type='OptimWrapper', optimizer=dict(type='Adam', lr=1e-3))
+ train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=600, val_interval=20)
+ val_cfg = dict(type='ValLoop')
+ test_cfg = dict(type='TestLoop')
+ # learning rate
+ param_scheduler = [
+     dict(type='PolyLR', power=0.9, end=600),
+ ]
configs/textdet/_base_/schedules/schedule_sgd_100k.py ADDED
@@ -0,0 +1,12 @@
+ # optimizer
+ optim_wrapper = dict(
+     type='OptimWrapper',
+     optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+
+ train_cfg = dict(type='IterBasedTrainLoop', max_iters=100000)
+ test_cfg = None
+ val_cfg = None
+ # learning policy
+ param_scheduler = [
+     dict(type='PolyLR', power=0.9, eta_min=1e-7, by_epoch=False, end=100000),
+ ]
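The schedule above decays the SGD learning rate polynomially from 0.007 toward `eta_min` over 100,000 iterations. A sketch of the closed-form curve this describes follows, assuming the standard polynomial decay formula. The scheduler itself may compute the value step by step and could differ in small details, and `poly_lr` is a hypothetical helper.

```python
def poly_lr(step, base_lr=0.007, eta_min=1e-7, power=0.9, end=100000):
    """Closed-form polynomial decay from base_lr at step 0 to eta_min at `end`."""
    progress = min(step, end) / end
    return eta_min + (base_lr - eta_min) * (1 - progress) ** power


lr_start = poly_lr(0)
lr_half = poly_lr(50000)
lr_end = poly_lr(100000)
```

With `power=0.9` the curve is close to linear, so the learning rate at the halfway point is a little above half of the starting value.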
configs/textdet/_base_/schedules/schedule_sgd_1200e.py ADDED
@@ -0,0 +1,11 @@
+ # optimizer
+ optim_wrapper = dict(
+     type='OptimWrapper',
+     optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+ train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=1200, val_interval=20)
+ val_cfg = dict(type='ValLoop')
+ test_cfg = dict(type='TestLoop')
+ # learning policy
+ param_scheduler = [
+     dict(type='PolyLR', power=0.9, eta_min=1e-7, end=1200),
+ ]
configs/textdet/_base_/schedules/schedule_sgd_base.py ADDED
@@ -0,0 +1,15 @@
+ # Note: This schedule config serves as a base config for other schedules.
+ # Users would have to at least fill in "max_epochs" and "val_interval"
+ # in order to use this config in their experiments.
+
+ # optimizer
+ optim_wrapper = dict(
+     type='OptimWrapper',
+     optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+ train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=None, val_interval=20)
+ val_cfg = dict(type='ValLoop')
+ test_cfg = dict(type='TestLoop')
+ # learning policy
+ param_scheduler = [
+     dict(type='ConstantLR', factor=1.0),
+ ]
configs/textdet/dbnet/README.md ADDED
@@ -0,0 +1,47 @@
+ # DBNet
+
+ > [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947)
+
+ <!-- [ALGORITHM] -->
+
+ ## Abstract
+
+ Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset.
10
+
11
+ <div align=center>
12
+ <img src="https://user-images.githubusercontent.com/22607038/142791306-0da6db2a-20a6-4a68-b228-64ff275f67b3.png"/>
13
+ </div>
14
+
15
+ ## Results and models
16
+
17
+ ### SynthText
18
+
19
+ | Method | Backbone | Training set | #iters | Download |
20
+ | :-----------------------------------------------------------------------: | :------: | :----------: | :-----: | :--------------------------------------------------------------------------------------------------: |
21
+ | [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext.py) | ResNet18 | SynthText | 100,000 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext/dbnet_resnet18_fpnc_100k_synthtext-2e9bf392.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext/20221214_150351.log) |
22
+
23
+ ### ICDAR2015
24
+
25
+ | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
26
+ | :----------------------------: | :------------------------------: | :--------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------: |
27
+ | [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py) | ResNet18 | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.8853 | 0.7583 | 0.8169 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/20220825_221614.log) |
28
+ | [DBNet_r50](/configs/textdet/dbnet/dbnet_resnet50_1200e_icdar2015.py) | ResNet50 | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.8744 | 0.8276 | 0.8504 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50_1200e_icdar2015/dbnet_resnet50_1200e_icdar2015_20221102_115917-54f50589.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50_1200e_icdar2015/20221102_115917.log) |
29
+ | [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | ResNet50-DCN | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/tmp_1.0_pretrain/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-ed322016.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.8784 | 0.8315 | 0.8543 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/20220828_124917.log) |
30
+ | [DBNet_r50-oclip](/configs/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015.py) | [ResNet50-oCLIP](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9052 | 0.8272 | 0.8644 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015/dbnet_resnet50-oclip_1200e_icdar2015_20221102_115917-bde8c87a.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015/20221102_115917.log) |
31
+
32
+ ### Total Text
33
+
34
+ | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
35
+ | :----------------------------------------------------: | :------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------: |
36
+ | [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py) | ResNet18 | - | Totaltext Train | Totaltext Test | 1200 | 736 | 0.8640 | 0.7770 | 0.8182 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext/dbnet_resnet18_fpnc_1200e_totaltext-3ed3233c.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext/20221219_201038.log) |
37
+
38
+ ## Citation
39
+
40
+ ```bibtex
41
+ @article{Liao_Wan_Yao_Chen_Bai_2020,
42
+ title={Real-Time Scene Text Detection with Differentiable Binarization},
43
+ journal={Proceedings of the AAAI Conference on Artificial Intelligence},
44
+ author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
45
+ year={2020},
46
+ pages={11474-11481}}
47
+ ```
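Note on the README above: the abstract describes Differentiable Binarization only in words. As a minimal sketch (not MMOCR's implementation), the DB paper replaces the hard threshold with a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), where P is a probability-map value, T the learned threshold at the same pixel, and k an amplification factor (the paper reports k = 50). The function names below are illustrative assumptions.

```python
import math


def hard_binarization(p: float, t: float) -> float:
    """Standard (non-differentiable) thresholding of one pixel."""
    return 1.0 if p >= t else 0.0


def differentiable_binarization(p: float, t: float, k: float = 50.0) -> float:
    """Steep sigmoid approximating the step function at threshold t.

    k controls the steepness; larger k is closer to a hard threshold.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


if __name__ == "__main__":
    threshold = 0.3
    for prob in (0.0, 0.25, 0.3, 0.35, 0.6):
        soft = differentiable_binarization(prob, threshold)
        hard = hard_binarization(prob, threshold)
        print(f"p={prob:.2f} hard={hard:.0f} soft={soft:.4f}")
```

Because the sigmoid has a usable gradient near the threshold, the threshold map can be trained jointly with the probability map, which is the property the abstract refers to as adaptive thresholds.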
configs/textdet/dbnet/_base_dbnet_resnet18_fpnc.py ADDED
@@ -0,0 +1,64 @@
+ model = dict(
+     type='DBNet',
+     backbone=dict(
+         type='mmdet.ResNet',
+         depth=18,
+         num_stages=4,
+         out_indices=(0, 1, 2, 3),
+         frozen_stages=-1,
+         norm_cfg=dict(type='BN', requires_grad=True),
+         init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18'),
+         norm_eval=False,
+         style='caffe'),
+     neck=dict(
+         type='FPNC', in_channels=[64, 128, 256, 512], lateral_channels=256),
+     det_head=dict(
+         type='DBHead',
+         in_channels=256,
+         module_loss=dict(type='DBModuleLoss'),
+         postprocessor=dict(type='DBPostprocessor', text_repr_type='quad')),
+     data_preprocessor=dict(
+         type='TextDetDataPreprocessor',
+         mean=[123.675, 116.28, 103.53],
+         std=[58.395, 57.12, 57.375],
+         bgr_to_rgb=True,
+         pad_size_divisor=32))
+
+ train_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(
+         type='LoadOCRAnnotations',
+         with_polygon=True,
+         with_bbox=True,
+         with_label=True,
+     ),
+     dict(
+         type='TorchVisionWrapper',
+         op='ColorJitter',
+         brightness=32.0 / 255,
+         saturation=0.5),
+     dict(
+         type='ImgAugWrapper',
+         args=[['Fliplr', 0.5],
+               dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+     dict(type='RandomCrop', min_side_ratio=0.1),
+     dict(type='Resize', scale=(640, 640), keep_ratio=True),
+     dict(type='Pad', size=(640, 640)),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape'))
+ ]
+
+ test_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(type='Resize', scale=(1333, 736), keep_ratio=True),
+     dict(
+         type='LoadOCRAnnotations',
+         with_polygon=True,
+         with_bbox=True,
+         with_label=True,
+     ),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+ ]
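Note on the test pipeline above: `Resize` with `scale=(1333, 736), keep_ratio=True` followed by `pad_size_divisor=32` determines the tensor size the model sees at test time. The sketch below works through that arithmetic under two assumptions: that keep-ratio resizing picks the largest factor that fits both the long and the short edge limit (the way MMCV-style rescaling is usually described), and that the input is a 1280x720 image, chosen only as an example. The helper names are hypothetical and are not library functions.

```python
import math


def rescale_keep_ratio(width: int, height: int, scale: tuple) -> tuple:
    """Return (new_width, new_height) that fits inside `scale` without distortion."""
    max_long_edge, max_short_edge = max(scale), min(scale)
    factor = min(max_long_edge / max(width, height),
                 max_short_edge / min(width, height))
    # Round to the nearest integer pixel.
    return int(width * factor + 0.5), int(height * factor + 0.5)


def pad_to_divisor(width: int, height: int, divisor: int = 32) -> tuple:
    """Round each side up to the next multiple of `divisor`."""
    return (math.ceil(width / divisor) * divisor,
            math.ceil(height / divisor) * divisor)


if __name__ == "__main__":
    resized = rescale_keep_ratio(1280, 720, (1333, 736))
    padded = pad_to_divisor(*resized)
    print("resized:", resized)
    print("padded: ", padded)
```

Padding to a multiple of 32 keeps every stage of a four-stage backbone with total stride 32 on integer feature-map sizes.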
configs/textdet/dbnet/_base_dbnet_resnet50-dcnv2_fpnc.py ADDED
@@ -0,0 +1,66 @@
+ model = dict(
+     type='DBNet',
+     backbone=dict(
+         type='mmdet.ResNet',
+         depth=50,
+         num_stages=4,
+         out_indices=(0, 1, 2, 3),
+         frozen_stages=-1,
+         norm_cfg=dict(type='BN', requires_grad=True),
+         norm_eval=False,
+         style='pytorch',
+         dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
+         init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
+         stage_with_dcn=(False, True, True, True)),
+     neck=dict(
+         type='FPNC', in_channels=[256, 512, 1024, 2048], lateral_channels=256),
+     det_head=dict(
+         type='DBHead',
+         in_channels=256,
+         module_loss=dict(type='DBModuleLoss'),
+         postprocessor=dict(type='DBPostprocessor', text_repr_type='quad')),
+     data_preprocessor=dict(
+         type='TextDetDataPreprocessor',
+         mean=[123.675, 116.28, 103.53],
+         std=[58.395, 57.12, 57.375],
+         bgr_to_rgb=True,
+         pad_size_divisor=32))
+
+ train_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(
+         type='LoadOCRAnnotations',
+         with_bbox=True,
+         with_polygon=True,
+         with_label=True,
+     ),
+     dict(
+         type='TorchVisionWrapper',
+         op='ColorJitter',
+         brightness=32.0 / 255,
+         saturation=0.5),
+     dict(
+         type='ImgAugWrapper',
+         args=[['Fliplr', 0.5],
+               dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+     dict(type='RandomCrop', min_side_ratio=0.1),
+     dict(type='Resize', scale=(640, 640), keep_ratio=True),
+     dict(type='Pad', size=(640, 640)),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape'))
+ ]
+
+ test_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(type='Resize', scale=(4068, 1024), keep_ratio=True),
+     dict(
+         type='LoadOCRAnnotations',
+         with_polygon=True,
+         with_bbox=True,
+         with_label=True,
+     ),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+ ]
configs/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext.py ADDED
@@ -0,0 +1,45 @@
+ _base_ = [
+     '_base_dbnet_resnet18_fpnc.py',
+     '../_base_/datasets/synthtext.py',
+     '../_base_/pretrain_runtime.py',
+     '../_base_/schedules/schedule_sgd_100k.py',
+ ]
+
+ train_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(
+         type='LoadOCRAnnotations',
+         with_polygon=True,
+         with_bbox=True,
+         with_label=True,
+     ),
+     dict(type='FixInvalidPolygon'),
+     dict(
+         type='TorchVisionWrapper',
+         op='ColorJitter',
+         brightness=32.0 / 255,
+         saturation=0.5),
+     dict(
+         type='ImgAugWrapper',
+         args=[['Fliplr', 0.5],
+               dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+     dict(type='RandomCrop', min_side_ratio=0.1),
+     dict(type='Resize', scale=(640, 640), keep_ratio=True),
+     dict(type='Pad', size=(640, 640)),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape'))
+ ]
+
+ # dataset settings
+ synthtext_textdet_train = _base_.synthtext_textdet_train
+ synthtext_textdet_train.pipeline = train_pipeline
+
+ train_dataloader = dict(
+     batch_size=16,
+     num_workers=8,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=synthtext_textdet_train)
+
+ auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py ADDED
@@ -0,0 +1,30 @@
+ _base_ = [
+     '_base_dbnet_resnet18_fpnc.py',
+     '../_base_/datasets/icdar2015.py',
+     '../_base_/default_runtime.py',
+     '../_base_/schedules/schedule_sgd_1200e.py',
+ ]
+
+ # dataset settings
+ icdar2015_textdet_train = _base_.icdar2015_textdet_train
+ icdar2015_textdet_train.pipeline = _base_.train_pipeline
+ icdar2015_textdet_test = _base_.icdar2015_textdet_test
+ icdar2015_textdet_test.pipeline = _base_.test_pipeline
+
+ train_dataloader = dict(
+     batch_size=16,
+     num_workers=8,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=icdar2015_textdet_train)
+
+ val_dataloader = dict(
+     batch_size=1,
+     num_workers=4,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=False),
+     dataset=icdar2015_textdet_test)
+
+ test_dataloader = val_dataloader
+
+ auto_scale_lr = dict(base_batch_size=16)
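Note on `auto_scale_lr = dict(base_batch_size=16)` above: this records the total batch size the learning rate was tuned for. A minimal sketch of the linear scaling rule it implies follows, assuming the rule is applied only when automatic scaling is explicitly enabled at launch and that the effective batch size is the per-GPU batch size times the number of GPUs. The function name `scale_lr` is hypothetical.

```python
def scale_lr(base_lr: float, base_batch_size: int,
             batch_size_per_gpu: int, num_gpus: int) -> float:
    """Linear scaling rule: lr grows in proportion to the effective batch size."""
    effective_batch_size = batch_size_per_gpu * num_gpus
    return base_lr * effective_batch_size / base_batch_size


if __name__ == "__main__":
    # One GPU with batch_size=16 matches base_batch_size, so lr is unchanged.
    print(scale_lr(0.007, 16, 16, 1))
    # Four GPUs with batch_size=16 each give an effective batch size of 64.
    print(scale_lr(0.007, 16, 16, 4))
```

With the dataloader above (`batch_size=16`) on a single GPU the ratio is 1, so the schedule's learning rate is used as written.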
configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py ADDED
@@ -0,0 +1,73 @@
+ _base_ = [
+     '_base_dbnet_resnet18_fpnc.py',
+     '../_base_/datasets/totaltext.py',
+     '../_base_/default_runtime.py',
+     '../_base_/schedules/schedule_sgd_1200e.py',
+ ]
+
+ train_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(
+         type='LoadOCRAnnotations',
+         with_polygon=True,
+         with_bbox=True,
+         with_label=True,
+     ),
+     dict(type='FixInvalidPolygon', min_poly_points=4),
+     dict(
+         type='TorchVisionWrapper',
+         op='ColorJitter',
+         brightness=32.0 / 255,
+         saturation=0.5),
+     dict(
+         type='ImgAugWrapper',
+         args=[['Fliplr', 0.5],
+               dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+     dict(type='RandomCrop', min_side_ratio=0.1),
+     dict(type='Resize', scale=(640, 640), keep_ratio=True),
+     dict(type='Pad', size=(640, 640)),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape'))
+ ]
+
+ test_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(type='Resize', scale=(1333, 736), keep_ratio=True),
+     dict(
+         type='LoadOCRAnnotations',
+         with_polygon=True,
+         with_bbox=True,
+         with_label=True,
+     ),
+     dict(type='FixInvalidPolygon', min_poly_points=4),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+ ]
+
+ # dataset settings
+ totaltext_textdet_train = _base_.totaltext_textdet_train
+ totaltext_textdet_test = _base_.totaltext_textdet_test
+ totaltext_textdet_train.pipeline = train_pipeline
+ totaltext_textdet_test.pipeline = test_pipeline
+
+ train_dataloader = dict(
+     batch_size=16,
+     num_workers=16,
+     pin_memory=True,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=totaltext_textdet_train)
+
+ val_dataloader = dict(
+     batch_size=1,
+     num_workers=1,
+     pin_memory=True,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=False),
+     dataset=totaltext_textdet_test)
+
+ test_dataloader = val_dataloader
+
+ auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_100k_synthtext.py ADDED
@@ -0,0 +1,30 @@
+ _base_ = [
+     '_base_dbnet_resnet50-dcnv2_fpnc.py',
+     '../_base_/default_runtime.py',
+     '../_base_/datasets/synthtext.py',
+     '../_base_/schedules/schedule_sgd_100k.py',
+ ]
+
+ # dataset settings
+ synthtext_textdet_train = _base_.synthtext_textdet_train
+ synthtext_textdet_train.pipeline = _base_.train_pipeline
+ synthtext_textdet_test = _base_.synthtext_textdet_test
+ synthtext_textdet_test.pipeline = _base_.test_pipeline
+
+ train_dataloader = dict(
+     batch_size=16,
+     num_workers=8,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=synthtext_textdet_train)
+
+ val_dataloader = dict(
+     batch_size=1,
+     num_workers=4,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=False),
+     dataset=synthtext_textdet_test)
+
+ test_dataloader = val_dataloader
+
+ auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py ADDED
@@ -0,0 +1,33 @@
+ _base_ = [
+     '_base_dbnet_resnet50-dcnv2_fpnc.py',
+     '../_base_/datasets/icdar2015.py',
+     '../_base_/default_runtime.py',
+     '../_base_/schedules/schedule_sgd_1200e.py',
+ ]
+
+ # TODO: Replace the link
+ load_from = 'https://download.openmmlab.com/mmocr/textdet/dbnet/tmp_1.0_pretrain/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-ed322016.pth' # noqa
+
+ # dataset settings
+ icdar2015_textdet_train = _base_.icdar2015_textdet_train
+ icdar2015_textdet_train.pipeline = _base_.train_pipeline
+ icdar2015_textdet_test = _base_.icdar2015_textdet_test
+ icdar2015_textdet_test.pipeline = _base_.test_pipeline
+
+ train_dataloader = dict(
+     batch_size=16,
+     num_workers=8,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=icdar2015_textdet_train)
+
+ val_dataloader = dict(
+     batch_size=1,
+     num_workers=4,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=False),
+     dataset=icdar2015_textdet_test)
+
+ test_dataloader = val_dataloader
+
+ auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015.py ADDED
@@ -0,0 +1,20 @@
+ _base_ = [
+     'dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py',
+ ]
+
+ load_from = None
+
+ _base_.model.backbone = dict(
+     type='CLIPResNet',
+     init_cfg=dict(
+         type='Pretrained',
+         checkpoint='https://download.openmmlab.com/'
+         'mmocr/backbone/resnet50-oclip-7ba0c533.pth'))
+
+ _base_.train_dataloader.num_workers = 24
+ _base_.optim_wrapper.optimizer.lr = 0.002
+
+ param_scheduler = [
+     dict(type='LinearLR', end=100, start_factor=0.001),
+     dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=100, end=1200),
+ ]
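Note on the `param_scheduler` above: it chains a linear warmup (epochs 0 to 100, starting at 0.001 of the base rate) with polynomial decay (epochs 100 to 1200, power 0.9, floor 1e-7). The sketch below is a simplified closed-form approximation of that curve using the values from the config (`lr=0.002`). It assumes both schedulers step once per epoch and ignores any off-by-one in the library's internal step counting, so treat the numbers as indicative. The function name `lr_at` is hypothetical.

```python
def lr_at(epoch: int,
          base_lr: float = 0.002,
          warmup_end: int = 100,
          total_epochs: int = 1200,
          start_factor: float = 0.001,
          power: float = 0.9,
          eta_min: float = 1e-7) -> float:
    """Approximate learning rate at a given epoch for warmup + poly decay."""
    if epoch < warmup_end:
        # Linear ramp of the multiplier from start_factor up to 1.0.
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_end
        return base_lr * factor
    # Polynomial decay from base_lr down to eta_min over the remaining epochs.
    progress = min(1.0, (epoch - warmup_end) / (total_epochs - warmup_end))
    return (base_lr - eta_min) * (1.0 - progress) ** power + eta_min


if __name__ == "__main__":
    for epoch in (0, 50, 100, 600, 1200):
        print(epoch, lr_at(epoch))
```

The warmup keeps early updates small while the freshly initialised neck and head settle around the pretrained backbone; the decay then lowers the rate smoothly to the floor by the final epoch.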
configs/textdet/dbnet/dbnet_resnet50_1200e_icdar2015.py ADDED
@@ -0,0 +1,24 @@
+ _base_ = [
+     'dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py',
+ ]
+
+ load_from = None
+
+ _base_.model.backbone = dict(
+     type='mmdet.ResNet',
+     depth=50,
+     num_stages=4,
+     out_indices=(0, 1, 2, 3),
+     frozen_stages=-1,
+     norm_cfg=dict(type='BN', requires_grad=True),
+     norm_eval=True,
+     style='pytorch',
+     init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'))
+
+ _base_.train_dataloader.num_workers = 24
+ _base_.optim_wrapper.optimizer.lr = 0.002
+
+ param_scheduler = [
+     dict(type='LinearLR', end=100, start_factor=0.001),
+     dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=100, end=1200),
+ ]
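Note on `_base_.model.backbone = dict(...)` above: the config assigns a whole new backbone dict instead of redeclaring `model`. One plausible reason, assuming the config loader merges a redeclared dict key by key with the inherited one (the way MMEngine-style inheritance is generally documented), is that a merge would keep the inherited `dcn` and `stage_with_dcn` keys, while direct assignment drops them. The sketch below contrasts the two behaviours on plain dictionaries; `merge_dicts` is a hypothetical stand-in, not a library function.

```python
import copy


def merge_dicts(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base` (key-by-key inheritance)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Backbone inherited from the DCN base config (subset of its keys).
base_backbone = dict(
    type='mmdet.ResNet',
    depth=50,
    dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
    stage_with_dcn=(False, True, True, True))

# Backbone wanted by the plain ResNet-50 variant.
new_backbone = dict(type='mmdet.ResNet', depth=50, norm_eval=True)

merged = merge_dicts(base_backbone, new_backbone)   # still carries the DCN keys
replaced = copy.deepcopy(new_backbone)              # DCN keys are gone

if __name__ == "__main__":
    print(sorted(merged))
    print(sorted(replaced))
```

Under that assumption, wholesale assignment is the simpler way to obtain a backbone without deformable convolutions from a base config that enables them.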
configs/textdet/dbnet/metafile.yml ADDED
@@ -0,0 +1,80 @@
+ Collections:
+ - Name: DBNet
+   Metadata:
+     Training Data: ICDAR2015
+     Training Techniques:
+       - SGD with Momentum
+       - Weight Decay
+     Training Resources: 1x NVIDIA A100-SXM4-80GB
+     Architecture:
+       - ResNet
+       - FPNC
+   Paper:
+     URL: https://arxiv.org/pdf/1911.08947.pdf
+     Title: 'Real-time Scene Text Detection with Differentiable Binarization'
+   README: configs/textdet/dbnet/README.md
+
+ Models:
+   - Name: dbnet_resnet18_fpnc_1200e_icdar2015
+     Alias: DB_r18
+     In Collection: DBNet
+     Config: configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py
+     Metadata:
+       Training Data: ICDAR2015
+     Results:
+       - Task: Text Detection
+         Dataset: ICDAR2015
+         Metrics:
+           hmean-iou: 0.8169
+     Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth
+
+   - Name: dbnet_resnet50_fpnc_1200e_icdar2015
+     In Collection: DBNet
+     Config: configs/textdet/dbnet/dbnet_resnet50_1200e_icdar2015.py
+     Metadata:
+       Training Data: ICDAR2015
+     Results:
+       - Task: Text Detection
+         Dataset: ICDAR2015
+         Metrics:
+           hmean-iou: 0.8504
+     Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50_1200e_icdar2015/dbnet_resnet50_1200e_icdar2015_20221102_115917-54f50589.pth
+
+   - Name: dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015
+     In Collection: DBNet
+     Config: configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py
+     Metadata:
+       Training Data: ICDAR2015
+     Results:
+       - Task: Text Detection
+         Dataset: ICDAR2015
+         Metrics:
+           hmean-iou: 0.8543
+     Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth
+
+   - Name: dbnet_resnet50-oclip_fpnc_1200e_icdar2015
+     In Collection: DBNet
+     Alias:
+       - DB_r50
+       - DBNet
+     Config: configs/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015.py
+     Metadata:
+       Training Data: ICDAR2015
+     Results:
+       - Task: Text Detection
+         Dataset: ICDAR2015
+         Metrics:
+           hmean-iou: 0.8644
+     Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-oclip_1200e_icdar2015/dbnet_resnet50-oclip_1200e_icdar2015_20221102_115917-bde8c87a.pth
+
+   - Name: dbnet_resnet18_fpnc_1200e_totaltext
+     In Collection: DBNet
+     Config: configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py
+     Metadata:
+       Training Data: Totaltext
+     Results:
+       - Task: Text Detection
+         Dataset: Totaltext
+         Metrics:
+           hmean-iou: 0.8182
+     Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext/dbnet_resnet18_fpnc_1200e_totaltext-3ed3233c.pth
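Note on the `hmean-iou` metrics above: the value is the harmonic mean of precision and recall, Hmean = 2PR / (P + R). As a quick consistency check, the sketch below recomputes it from the precision and recall columns of the DBNet README tables. Since those inputs are themselves rounded to four decimals, agreement is only expected to about the fourth decimal place. The function name `hmean` is illustrative.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported hmean) taken from the DBNet README tables.
reported = {
    "DBNet_r18 / ICDAR2015": (0.8853, 0.7583, 0.8169),
    "DBNet_r50 / ICDAR2015": (0.8744, 0.8276, 0.8504),
}

if __name__ == "__main__":
    for name, (p, r, expected) in reported.items():
        print(f"{name}: computed={hmean(p, r):.4f} reported={expected:.4f}")
```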
configs/textdet/dbnetpp/README.md ADDED
@@ -0,0 +1,41 @@
+ # DBNetpp
+
+ > [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
+
+ <!-- [ALGORITHM] -->
+
+ ## Abstract
+
+ Recently, segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field, because of their superiority in detecting the text instances of arbitrary shapes and extreme aspect ratios, profiting from the pixel-level descriptions. However, the vast majority of the existing segmentation-based approaches are limited to their complex post-processing algorithms and the scale robustness of their segmentation models, where the post-processing algorithms are not only isolated to the model optimization but also time-consuming and the scale robustness is usually strengthened by fusing multi-scale feature maps directly. In this paper, we propose a Differentiable Binarization (DB) module that integrates the binarization process, one of the most important steps in the post-processing procedure, into a segmentation network. Optimized along with the proposed DB module, the segmentation network can produce more accurate results, which enhances the accuracy of text detection with a simple pipeline. Furthermore, an efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively. By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.
+
+ <div align=center>
+ <img src="https://user-images.githubusercontent.com/45810070/166850828-f1e48c25-4a0f-429d-ae54-6997ed25c062.png"/>
+ </div>
+
+ ## Results and models
+
+ ### SynthText
+
+ | Method | Backbone | Training set | #iters | Download |
+ | :--------------------------------------------------------------------------------: | :------------: | :----------: | :-----: | :-----------------------------------------------------------------------------------: |
+ | [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) | ResNet50-dcnv2 | SynthText | 100,000 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext-00f0a80b.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext/20221215_013531.log) |
+
+ ### ICDAR2015
+
+ | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
+ | :----------------------------: | :------------------------------: | :--------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------: |
+ | [DBNetpp_r50](/configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py) | ResNet50 | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9079 | 0.8209 | 0.8622 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015/dbnetpp_resnet50_fpnc_1200e_icdar2015_20221025_185550-013730aa.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015/20221025_185550.log) |
+ | [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | ResNet50-dcnv2 | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/tmp_1.0_pretrain/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-352fec8a.pth)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9116 | 0.8291 | 0.8684 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/20220829_230108.log) |
+ | [DBNetpp_r50-oclip](/configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py) | [ResNet50-oCLIP](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) | - | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9174 | 0.8609 | 0.8882 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015_20221101_124139-4ecb39ac.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015/20221101_124139.log) |
+
+ ## Citation
+
+ ```bibtex
+ @article{liao2022real,
+     title={Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion},
+     author={Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang},
+     journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+     year={2022},
+     publisher={IEEE}
+ }
+ ```
configs/textdet/dbnetpp/_base_dbnetpp_resnet50-dcnv2_fpnc.py ADDED
@@ -0,0 +1,72 @@
+ model = dict(
+     type='DBNet',
+     backbone=dict(
+         type='mmdet.ResNet',
+         depth=50,
+         num_stages=4,
+         out_indices=(0, 1, 2, 3),
+         frozen_stages=-1,
+         norm_cfg=dict(type='BN', requires_grad=True),
+         norm_eval=False,
+         style='pytorch',
+         dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
+         init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
+         stage_with_dcn=(False, True, True, True)),
+     neck=dict(
+         type='FPNC',
+         in_channels=[256, 512, 1024, 2048],
+         lateral_channels=256,
+         asf_cfg=dict(attention_type='ScaleChannelSpatial')),
+     det_head=dict(
+         type='DBHead',
+         in_channels=256,
+         module_loss=dict(type='DBModuleLoss'),
+         postprocessor=dict(
+             type='DBPostprocessor', text_repr_type='quad',
+             epsilon_ratio=0.002)),
+     data_preprocessor=dict(
+         type='TextDetDataPreprocessor',
+         mean=[123.675, 116.28, 103.53],
+         std=[58.395, 57.12, 57.375],
+         bgr_to_rgb=True,
+         pad_size_divisor=32))
+
+ train_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(
+         type='LoadOCRAnnotations',
+         with_bbox=True,
+         with_polygon=True,
+         with_label=True,
+     ),
+     dict(
+         type='TorchVisionWrapper',
+         op='ColorJitter',
+         brightness=32.0 / 255,
+         saturation=0.5),
+     dict(
+         type='ImgAugWrapper',
+         args=[['Fliplr', 0.5],
+               dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+     dict(type='RandomCrop', min_side_ratio=0.1),
+     dict(type='Resize', scale=(640, 640), keep_ratio=True),
+     dict(type='Pad', size=(640, 640)),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape'))
+ ]
+
+ test_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(type='Resize', scale=(4068, 1024), keep_ratio=True),
+     dict(
+         type='LoadOCRAnnotations',
+         with_polygon=True,
+         with_bbox=True,
+         with_label=True,
+     ),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor',
+                    'instances'))
+ ]
configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py ADDED
@@ -0,0 +1,44 @@
+ _base_ = [
+     '_base_dbnetpp_resnet50-dcnv2_fpnc.py',
+     '../_base_/pretrain_runtime.py',
+     '../_base_/datasets/synthtext.py',
+     '../_base_/schedules/schedule_sgd_100k.py',
+ ]
+
+ train_pipeline = [
+     dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+     dict(
+         type='LoadOCRAnnotations',
+         with_bbox=True,
+         with_polygon=True,
+         with_label=True,
+     ),
+     dict(type='FixInvalidPolygon'),
+     dict(
+         type='TorchVisionWrapper',
+         op='ColorJitter',
+         brightness=32.0 / 255,
+         saturation=0.5),
+     dict(
+         type='ImgAugWrapper',
+         args=[['Fliplr', 0.5],
+               dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+     dict(type='RandomCrop', min_side_ratio=0.1),
+     dict(type='Resize', scale=(640, 640), keep_ratio=True),
+     dict(type='Pad', size=(640, 640)),
+     dict(
+         type='PackTextDetInputs',
+         meta_keys=('img_path', 'ori_shape', 'img_shape'))
+ ]
+
+ synthtext_textdet_train = _base_.synthtext_textdet_train
+ synthtext_textdet_train.pipeline = train_pipeline
+
+ train_dataloader = dict(
+     batch_size=16,
+     num_workers=8,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=synthtext_textdet_train)
+
+ auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py ADDED
@@ -0,0 +1,36 @@
+ _base_ = [
+     '_base_dbnetpp_resnet50-dcnv2_fpnc.py',
+     '../_base_/default_runtime.py',
+     '../_base_/datasets/icdar2015.py',
+     '../_base_/schedules/schedule_sgd_1200e.py',
+ ]
+
+ load_from = 'https://download.openmmlab.com/mmocr/textdet/dbnetpp/tmp_1.0_pretrain/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-352fec8a.pth' # noqa
+
+ # dataset settings
+ train_list = [_base_.icdar2015_textdet_train]
+ test_list = [_base_.icdar2015_textdet_test]
+
+ train_dataloader = dict(
+     batch_size=16,
+     num_workers=8,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=True),
+     dataset=dict(
+         type='ConcatDataset',
+         datasets=train_list,
+         pipeline=_base_.train_pipeline))
+
+ val_dataloader = dict(
+     batch_size=16,
+     num_workers=8,
+     persistent_workers=True,
+     sampler=dict(type='DefaultSampler', shuffle=False),
+     dataset=dict(
+         type='ConcatDataset',
+         datasets=test_list,
+         pipeline=_base_.test_pipeline))
+
+ test_dataloader = val_dataloader
+
+ auto_scale_lr = dict(base_batch_size=16)
configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ _base_ = [
2
+ 'dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py',
3
+ ]
4
+
5
+ load_from = None
6
+
7
+ _base_.model.backbone = dict(
8
+ type='CLIPResNet',
9
+ init_cfg=dict(
10
+ type='Pretrained',
11
+ checkpoint='https://download.openmmlab.com/'
12
+ 'mmocr/backbone/resnet50-oclip-7ba0c533.pth'))
13
+
14
+ _base_.train_dataloader.num_workers = 24
15
+ _base_.optim_wrapper.optimizer.lr = 0.002
16
+
17
+ param_scheduler = [
18
+ dict(type='LinearLR', end=200, start_factor=0.001),
19
+ dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=200, end=1200),
20
+ ]
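Note on the `param_scheduler` above: it chains a linear warmup over the first 200 epochs with a polynomial decay from epoch 200 to 1200. The sketch below is a simplified closed-form approximation of that shape, assuming the base learning rate of 0.002 set in this config. MMEngine applies its schedulers step by step, so exact values at the boundaries may differ slightly; `lr_at_epoch` is a hypothetical helper, not an MMEngine API.

```python
def lr_at_epoch(epoch: int, base_lr: float = 0.002, warmup_end: int = 200,
                max_epochs: int = 1200, start_factor: float = 0.001,
                power: float = 0.9, eta_min: float = 1e-7) -> float:
    """Closed-form approximation of linear warmup then polynomial decay."""
    if epoch < warmup_end:
        # Warmup: the multiplier rises linearly from start_factor to 1.0.
        progress = epoch / warmup_end
        return base_lr * (start_factor + (1.0 - start_factor) * progress)
    # Decay: polynomial interpolation from base_lr down to eta_min.
    progress = min(1.0, (epoch - warmup_end) / (max_epochs - warmup_end))
    return (base_lr - eta_min) * (1.0 - progress) ** power + eta_min


for epoch in (0, 100, 200, 700, 1200):
    print(epoch, lr_at_epoch(epoch))
```

Under these assumptions the rate starts at `base_lr * start_factor`, peaks at `base_lr` when warmup ends, and reaches `eta_min` at the final epoch.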
configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py ADDED
@@ -0,0 +1,24 @@
1
+ _base_ = [
2
+ 'dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py',
3
+ ]
4
+
5
+ load_from = None
6
+
7
+ _base_.model.backbone = dict(
8
+ type='mmdet.ResNet',
9
+ depth=50,
10
+ num_stages=4,
11
+ out_indices=(0, 1, 2, 3),
12
+ frozen_stages=-1,
13
+ norm_cfg=dict(type='BN', requires_grad=True),
14
+ norm_eval=True,
15
+ style='pytorch',
16
+ init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'))
17
+
18
+ _base_.train_dataloader.num_workers = 24
19
+ _base_.optim_wrapper.optimizer.lr = 0.003
20
+
21
+ param_scheduler = [
22
+ dict(type='LinearLR', end=200, start_factor=0.001),
23
+ dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=200, end=1200),
24
+ ]
configs/textdet/dbnetpp/metafile.yml ADDED
@@ -0,0 +1,56 @@
1
+ Collections:
2
+ - Name: DBNetpp
3
+ Metadata:
4
+ Training Data: ICDAR2015
5
+ Training Techniques:
6
+ - SGD with Momentum
7
+ - Weight Decay
8
+ Training Resources: 1x NVIDIA A100-SXM4-80GB
9
+ Architecture:
10
+ - ResNet
11
+ - FPNC
12
+ Paper:
13
+ URL: https://arxiv.org/abs/2202.10304
14
+ Title: 'Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion'
15
+ README: configs/textdet/dbnetpp/README.md
16
+
17
+ Models:
18
+ - Name: dbnetpp_resnet50_fpnc_1200e_icdar2015
19
+ In Collection: DBNetpp
20
+ Alias:
21
+ - DBPP_r50
22
+ Config: configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py
23
+ Metadata:
24
+ Training Data: ICDAR2015
25
+ Results:
26
+ - Task: Text Detection
27
+ Dataset: ICDAR2015
28
+ Metrics:
29
+ hmean-iou: 0.8622
30
+ Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015/dbnetpp_resnet50_fpnc_1200e_icdar2015_20221025_185550-013730aa.pth
31
+
32
+ - Name: dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015
33
+ In Collection: DBNetpp
34
+ Config: configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py
35
+ Metadata:
36
+ Training Data: ICDAR2015
37
+ Results:
38
+ - Task: Text Detection
39
+ Dataset: ICDAR2015
40
+ Metrics:
41
+ hmean-iou: 0.8684
42
+ Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth
43
+
44
+ - Name: dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015
45
+ Alias:
46
+ - DBNetpp
47
+ In Collection: DBNetpp
48
+ Config: configs/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015.py
49
+ Metadata:
50
+ Training Data: ICDAR2015
51
+ Results:
52
+ - Task: Text Detection
53
+ Dataset: ICDAR2015
54
+ Metrics:
55
+ hmean-iou: 0.8882
56
+ Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015/dbnetpp_resnet50-oclip_fpnc_1200e_icdar2015_20221101_124139-4ecb39ac.pth
configs/textdet/drrg/README.md ADDED
@@ -0,0 +1,34 @@
1
+ # DRRG
2
+
3
+ > [Deep relational reasoning graph network for arbitrary shape text detection](https://arxiv.org/abs/2003.07493)
4
+
5
+ <!-- [ALGORITHM] -->
6
+
7
+ ## Abstract
8
+
9
+ Arbitrary shape text detection is a challenging task due to the high variety and complexity of scenes texts. In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection. In our method, an innovative local graph bridges a text proposal model via Convolutional Neural Network (CNN) and a deep relational reasoning network via Graph Convolutional Network (GCN), making our network end-to-end trainable. To be concrete, every text instance will be divided into a series of small rectangular components, and the geometry attributes (e.g., height, width, and orientation) of the small components will be estimated by our text proposal model. Given the geometry attributes, the local graph construction model can roughly establish linkages between different text components. For further reasoning and deducing the likelihood of linkages between the component and its neighbors, we adopt a graph-based network to perform deep relational reasoning on local graphs. Experiments on public available datasets demonstrate the state-of-the-art performance of our method.
10
+
11
+ <div align=center>
12
+ <img src="https://user-images.githubusercontent.com/22607038/142791777-f282300a-fb83-4b5a-a7d4-29f308949f11.png"/>
13
+ </div>
14
+
15
+ ## Results and models
16
+
17
+ ### CTW1500
18
+
19
+ | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
20
+ | :-------------------------------------: | :---------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :----------------------------------------: |
21
+ | [DRRG](/configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.8775 | 0.8179 | 0.8467 | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/20220827_105233.log) |
22
+ | [DRRG_r50-oclip](/configs/textdet/drrg/drrg_resnet50-oclip_fpn-unet_1200e_ctw1500.py) | [ResNet50-oCLIP](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) | - | CTW1500 Train | CTW1500 Test | 1200 | | | | | [model](<>) \| [log](<>) |
23
+
24
+ ## Citation
25
+
26
+ ```bibtex
27
+ @inproceedings{zhang2020drrg,
28
+ title={Deep relational reasoning graph network for arbitrary shape text detection},
29
+ author={Zhang, Shi-Xue and Zhu, Xiaobin and Hou, Jie-Bo and Liu, Chang and Yang, Chun and Wang, Hongfa and Yin, Xu-Cheng},
30
+ booktitle={CVPR},
31
+ pages={9699-9708},
32
+ year={2020}
33
+ }
34
+ ```
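Note on the results table in this README: the Hmean column is consistent with the harmonic mean of the Precision and Recall columns. A minimal sketch of that check, assuming the usual F1-style definition; `hmean` is a hypothetical helper, not MMOCR's evaluator, and reported values may differ in the last digit because the table rounds precision and recall.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean (F1) of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the DRRG row in the CTW1500 table.
print(round(hmean(0.8775, 0.8179), 4))
```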
configs/textdet/drrg/_base_drrg_resnet50_fpn-unet.py ADDED
@@ -0,0 +1,92 @@
1
+ model = dict(
2
+ type='DRRG',
3
+ backbone=dict(
4
+ type='mmdet.ResNet',
5
+ depth=50,
6
+ num_stages=4,
7
+ out_indices=(0, 1, 2, 3),
8
+ frozen_stages=-1,
9
+ norm_cfg=dict(type='BN', requires_grad=True),
10
+ init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
11
+ norm_eval=True,
12
+ style='caffe'),
13
+ neck=dict(
14
+ type='FPN_UNet', in_channels=[256, 512, 1024, 2048], out_channels=32),
15
+ det_head=dict(
16
+ type='DRRGHead',
17
+ in_channels=32,
18
+ text_region_thr=0.3,
19
+ center_region_thr=0.4,
20
+ module_loss=dict(type='DRRGModuleLoss'),
21
+ postprocessor=dict(type='DRRGPostprocessor', link_thr=0.80)),
22
+ data_preprocessor=dict(
23
+ type='TextDetDataPreprocessor',
24
+ mean=[123.675, 116.28, 103.53],
25
+ std=[58.395, 57.12, 57.375],
26
+ bgr_to_rgb=True,
27
+ pad_size_divisor=32))
28
+
29
+ train_pipeline = [
30
+ dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
31
+ dict(
32
+ type='LoadOCRAnnotations',
33
+ with_bbox=True,
34
+ with_polygon=True,
35
+ with_label=True),
36
+ dict(
37
+ type='TorchVisionWrapper',
38
+ op='ColorJitter',
39
+ brightness=32.0 / 255,
40
+ saturation=0.5),
41
+ dict(
42
+ type='RandomResize',
43
+ scale=(800, 800),
44
+ ratio_range=(0.75, 2.5),
45
+ keep_ratio=True),
46
+ dict(
47
+ type='TextDetRandomCropFlip',
48
+ crop_ratio=0.5,
49
+ iter_num=1,
50
+ min_area_ratio=0.2),
51
+ dict(
52
+ type='RandomApply',
53
+ transforms=[dict(type='RandomCrop', min_side_ratio=0.3)],
54
+ prob=0.8),
55
+ dict(
56
+ type='RandomApply',
57
+ transforms=[
58
+ dict(
59
+ type='RandomRotate',
60
+ max_angle=60,
61
+ use_canvas=True,
62
+ pad_with_fixed_color=False)
63
+ ],
64
+ prob=0.5),
65
+ dict(
66
+ type='RandomChoice',
67
+ transforms=[[
68
+ dict(type='Resize', scale=800, keep_ratio=True),
69
+ dict(type='SourceImagePad', target_scale=800)
70
+ ],
71
+ dict(type='Resize', scale=800, keep_ratio=False)],
72
+ prob=[0.4, 0.6]),
73
+ dict(type='RandomFlip', prob=0.5, direction='horizontal'),
74
+ dict(
75
+ type='PackTextDetInputs',
76
+ meta_keys=('img_path', 'ori_shape', 'img_shape'))
77
+ ]
78
+
79
+ test_pipeline = [
80
+ dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
81
+ dict(type='Resize', scale=(1024, 640), keep_ratio=True),
82
+ # add loading annotation after ``Resize`` because ground truth
83
+ # does not need to do resize data transform
84
+ dict(
85
+ type='LoadOCRAnnotations',
86
+ with_polygon=True,
87
+ with_bbox=True,
88
+ with_label=True),
89
+ dict(
90
+ type='PackTextDetInputs',
91
+ meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
92
+ ]
configs/textdet/drrg/drrg_resnet50-oclip_fpn-unet_1200e_ctw1500.py ADDED
@@ -0,0 +1,17 @@
1
+ _base_ = [
2
+ 'drrg_resnet50_fpn-unet_1200e_ctw1500.py',
3
+ ]
4
+
5
+ load_from = None
6
+
7
+ _base_.model.backbone = dict(
8
+ type='CLIPResNet',
9
+ init_cfg=dict(
10
+ type='Pretrained',
11
+ checkpoint='https://download.openmmlab.com/'
12
+ 'mmocr/backbone/resnet50-oclip-7ba0c533.pth'))
13
+
14
+ param_scheduler = [
15
+ dict(type='LinearLR', end=100, start_factor=0.001),
16
+ dict(type='PolyLR', power=0.9, eta_min=1e-7, begin=100, end=1200),
17
+ ]
configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py ADDED
@@ -0,0 +1,30 @@
1
+ _base_ = [
2
+ '_base_drrg_resnet50_fpn-unet.py',
3
+ '../_base_/datasets/ctw1500.py',
4
+ '../_base_/default_runtime.py',
5
+ '../_base_/schedules/schedule_sgd_1200e.py',
6
+ ]
7
+
8
+ # dataset settings
9
+ ctw1500_textdet_train = _base_.ctw1500_textdet_train
10
+ ctw1500_textdet_train.pipeline = _base_.train_pipeline
11
+ ctw1500_textdet_test = _base_.ctw1500_textdet_test
12
+ ctw1500_textdet_test.pipeline = _base_.test_pipeline
13
+
14
+ train_dataloader = dict(
15
+ batch_size=4,
16
+ num_workers=4,
17
+ persistent_workers=True,
18
+ sampler=dict(type='DefaultSampler', shuffle=True),
19
+ dataset=ctw1500_textdet_train)
20
+
21
+ val_dataloader = dict(
22
+ batch_size=1,
23
+ num_workers=1,
24
+ persistent_workers=True,
25
+ sampler=dict(type='DefaultSampler', shuffle=False),
26
+ dataset=ctw1500_textdet_test)
27
+
28
+ test_dataloader = val_dataloader
29
+
30
+ auto_scale_lr = dict(base_batch_size=16)
configs/textdet/drrg/metafile.yml ADDED
@@ -0,0 +1,28 @@
1
+ Collections:
2
+ - Name: DRRG
3
+ Metadata:
4
+ Training Data: SCUT-CTW1500
5
+ Training Techniques:
6
+ - SGD with Momentum
7
+ Training Resources: 4x NVIDIA A100-SXM4-80GB
8
+ Architecture:
9
+ - ResNet
10
+ - FPN_UNet
11
+ Paper:
12
+ URL: https://arxiv.org/abs/2003.07493
13
+ Title: 'Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection'
14
+ README: configs/textdet/drrg/README.md
15
+
16
+ Models:
17
+ - Name: drrg_resnet50_fpn-unet_1200e_ctw1500
18
+ Alias: DRRG
19
+ In Collection: DRRG
20
+ Config: configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py
21
+ Metadata:
22
+ Training Data: CTW1500
23
+ Results:
24
+ - Task: Text Detection
25
+ Dataset: CTW1500
26
+ Metrics:
27
+ hmean-iou: 0.8467
28
+ Weights: https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth
configs/textdet/fcenet/README.md ADDED
@@ -0,0 +1,46 @@
1
+ # FCENet
2
+
3
+ > [Fourier Contour Embedding for Arbitrary-Shaped Text Detection](https://arxiv.org/abs/2104.10442)
4
+
5
+ <!-- [ALGORITHM] -->
6
+
7
+ ## Abstract
8
+
9
+ One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances. Most of existing methods model text instances in image spatial domain via masks or contour point sequences in the Cartesian or the polar coordinate system. However, the mask representation might lead to expensive post-processing, while the point sequence one may have limited capability to model texts with highly-curved shapes. To tackle these problems, we model text instances in the Fourier domain and propose one novel Fourier Contour Embedding (FCE) method to represent arbitrary shaped text contours as compact signatures. We further construct FCENet with a backbone, feature pyramid networks (FPN) and a simple post-processing with the Inverse Fourier Transformation (IFT) and Non-Maximum Suppression (NMS). Different from previous methods, FCENet first predicts compact Fourier signatures of text instances, and then reconstructs text contours via IFT and NMS during test. Extensive experiments demonstrate that FCE is accurate and robust to fit contours of scene texts even with highly-curved shapes, and also validate the effectiveness and the good generalization of FCENet for arbitrary-shaped text detection. Furthermore, experimental results show that our FCENet is superior to the state-of-the-art (SOTA) methods on CTW1500 and Total-Text, especially on challenging highly-curved text subset.
10
+
11
+ <div align=center>
12
+ <img src="https://user-images.githubusercontent.com/22607038/142791859-1b0ebde4-b151-4c25-ba1b-f354bd8ddc8c.png"/>
13
+ </div>
14
+
15
+ ## Results and models
16
+
17
+ ### CTW1500
18
+
19
+ | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
20
+ | :------------------------------------: | :---------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :-------: | :----: | :----: | :---------------------------------------: |
21
+ | [FCENet_r50dcn](/configs/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | - | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8689 | 0.8296 | 0.8488 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500_20220825_221510-4d705392.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/20220825_221510.log) |
22
+ | [FCENet_r50-oclip](/configs/textdet/fcenet/fcenet_resnet50-oclip-dcnv2_fpn_1500e_ctw1500.py) | [ResNet50-oCLIP](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) | - | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8383 | 0.801 | 0.8192 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_ctw1500/fcenet_resnet50-oclip_fpn_1500e_ctw1500_20221102_121909-101df7e6.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_ctw1500/20221102_121909.log) |
23
+
24
+ ### ICDAR2015
25
+
26
+ | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
27
+ | :---------------------------------------------------: | :------------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :-------: | :----: | :----: | :------------------------------------------------------: |
28
+ | [FCENet_r50](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.8243 | 0.8834 | 0.8528 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/20220826_140941.log) |
29
+ | [FCENet_r50-oclip](/configs/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_icdar2015.py) | ResNet50-oCLIP | - | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.9176 | 0.8098 | 0.8604 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_icdar2015/fcenet_resnet50-oclip_fpn_1500e_icdar2015_20221101_150145-5a6fc412.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-oclip_fpn_1500e_icdar2015/20221101_150145.log) |
30
+
31
+ ### Total Text
32
+
33
+ | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
34
+ | :---------------------------------------------------: | :------: | :--------------: | :-------------: | :------------: | :-----: | :---------: | :-------: | :----: | :----: | :-----------------------------------------------------: |
35
+ | [FCENet_r50](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_totaltext.py) | ResNet50 | - | Totaltext Train | Totaltext Test | 1500 | (1280, 960) | 0.8485 | 0.7810 | 0.8134 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_totaltext/fcenet_resnet50_fpn_1500e_totaltext-91bd37af.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_totaltext/20221219_201107.log) |
36
+
37
+ ## Citation
38
+
39
+ ```bibtex
40
+ @InProceedings{zhu2021fourier,
41
+ title={Fourier Contour Embedding for Arbitrary-Shaped Text Detection},
42
+ author={Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhanghui Kuang and Lianwen Jin and Wayne Zhang},
43
+ year={2021},
44
+ booktitle = {CVPR}
45
+ }
46
+ ```
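Note on the FCENet abstract: the core idea is to describe a closed text contour by a few low-frequency Fourier coefficients and to recover the contour with the inverse transform. The sketch below illustrates only that representation in plain Python. It is not the FCENet implementation (which predicts the coefficients with a network and applies NMS afterwards), and `fourier_signature` and `reconstruct` are hypothetical names.

```python
import cmath
import math


def fourier_signature(points, k):
    """Return Fourier coefficients c_{-k}..c_{k} of a closed contour.

    points: list of (x, y) samples taken in order around the contour.
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    coeffs = {}
    for freq in range(-k, k + 1):
        total = sum(z[i] * cmath.exp(-2j * math.pi * freq * i / n)
                    for i in range(n))
        coeffs[freq] = total / n
    return coeffs


def reconstruct(coeffs, n):
    """Inverse transform: sample n contour points from the signature."""
    points = []
    for i in range(n):
        value = sum(c * cmath.exp(2j * math.pi * freq * i / n)
                    for freq, c in coeffs.items())
        points.append((value.real, value.imag))
    return points


# A circle of radius 2 centred at (5, 3) needs only frequencies 0 and 1.
circle = [(5 + 2 * math.cos(2 * math.pi * i / 32),
           3 + 2 * math.sin(2 * math.pi * i / 32)) for i in range(32)]
signature = fourier_signature(circle, k=1)
approx = reconstruct(signature, n=32)
max_error = max(math.hypot(ax - x, ay - y)
                for (ax, ay), (x, y) in zip(approx, circle))
print(len(signature), max_error)
```

For the circle, the frequency-0 coefficient is the centre and the frequency-1 coefficient is the radius, so three coefficients reproduce the contour up to floating-point error. More irregular shapes need a larger `k`.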