captchaboy committed on
Commit
9c9357e
•
1 Parent(s): 166850f

Create new file

Files changed (1)
  1. README.md +11 -136
README.md CHANGED
@@ -1,136 +1,11 @@
1
- # IterVM: Iterative Vision Modeling Module for Scene Text Recognition
2
-
3
- The official code of [IterNet](https://arxiv.org/abs/2204.02630).
4
-
5
- We propose IterVM, an iterative approach for visual feature extraction which can significantly improve scene text recognition accuracy.
6
- IterVM repeatedly uses the high-level visual feature extracted at the previous iteration to enhance the multi-level features extracted at the subsequent iteration.
7
-
8
-
9
- ![framework](./figures/framework.png)
10
-
11
-
12
- ## Runtime Environment
13
- ```
14
- pip install -r requirements.txt
15
- ```
16
- Note: `fastai==1.0.60` is required.
17
-
18
- ## Datasets
19
- <details>
20
- <summary>Training datasets (Click to expand) </summary>
21
- 1. [MJSynth](http://www.robots.ox.ac.uk/~vgg/data/text/) (MJ):
22
- - Use `tools/create_lmdb_dataset.py` to convert images into LMDB dataset
23
- - [LMDB dataset BaiduNetdisk(passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
24
- 2. [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) (ST):
25
- - Use `tools/crop_by_word_bb.py` to crop images from original [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) dataset, and convert images into LMDB dataset by `tools/create_lmdb_dataset.py`
26
- - [LMDB dataset BaiduNetdisk(passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
27
- 3. [WikiText103](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip), which is only used for pre-trainig language models:
28
- - Use `notebooks/prepare_wikitext103.ipynb` to convert text into CSV format.
29
- - [CSV dataset BaiduNetdisk(passwd:dk01)](https://pan.baidu.com/s/1yabtnPYDKqhBb_Ie9PGFXA)
30
- </details>
31
-
32
- <details>
33
- <summary>Evaluation datasets (Click to expand) </summary>
34
- - Evaluation datasets, LMDB datasets can be downloaded from [BaiduNetdisk(passwd:1dbv)](https://pan.baidu.com/s/1RUg3Akwp7n8kZYJ55rU5LQ), [GoogleDrive](https://drive.google.com/file/d/1dTI0ipu14Q1uuK4s4z32DqbqF3dJPdkk/view?usp=sharing).
35
- 1. ICDAR 2013 (IC13)
36
- 2. ICDAR 2015 (IC15)
37
- 3. IIIT5K Words (IIIT)
38
- 4. Street View Text (SVT)
39
- 5. Street View Text-Perspective (SVTP)
40
- 6. CUTE80 (CUTE)
41
- </details>
42
-
43
- <details>
44
- <summary>The structure of `data` directory (Click to expand) </summary>
45
- - The structure of `data` directory is
46
- ```
47
- data
48
- ├── charset_36.txt
49
- ├── evaluation
50
- │   ├── CUTE80
51
- │   ├── IC13_857
52
- │   ├── IC15_1811
53
- │   ├── IIIT5k_3000
54
- │   ├── SVT
55
- │   └── SVTP
56
- ├── training
57
- │   ├── MJ
58
- │   │   ├── MJ_test
59
- │   │   ├── MJ_train
60
- │   │   └── MJ_valid
61
- │   └── ST
62
- ├── WikiText-103.csv
63
- └── WikiText-103_eval_d1.csv
64
- ```
65
- </details>
66
-
67
- ## Pretrained Models
68
-
69
- Get the pretrained models from [GoogleDrive](https://drive.google.com/drive/folders/1C8NMI8Od8mQUMlsnkHNLkYj73kbAQ7Bl?usp=sharing). Performances of the pretrained models are summaried as follows:
70
-
71
- |Model|IC13|SVT|IIIT|IC15|SVTP|CUTE|AVG|
72
- |-|-|-|-|-|-|-|-|
73
- |IterNet|97.9|95.1|96.9|87.7|90.9|91.3|93.8|
74
-
75
- ## Training
76
-
77
- 1. Pre-train vision model
78
- ```
79
- CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --config=configs/pretrain_vm.yaml
80
- ```
81
- 2. Pre-train language model
82
- ```
83
- CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
84
- ```
85
- 3. Train IterNet
86
- ```
87
- CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --config=configs/train_iternet.yaml
88
- ```
89
- Note:
90
- - You can set the `checkpoint` path for vision model (vm) and language model separately for specific pretrained model, or set to `None` to train from scratch
91
-
92
-
93
- ## Evaluation
94
-
95
- ```
96
- CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_iternet.yaml --phase test --image_only
97
- ```
98
- Additional flags:
99
- - `--checkpoint /path/to/checkpoint` set the path of evaluation model
100
- - `--test_root /path/to/dataset` set the path of evaluation dataset
101
- - `--model_eval [alignment|vision]` which sub-model to evaluate
102
- - `--image_only` disable dumping visualization of attention masks
103
-
104
- ## Run Demo
105
- [<a href="https://colab.research.google.com/drive/1XmZGJzFF95uafmARtJMudPLLKBO2eXLv?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google colab logo"></a>](https://colab.research.google.com/drive/1XmZGJzFF95uafmARtJMudPLLKBO2eXLv?usp=sharing)
106
-
107
- ```
108
- python demo.py --config=configs/train_iternet.yaml --input=figures/demo
109
- ```
110
- Additional flags:
111
- - `--config /path/to/config` set the path of configuration file
112
- - `--input /path/to/image-directory` set the path of image directory or wildcard path, e.g, `--input='figs/test/*.png'`
113
- - `--checkpoint /path/to/checkpoint` set the path of trained model
114
- - `--cuda [-1|0|1|2|3...]` set the cuda id, by default -1 is set and stands for cpu
115
- - `--model_eval [alignment|vision]` which sub-model to use
116
- - `--image_only` disable dumping visualization of attention masks
117
-
118
-
119
- ## Citation
120
- If you find our method useful for your reserach, please cite
121
- ```bash
122
- @article{chu2022itervm,
123
- title={IterVM: Iterative Vision Modeling Module for Scene Text Recognition},
124
- author={Chu, Xiaojie and Wang, Yongtao},
125
- journal={arXiv preprint arXiv:2204.02630},
126
- year={2022}
127
- }
128
- ```
129
-
130
- ## License
131
- The project is only free for academic research purposes, but needs authorization for commerce. For commerce permission, please contact wyt@pku.edu.cn.
132
-
133
- ## Acknowledgements
134
- This project is based on [ABINet](https://github.com/FangShancheng/ABINet.git).
135
- Thanks for their great works.
136
-
 
1
+ ---
2
+ title: Pixelplanet OCR
3
+ emoji: πŸƒ
4
+ colorFrom: indigo
5
+ colorTo: red
6
+ sdk: gradio
7
+ sdk_version: 2.8.12
8
+ app_file: app.py
9
+ pinned: false
10
+ license: bsd
11
+ ---