tomofi committed
Commit f43850b • 1 Parent(s): cb433d6

Update README.md

Files changed (1)
  1. README.md +13 -138
README.md CHANGED
@@ -1,138 +1,13 @@
- # Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
-
- The official code of [ABINet](https://arxiv.org/pdf/2103.06495.pdf) (CVPR 2021, Oral).
-
- ABINet uses a vision model and an explicit language model, trained end to end, to recognize text in the wild. The language model (BCN) achieves a bidirectional language representation by simulating a cloze test, and additionally applies an iterative correction strategy.
-
- ![framework](./figs/framework.png)
-
- ## Runtime Environment
-
- We provide a pre-built docker image using the Dockerfile from `docker/Dockerfile`
-
- Running in Docker
- ```
- $ git clone git@github.com:FangShancheng/ABINet.git
- $ docker run --gpus all --rm -ti --ipc=host -v $(pwd)/ABINet:/app fangshancheng/fastai:torch1.1 /bin/bash
- ```
- (Untested) Or install the dependencies directly
- ```
- pip install -r requirements.txt
- ```
-
- ## Datasets
-
- Training datasets
-
- 1. [MJSynth](http://www.robots.ox.ac.uk/~vgg/data/text/) (MJ):
- - Use `tools/create_lmdb_dataset.py` to convert images into an LMDB dataset
- - [LMDB dataset BaiduNetdisk(passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
- 2. [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) (ST):
- - Use `tools/crop_by_word_bb.py` to crop images from the original [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) dataset, then convert the crops into an LMDB dataset with `tools/create_lmdb_dataset.py`
- - [LMDB dataset BaiduNetdisk(passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
- 3. [WikiText103](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip), which is only used for pre-training language models:
- - Use `notebooks/prepare_wikitext103.ipynb` to convert the text into CSV format.
- - [CSV dataset BaiduNetdisk(passwd:dk01)](https://pan.baidu.com/s/1yabtnPYDKqhBb_Ie9PGFXA)
-
- Evaluation datasets: the LMDB datasets can be downloaded from [BaiduNetdisk(passwd:1dbv)](https://pan.baidu.com/s/1RUg3Akwp7n8kZYJ55rU5LQ), [GoogleDrive](https://drive.google.com/file/d/1dTI0ipu14Q1uuK4s4z32DqbqF3dJPdkk/view?usp=sharing).
- 1. ICDAR 2013 (IC13)
- 2. ICDAR 2015 (IC15)
- 3. IIIT5K Words (IIIT)
- 4. Street View Text (SVT)
- 5. Street View Text-Perspective (SVTP)
- 6. CUTE80 (CUTE)
-
-
- The structure of the `data` directory is
- ```
- data
- ├── charset_36.txt
- ├── evaluation
- │   ├── CUTE80
- │   ├── IC13_857
- │   ├── IC15_1811
- │   ├── IIIT5k_3000
- │   ├── SVT
- │   └── SVTP
- ├── training
- │   ├── MJ
- │   │   ├── MJ_test
- │   │   ├── MJ_train
- │   │   └── MJ_valid
- │   └── ST
- ├── WikiText-103.csv
- └── WikiText-103_eval_d1.csv
- ```
-
- ### Pretrained Models
-
- Get the pretrained models from [BaiduNetdisk(passwd:kwck)](https://pan.baidu.com/s/1b3vyvPwvh_75FkPlp87czQ), [GoogleDrive](https://drive.google.com/file/d/1mYM_26qHUom_5NU7iutHneB_KHlLjL5y/view?usp=sharing). The performance of the pretrained models is summarized as follows:
-
- |Model|IC13|SVT|IIIT|IC15|SVTP|CUTE|AVG|
- |-|-|-|-|-|-|-|-|
- |ABINet-SV|97.1|92.7|95.2|84.0|86.7|88.5|91.4|
- |ABINet-LV|97.0|93.4|96.4|85.9|89.5|89.2|92.7|
-
- ## Training
-
- 1. Pre-train the vision model
- ```
- CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_vision_model.yaml
- ```
- 2. Pre-train the language model
- ```
- CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
- ```
- 3. Train ABINet
- ```
- CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/train_abinet.yaml
- ```
- Note:
- - You can set the `checkpoint` path separately for the vision and language models to start from a specific pretrained model, or set it to `None` to train from scratch
-
-
- ## Evaluation
-
- ```
- CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_abinet.yaml --phase test --image_only
- ```
- Additional flags:
- - `--checkpoint /path/to/checkpoint` set the path of the evaluation model
- - `--test_root /path/to/dataset` set the path of the evaluation dataset
- - `--model_eval [alignment|vision]` which sub-model to evaluate
- - `--image_only` disable dumping visualizations of attention masks
-
- ## Run Demo
-
- ```
- python demo.py --config=configs/train_abinet.yaml --input=figs/test
- ```
- Additional flags:
- - `--config /path/to/config` set the path of the configuration file
- - `--input /path/to/image-directory` set the path of an image directory or a wildcard path, e.g., `--input='figs/test/*.png'`
- - `--checkpoint /path/to/checkpoint` set the path of the trained model
- - `--cuda [-1|0|1|2|3...]` set the CUDA device id; the default -1 stands for CPU
- - `--model_eval [alignment|vision]` which sub-model to use
- - `--image_only` disable dumping visualizations of attention masks
-
- ## Visualization
- Success and failure cases on low-quality images:
-
- ![cases](./figs/cases.png)
-
- ## Citation
- If you find our method useful for your research, please cite
- ```bibtex
- @inproceedings{fang2021read,
- title={Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition},
- author={Fang, Shancheng and Xie, Hongtao and Wang, Yuxin and Mao, Zhendong and Zhang, Yongdong},
- booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
- year={2021}
- }
- ```
-
- ## License
-
- This project is free only for academic research purposes, licensed under the 2-clause BSD License. See the LICENSE file for details.
-
- Feel free to contact fangsc@ustc.edu.cn if you have any questions.
+ ---
+ title: ABINet OCR
+ emoji: 🏃
+ colorFrom: indigo
+ colorTo: red
+ sdk: gradio
+ sdk_version: 2.8.12
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
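The block between the `---` markers added above is the YAML front matter the Hub reads to configure the Space (SDK, entry-point file, card metadata). As an illustrative aside, not part of this repo, a minimal sketch of extracting such flat `key: value` fields with only the standard library (real YAML should go through a proper parser; `parse_front_matter` is a hypothetical helper name):

```python
def parse_front_matter(text):
    """Extract flat key: value fields from a ``---``-delimited header.

    Handles only the simple one-line fields a Space header uses;
    nested YAML would need a real parser such as PyYAML.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front-matter block present
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter ends the header
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

header = """---
title: ABINet OCR
sdk: gradio
sdk_version: 2.8.12
app_file: app.py
---
"""
config = parse_front_matter(header)
print(config["sdk"], config["app_file"])  # gradio app.py
```

The Hub ignores everything after the closing `---`, which is why the free-form note above can follow the header in the same README.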