---
library_name: PyLaia
license: mit
tags:
- PyLaia
- PyTorch
- Handwritten text recognition
metrics:
- CER
- WER
language:
- 'no'
---

# Hugin-Munin handwritten text recognition

This model performs handwritten text recognition (HTR) in Norwegian. It was developed during the [HUGIN-MUNIN project](https://hugin-munin-project.github.io/).

## Model description

The model was trained with the PyLaia library on document images from the [NorHand](https://zenodo.org/record/6542056) dataset.
Line bounding boxes were refined in a post-processing step.

Training images were resized to a fixed height of 128 pixels, preserving the original aspect ratio.

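Resizing to a fixed height while keeping the aspect ratio means the width is scaled by the same factor as the height. The sketch below shows that arithmetic; the function name `scaled_width` and the round-to-nearest-pixel rule are assumptions for illustration, not PyLaia's actual preprocessing code.

```python
def scaled_width(width: int, height: int, target_height: int = 128) -> int:
    """Width that preserves the aspect ratio when the height is scaled to target_height.

    Hypothetical helper: the rounding rule is an assumption, not PyLaia's code.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The same scale factor is applied to both axes, so width / height is unchanged.
    scale = target_height / height
    # Round to the nearest pixel, but never return a zero-width image.
    return max(1, round(width * scale))


# A 2000 x 100 pixel line image becomes 2560 x 128.
print(scaled_width(2000, 100))
```

The resulting `(width, 128)` pair could then be passed to an image library's resize call, for example Pillow's `Image.resize`.
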
## Evaluation results

Without a language model, the model achieves the following character error rate (CER) and word error rate (WER):

| Set   | CER (%) | WER (%) |
| ----- | ------- | ------- |
| train | 2.33    | 5.62    |
| val   | 8.20    | 24.75   |
| test  | 7.81    | 23.30   |

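Both metrics are edit-distance rates: the minimum number of insertions, deletions and substitutions needed to turn the reference into the prediction, divided by the reference length (in characters for CER, in words for WER). The following is a minimal sketch of that definition; it assumes whitespace tokenization and no text normalization, and the function names are illustrative rather than part of PyLaia. The tokenization and normalization behind the tables in this card are not documented here.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions (each costs 1)."""
    # prev[j] is the distance between the first i-1 reference items and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute r -> h (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent for one line."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate in percent for one line (whitespace tokenization)."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


def corpus_cer(pairs) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return 100.0 * edits / chars


print(cer("god morgen", "god morgon"))  # 10.0
print(wer("god morgen", "god morgon"))  # 50.0
```

Set-level scores are commonly computed as in `corpus_cer`, summing edits and reference lengths over all lines, rather than by averaging per-line rates; which convention produced the figures above is not stated.
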
Combining PyLaia with a 6-gram language model improves results on the validation and test sets, while the training-set error rates increase slightly.
The language model is trained on [a text corpus](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-73/) published by the National Library of Norway.

| Set   | CER (%) | WER (%) |
| ----- | ------- | ------- |
| train | 2.62    | 6.13    |
| val   | 7.01    | 19.75   |
| test  | 6.75    | 18.22   |

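An n-gram language model scores each token given the n - 1 tokens before it; for a 6-gram model the history is 5 tokens. The sketch below shows the unsmoothed maximum-likelihood estimate on a toy character sequence. This card does not say whether the model works on characters or words, nor how it was built or smoothed; practical n-gram models add smoothing (for example Kneser-Ney) so unseen n-grams do not receive zero probability. The helper names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mle_prob(tokens, history, next_token, n=6):
    """Unsmoothed MLE of P(next_token | history) for an n-gram model.

    history must contain exactly n - 1 tokens. Illustrative only: no smoothing.
    """
    if len(history) != n - 1:
        raise ValueError(f"history must have {n - 1} tokens")
    counts = ngram_counts(tokens, n)
    h = tuple(history)
    numer = counts[h + (next_token,)]
    # Number of times the history is followed by any token.
    denom = sum(c for gram, c in counts.items() if gram[:-1] == h)
    return numer / denom if denom else 0.0


# Character-level example: "n og " is followed once by "m" and once by "h".
text = list("hugin og munin og hugin")
print(mle_prob(text, list("n og "), "m"))
```
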
46
+
47
+ ## How to use
48
+
49
+ Please refer to the PyLaia library page (https://pypi.org/project/pylaia/) to use this model.
50
+
## Cite us!

```bibtex
@inproceedings{10.1007/978-3-031-06555-2_27,
  author = {Maarand, Martin and Beyer, Yngvil and K\r{a}sen, Andre and Fosseide, Knut T. and Kermorvant, Christopher},
  title = {A Comprehensive Comparison of Open-Source Libraries for Handwritten Text Recognition in Norwegian},
  year = {2022},
  isbn = {978-3-031-06554-5},
  publisher = {Springer-Verlag},
  address = {Berlin, Heidelberg},
  url = {https://doi.org/10.1007/978-3-031-06555-2_27},
  doi = {10.1007/978-3-031-06555-2_27},
  booktitle = {Document Analysis Systems: 15th IAPR International Workshop, DAS 2022, La Rochelle, France, May 22–25, 2022, Proceedings},
  pages = {399--413},
  numpages = {15},
  keywords = {Norwegian language, Open-source, Handwriting recognition},
  location = {La Rochelle, France}
}
```