mboillet and NBoukachab committed on
Commit fc5248f
1 Parent(s): 7978f54

Add pylaia-home-alcar files (#1)

- Add model files (95b57e209d58add61ed13d36f6a26aa5262e315b)
- Add description to README (c081ac5b7983677bd76bacf10858ee94231cdd4b)
- Modif the README (840a046edf78deb01acb00e99ce4a664a90681a6)
- Change README (2ce3ef51f0bc2d02decb54828641cf0408ee5099)
- Modif README and add .gitattributes (d1b9d6e801722eee034a549a7d7c94f84e2f6214)
- Modif the README (13bf7e6de0bc6fcebd555e0e0056ae76e3c48c1e)
- Modif the README (3be8916e69f11309378faec27550e02f0875d6e9)


Co-authored-by: Nolan Boukachab <NBoukachab@users.noreply.huggingface.co>

Files changed (5)
  1. .gitattributes +1 -0
  2. README.md +65 -0
  3. model +3 -0
  4. syms.txt +130 -0
  5. weights.ckpt +3 -0
.gitattributes CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ model filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,68 @@
  ---
+ library_name: PyLaia
  license: mit
+ tags:
+ - PyLaia
+ - PyTorch
+ - Handwritten text recognition
+ metrics:
+ - CER
+ - WER
+ language:
+ - 'lat'
  ---
+
+ # HOME-Alcar and Himanis handwritten text recognition
+
+ This model performs Handwritten Text Recognition in Latin.
+
+ ## Model description
+
+ The model has been trained using the PyLaia library on the [HOME-Alcar](https://zenodo.org/record/5600884) document images.
+ The model was trained on images resized to a fixed height of 128 pixels, keeping the original aspect ratio.
+
+ ## Evaluation results
+
+ The model achieves the following results:
+
+ Himanis:
+
+ | set | CER (%) | WER (%) | support |
+ | ----- | ---------- | --------- | --------- |
+ | train | 5.31 | 17.47 | 18503 |
+ | val | 10.37 | 27.63 | 2367 |
+ | test | 9.87 | 28.27 | 2241 |
+
+
+ HOME-Alcar:
+
+ | set | CER (%) | WER (%) | support |
+ | ----- | ---------- | --------- | --------- |
+ | train | 4.74 | 17.29 | 59969 |
+ | val | 7.82 | 23.67 | 7905 |
+ | test | 8.34 | 24.57 | 6932 |
+
+ ## How to use
+
+ Please refer to the PyLaia library page (https://pypi.org/project/pylaia/) to use this model.
+
+ # Cite us!
+
+ ```bibtex
+ @inproceedings{10.1007/978-3-031-06555-2_29,
+ author = {Monroc, Claire Bizon and Miret, Blanche and Bonhomme, Marie-Laurence and Kermorvant, Christopher},
+ title = {A Comprehensive Study Of Open-Source Libraries For Named Entity Recognition On Handwritten Historical Documents},
+ year = {2022},
+ isbn = {978-3-031-06554-5},
+ publisher = {Springer-Verlag},
+ address = {Berlin, Heidelberg},
+ url = {https://doi.org/10.1007/978-3-031-06555-2_29},
+ doi = {10.1007/978-3-031-06555-2_29},
+ abstract = {In this paper, we propose an evaluation of several state-of-the-art open-source natural language processing (NLP) libraries for named entity recognition (NER) on handwritten historical documents: spaCy, Stanza and Flair. The comparison is carried out on three low-resource multilingual datasets of handwritten historical documents: HOME (a multilingual corpus of medieval charters), Balsac (a corpus of parish records from Quebec), and Esposalles (a corpus of marriage records in Catalan). We study the impact of the document recognition processes (text line detection and handwriting recognition) on the performance of the NER. We show that current off-the-shelf NER libraries yield state-of-the-art results, even on low-resource languages or multilingual documents using multilingual models. We show, in an end-to-end evaluation, that text line detection errors have a greater impact than handwriting recognition errors. Finally, we also report state-of-the-art results on the public Esposalles dataset.},
+ booktitle = {Document Analysis Systems: 15th IAPR International Workshop, DAS 2022, La Rochelle, France, May 22–25, 2022, Proceedings},
+ pages = {429–444},
+ numpages = {16},
+ keywords = {Text line detection, Named entity recognition, Handwritten historical documents},
+ location = {La Rochelle, France}
+ }
+ ```
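The model card states that training images were resized to a fixed height of 128 pixels while keeping the original aspect ratio. A minimal Python sketch of that preprocessing step follows. The helper names are hypothetical, bilinear interpolation is an assumption, and PyLaia's own image transforms (for example any grayscale conversion or padding) may differ from this.

```python
TARGET_HEIGHT = 128  # fixed input height stated in the model card


def scaled_width(width: int, height: int, target_height: int = TARGET_HEIGHT) -> int:
    """Width that preserves the aspect ratio once the height becomes target_height."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale the width by the same factor as the height; never collapse to zero pixels.
    return max(1, round(width * target_height / height))


def resize_to_height(path: str, target_height: int = TARGET_HEIGHT):
    """Open a text-line image and resize it to target_height (hypothetical helper)."""
    from PIL import Image  # Pillow is only needed for the actual pixel resampling

    image = Image.open(path)
    new_size = (scaled_width(image.width, image.height, target_height), target_height)
    # Bilinear interpolation is an assumption, not taken from the model card.
    return image.resize(new_size, Image.BILINEAR)
```

For example, a 2000 × 256 pixel line image becomes 1000 × 128: both dimensions are halved, so the aspect ratio is unchanged.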
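The evaluation tables report the character error rate (CER) and word error rate (WER) as percentages. A common way to compute both is the Levenshtein edit distance between reference and predicted transcriptions, summed over a split and divided by the total reference length. The sketch below assumes that corpus-level definition and plain whitespace tokenization for words; the exact text normalization behind the numbers above is not stated in this commit, and the function names are illustrative.

```python
from typing import Callable, Hashable, List, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance (insertions, deletions, substitutions all cost 1)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,             # delete r
                current[j - 1] + 1,          # insert h
                previous[j - 1] + (r != h),  # substitute, free when equal
            ))
        previous = current
    return previous[-1]


def error_rate(refs: List[str], hyps: List[str],
               tokenize: Callable[[str], Sequence[Hashable]]) -> float:
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return 100.0 * errors / total


def cer(refs: List[str], hyps: List[str]) -> float:
    return error_rate(refs, hyps, list)  # characters are the tokens


def wer(refs: List[str], hyps: List[str]) -> float:
    return error_rate(refs, hyps, str.split)  # whitespace-separated words are the tokens
```

Summing edits over the whole split, rather than averaging per-line rates, keeps long lines from being under-weighted.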
model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e440353b46d736f2d6ebcd5fc577bbf2e6eec0931f2c215483eb73e6c94451b0
+ size 1519
syms.txt ADDED
@@ -0,0 +1,130 @@
+ <ctc> 0
+ ! 1
+ & 2
+ # 3
+ ' 4
+ ( 5
+ ) 6
+ * 7
+ + 8
+ , 9
+ - 10
+ . 11
+ / 12
+ 0 13
+ 1 14
+ 2 15
+ 3 16
+ 4 17
+ 5 18
+ 6 19
+ 7 20
+ 8 21
+ 9 22
+ : 23
+ ; 24
+ = 25
+ ? 26
+ A 27
+ B 28
+ C 29
+ D 30
+ E 31
+ F 32
+ G 33
+ H 34
+ I 35
+ J 36
+ K 37
+ L 38
+ M 39
+ N 40
+ O 41
+ P 42
+ Q 43
+ R 44
+ S 45
+ T 46
+ U 47
+ V 48
+ W 49
+ X 50
+ Y 51
+ Z 52
+ [ 53
+ ] 54
+ a 55
+ b 56
+ c 57
+ d 58
+ e 59
+ f 60
+ g 61
+ h 62
+ i 63
+ j 64
+ k 65
+ l 66
+ m 67
+ n 68
+ o 69
+ p 70
+ q 71
+ r 72
+ s 73
+ t 74
+ u 75
+ v 76
+ w 77
+ x 78
+ y 79
+ z 80
+ | 81
+ ~ 82
+ ’ 83
+ © 84
+ § 85
+ ª 86
+ « 87
+ ¬ 88
+ ¯ 89
+ ° 90
+ ¶ 91
+ º 92
+ » 93
+ ¿ 94
+ À 95
+ Â 96
+ Ã 97
+ Ç 98
+ É 99
+ Ï 100
+ Ü 101
+ à 102
+ á 103
+ â 104
+ æ 105
+ ç 106
+ è 107
+ é 108
+ ë 109
+ ì 110
+ í 111
+ î 112
+ ï 113
+ ñ 114
+ ú 115
+ ù 116
+ û 117
+ ÿ 118
+ ę 119
+ ō 120
+ œ 121
+ ȩ 122
+ — 123
+ ‘ 124
+ ’ 125
+ … 126
+ † 127
+ <unk> 128
+ <space> 129
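`syms.txt` maps each output symbol to an integer index: `<ctc>` at index 0 is the CTC blank and `<space>` stands for a literal space. A minimal best-path (greedy) CTC decoder over such a table is sketched below. It assumes the network emits one score vector per frame, indexed as in `syms.txt`; it illustrates the standard technique and is not PyLaia's own decoder. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Sequence

BLANK = "<ctc>"    # CTC blank symbol, index 0 in syms.txt
SPACE = "<space>"  # token used in syms.txt for a literal space


def load_syms(lines: Iterable[str]) -> Dict[int, str]:
    """Parse 'symbol index' lines (the syms.txt layout) into an index -> symbol map."""
    table: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last whitespace run so the symbol keeps any punctuation.
        symbol, index = line.rsplit(maxsplit=1)
        table[int(index)] = symbol
    return table


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], table: Dict[int, str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    chars: List[str] = []
    previous: Optional[int] = None
    for index in best:
        # Emit only when the label changes, and never for the blank.
        if index != previous and table[index] != BLANK:
            symbol = table[index]
            chars.append(" " if symbol == SPACE else symbol)
        previous = index
    return "".join(chars)
```

Because repeats are merged before blanks are dropped, a doubled letter survives only when a blank separates the two runs: the frame labels `a a <ctc> a` decode to `aa`, not `a`.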
weights.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d11c5c5b6b01a8a45ab48f5433e6f86a5266b8c55fc82c68ac05cd3fe2f9c2a7
+ size 42863420
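The `model` and `weights.ckpt` entries are Git LFS pointer files rather than the binaries themselves: each records the pointer spec version, the SHA-256 object id and the byte size of the real file. The standard-library sketch below parses such a pointer and checks downloaded content against it. The function names are illustrative and are not part of Git LFS or any Hugging Face client library.

```python
import hashlib
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split(" ", 1)  # each line is "<key> <value>"
            fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    return fields


def matches_pointer(data: bytes, pointer: Dict[str, str]) -> bool:
    """Check downloaded content against the pointer's sha256 oid and byte size."""
    algorithm, expected = pointer["oid"].split(":", 1)
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return len(data) == int(pointer["size"]) and hashlib.sha256(data).hexdigest() == expected
```

Reading the whole 42,863,420-byte checkpoint into memory is acceptable at this size; for much larger files, hashing in chunks would be preferable.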