Update README.md
README.md CHANGED
@@ -7,27 +7,42 @@ model-index:
 results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # bart-base-spelling-nl-2m
 
-This model is a fine-tuned version of [facebook/bart-base](https://huggingface.co/facebook/bart-base).
+This model is a Dutch fine-tuned version of
+[facebook/bart-base](https://huggingface.co/facebook/bart-base).
+
 It achieves the following results on the evaluation set:
 - Loss: 0.0248
 - Cer: 0.0133
 
 ## Model description
 
-More information needed
+This is a fine-tuned version of
+[facebook/bart-base](https://huggingface.co/facebook/bart-base)
+trained on spelling correction. It leans on the excellent work by
+Oliver Guhr ([github](https://github.com/oliverguhr/spelling),
+[huggingface](https://huggingface.co/oliverguhr/spelling-correction-english-base)). Training
+was performed on an AWS EC2 instance (g5.xlarge) on a single GPU.
 
 ## Intended uses & limitations
 
-More information needed
+The intended use for this model is to be a component of the
+[Valkuil.net](https://valkuil.net) context-sensitive spelling
+checker.
 
 ## Training and evaluation data
 
-More information needed
+The model was trained on a Dutch dataset composed of 4,964,203 lines
+of text from three public Dutch sources, downloaded from the
+[Opus corpus](https://opus.nlpl.eu/):
+
+- nl-europarlv7.1m.txt (2,000,000 lines)
+- nl-opensubtitles2016.1m.txt (2,000,000 lines)
+- nl-wikipedia.txt (964,203 lines)
+
+Together these texts comprise 73,818,804 tokens.
+
 
 ## Training procedure
 
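As context for the "Intended uses" section added above, here is a minimal usage sketch (not part of the commit) of how a seq2seq spelling-correction checkpoint like this one is typically called through the `transformers` pipeline API. The hub id and the misspelled sample sentence are placeholders, not taken from the card.

```python
# Hedged usage sketch: Dutch spelling correction with a fine-tuned BART
# checkpoint via the text2text-generation pipeline.
from transformers import pipeline

# Placeholder hub id; substitute the actual "<owner>/bart-base-spelling-nl-2m" path.
corrector = pipeline("text2text-generation", model="<owner>/bart-base-spelling-nl-2m")

# Invented misspelled Dutch input; the model should emit the corrected form.
text = "Ik hep gisteren een boek gelzen."
print(corrector(text, max_length=128)[0]["generated_text"])
```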
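The `Cer` figure reported in the diff is a character error rate. As a sketch of how such a score is commonly computed (the `evaluate` library and the toy prediction/reference pair below are assumptions, not the card's actual evaluation code):

```python
# Sketch: computing character error rate (CER) with Hugging Face's
# `evaluate` library (requires: pip install evaluate jiwer).
import evaluate

cer_metric = evaluate.load("cer")

# Toy pair; a real evaluation would compare model outputs to gold corrections.
predictions = ["Ik heb gisteren een boek gelesen."]
references = ["Ik heb gisteren een boek gelezen."]

# CER = (substitutions + insertions + deletions) / reference characters
print(cer_metric.compute(predictions=predictions, references=references))
```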