pere committed on
Commit
4ef3914
1 Parent(s): a2b53ed

Update README.md

Files changed (1)
  1. README.md +4 -8
README.md CHANGED
@@ -3,20 +3,16 @@ language: no
  tags:
  - grammar
  widget:
- - text: "this is a test of a program developed in norway it is actually strange that it is able to do stuff like this even on sequences that are tricky to read by humans
- "
- ---
-
-
- ---
+ - text: "this is a test of a program developed in norway it is actually strange that it is able to do stuff like this even on sequences that are tricky to read by humans"
  license: cc-by-4.0
  ---
+
  # DeUnCaser
  The output from Automatic Speech Recognition software is usually uncased and without any punctuation. This does not make for very readable text.
 
- The DeUnCaser is a sequence-to-sequence byT5 model that is reversing this process. It adds punctuation, and capitalises the correct words (in some languages the start of sentences and proper nouns, in other languages, like German, all nouns).
+ The DeUnCaser is a sequence-to-sequence byT5 model that reverses this process. It adds punctuation and capitalises the correct words. In some languages this means capitalising the start of sentences and all proper nouns; in other languages, like German, it means capitalising the first letter of all nouns. It will also attempt to add hyphens and parentheses where this makes the meaning clearer.
 
- It is using a multi-lingual base, however the first test version is only trained on Norwegian. I will update it with support for other languages by demand.
+ It is based on the multilingual base model. However, the current fine-tuning is only done on Norwegian. For other languages, results will be mainly experimental. I will update it with support for other languages if there is any demand.
 
  ## Example input - output
  ````