yeshpanovrustem committed on
Commit faa072f
1 Parent(s): b72b769

Update README.md

  1. README.md +7 -4
README.md CHANGED
@@ -6,10 +6,13 @@ license: cc-by-4.0
  - The original repository for the paper can be found at *https://github.com/IS2AI/KazNERD*.
  ## Differences
  While the original dataset contained tokens denoting speech disfluencies and hesitations (parenthesised) and background noise [bracketed], this model was trained on a version of the dataset where such tokens were removed.
+ Removing the tokens caused some changes in the number of sentences, tokens, and named entities (NEs).

  Dataset | Unit | Train | Valid | Test | Total |
  | :---: | :---: | :---: | :---: | :---: | :---: |
- KazNERD (Original)| Sentences | 90,228 (80.06%) | 11,167 (9.91%)| 11,307 (10.03%) | 112,702 (100%) |
- KazNERD (Cleaned) | Sentences | 88,540 (80.00%) | 11,067 (10.00%) | 11,068 (10.00%) | 110,675 (100%) |
- KazNERD (Original)| Tokens | 1,043,305 (80.11%) | 129,223 (9.92%)| 129,824 (9.97%) | 1,302,352 (100%) |
- KazNERD (Cleaned) | Tokens | 1,088,461 (80.04%) | 136,021 (10.00%) | 135,426 (9.96%) | 1,359,908 (100%) |
+ KazNERD (Original)| Sentence | 90,228 (80.06%) | 11,167 (9.91%)| 11,307 (10.03%) | 112,702 (100%) |
+ KazNERD (Cleaned) | Sentence | 88,540 (80.00%) | 11,067 (10.00%) | 11,068 (10.00%) | 110,675 (100%) |
+ KazNERD (Original)| Token | 1,043,305 (80.11%) | 129,223 (9.92%)| 129,824 (9.97%) | 1,302,352 (100%) |
+ KazNERD (Cleaned) | Token | 1,088,461 (80.04%) | 136,021 (10.00%) | 135,426 (9.96%) | 1,359,908 (100%) |
+ KazNERD (Original)| NE | 109,342 (80.20%) | 13,483 (9.89%)| 13,508 (9.91%) | 136,333 (100%) |
+ KazNERD (Cleaned) | NE | 106,148 (80.17%) | 13,189 (9.96%) | 13,072 (9.87%) | 132,409 (100%) |
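
The totals and percentages in the table can be re-derived from the per-split counts. The following is a minimal sketch of that arithmetic, not code from the repository: the names `SPLITS`, `split_shares`, and `total_delta` are hypothetical, and the counts are copied by hand from the table rows as they stand after this commit.

```python
# Per-split counts copied from the table above, in (train, valid, test) order.
# SPLITS, split_shares and total_delta are illustrative names, not part of KazNERD.
SPLITS = {
    ("Original", "Sentence"): (90_228, 11_167, 11_307),
    ("Cleaned", "Sentence"): (88_540, 11_067, 11_068),
    ("Original", "Token"): (1_043_305, 129_223, 129_824),
    ("Cleaned", "Token"): (1_088_461, 136_021, 135_426),
    ("Original", "NE"): (109_342, 13_483, 13_508),
    ("Cleaned", "NE"): (106_148, 13_189, 13_072),
}


def split_shares(counts):
    """Return the total and each split's share of it, in percent (2 decimals)."""
    total = sum(counts)
    shares = tuple(round(100 * count / total, 2) for count in counts)
    return total, shares


def total_delta(unit):
    """Return the cleaned total minus the original total for one unit."""
    return sum(SPLITS[("Cleaned", unit)]) - sum(SPLITS[("Original", unit)])


if __name__ == "__main__":
    for (version, unit), counts in SPLITS.items():
        total, shares = split_shares(counts)
        print(f"{version:8} {unit:8} total={total:>9,} shares={shares}")
    for unit in ("Sentence", "Token", "NE"):
        print(f"{unit:8} cleaned - original = {total_delta(unit):+,}")
```

Run as a script, this prints one line per table row plus the change in each total. As listed, the token total for the cleaned version is larger than for the original, so the delta for `Token` comes out positive; the sketch only reports that figure and does not account for it.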