yeshpanovrustem committed on
Commit
12b7fef
1 Parent(s): def38cf

Update README.md

Files changed (1)
  1. README.md +3 -6
README.md CHANGED
@@ -24,15 +24,13 @@ widget:
  ## KazNERD (cleaned)
  While the original dataset contained tokens denoting speech disfluencies and hesitations (parenthesised) and background noise [bracketed], this model was trained on a version of the dataset where such tokens and duplicates were removed.
  As a result, the number of sentences, tokens, and named entities (NEs) in the cleaned dataset changed.
-
- **Statistics for training (Train), validation (Valid), and test (Test) sets**
+ ### Statistics for training (Train), validation (Valid), and test (Test) sets
  | Unit | Train | Valid | Test | Total |
  | :---: | :---: | :---: | :---: | :---: |
  | Sentence | 88,540 (80.00%) | 11,067 (10.00%) | 11,068 (10.00%) | 110,675 (100%) |
  | Token | 1,088,461 (80.04%) | 136,021 (10.00%) | 135,426 (9.96%) | 1,359,908 (100%) |
  | NE | 106,148 (80.17%) | 13,189 (9.96%) | 13,072 (9.87%) | 132,409 (100%) |
-
- **80 / 10 / 10 split**
+ ### 80 / 10 / 10 split
  |Representation| Train | Valid | Test | Total |
  | :---: | :---: | :---: | :---: | :---: |
  | **AID** | 67,582 (79.99%) | 8,439 (9.99%) | 8,467 (10.02%)| 84,488 (100%) |
@@ -42,8 +40,7 @@ As a result, the number of sentences, tokens, and named entities (NEs) in the cl
  | **EID** | 260 (81.00%) | 27 (8.41%) | 34 (10.59%)| 321 (100%) |
  | **FID** | 9 (75.00%) | 1 (8.33%)| 2 (16.67%)| 12 (100%) |
  |**Total**| **88,540 (80.00%)** | **11,067 (10.00%)** | **11,068 (10.00%)** | **110,675 (100%)** |
-
- **Distribution of representations across sets**
+ ### Distribution of representations across sets
  |Representation| Train | Valid | Test | Total |
  | :---: | :---: | :---: | :---: | :---: |
  | **AID** | 67,582 (76.33%) | 8,439 (76.25%) | 8,467 (76.50%)| 84,488 (76.34%) |
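
The percentages in the Statistics table of this diff appear to be each split's share of the row total. The sketch below re-derives them as a quick sanity check. The counts are copied from the table; the names `counts` and `split_shares` are illustrative and not part of the KazNERD repository.

```python
# Counts copied from the "Statistics for training (Train), validation (Valid),
# and test (Test) sets" table in the README diff.
counts = {
    "Sentence": {"Train": 88_540, "Valid": 11_067, "Test": 11_068},
    "Token": {"Train": 1_088_461, "Valid": 136_021, "Test": 135_426},
    "NE": {"Train": 106_148, "Valid": 13_189, "Test": 13_072},
}


def split_shares(row):
    """Return (row total, {split: share of the total in percent, 2 decimals})."""
    total = sum(row.values())
    shares = {name: round(100 * n / total, 2) for name, n in row.items()}
    return total, shares


# Rebuild each table row in the README's markdown layout.
for unit, row in counts.items():
    total, shares = split_shares(row)
    cells = " | ".join(f"{row[s]:,} ({shares[s]:.2f}%)" for s in row)
    print(f"| {unit} | {cells} | {total:,} (100%) |")
```

Under the assumption that each percentage is a row share, the recomputed values should match the README, for example 80.04% / 10.00% / 9.96% for the Token row.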