hoodiexxx committed on
Commit
190cc68
1 Parent(s): ef00e45

Update README.md

  1. README.md +30 -39
README.md CHANGED
@@ -6,48 +6,39 @@ pipeline_tag: text-classification
  ---
  # Bert Chinese Text Classification Model
  this a Bert Model that train for customer service of logistics companies
+ ### data(with noise since it from ASR text)
+ train: 10878 rows
+ dev:2720 rows
+ total: 13598 rows
+ ### param
+ embed_dim: 128
+ batch size: 64
+ contextsize: 20
+ n_head: 2
+ epoches: 100

  ## Word Label(word, index, number of occurences)
  ```sh
- 我 1 18719
-
- 2 12236
-
- 3 8152
-
- 4 8097
-
- 5 7295
-
- 6 7118
-
- 7 6923
-
- 8 6684
-
- 9 6632
-
- 10 6434
-
- 你 11 5144
-
- 没 12 4989
-
- 有 13 4664
-
- 下 14 4433
-
- 这 15 4219
-
- 在 16 4219
-
- 么 17 4010
-
- 查 18 3964
-
- 就 19 3570
-
- 好 20 3524
+ 我 1 18719
+ 个 2 12236
+ 3 8152
+ 一 4 8097
+ 5 7295
+ 那 6 7118
+ 7 6923
+ 的 8 6684
+ 9 6632
+ 到 10 6434
+ 11 5144
+ 没 12 4989
+ 13 4664
+ 下 14 4433
+ 15 4219
+ 在 16 4219
+ 17 4010
+ 查 18 3964
+ 19 3570
+ 好 20 3524
  ```

  ## Tokenizer
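
Review note: the Word Label block in this README lists one entry per line as word, index, and number of occurrences. A minimal sketch of loading such lines into lookup tables follows, assuming the three fields are whitespace-separated. The function name `parse_word_labels` is hypothetical and not part of this repository, and the sample string reuses only a few rows shown in the diff. Lines that do not have exactly three fields (for example, rows whose word is missing) are skipped rather than guessed.

```python
from typing import Dict, Tuple


def parse_word_labels(text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Parse 'word index count' lines into two dictionaries.

    Hypothetical helper for illustration; assumes whitespace-separated fields.
    """
    word_to_index: Dict[str, int] = {}
    word_to_count: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and rows with a missing word instead of guessing.
        if len(fields) != 3:
            continue
        word, index, count = fields
        word_to_index[word] = int(index)
        word_to_count[word] = int(count)
    return word_to_index, word_to_count


# A few rows copied from the Word Label block above.
sample = "我 1 18719\n个 2 12236\n没 12 4989\n"
word_to_index, word_to_count = parse_word_labels(sample)
print(word_to_index["我"], word_to_count["没"])
```

Keeping the index and the count in separate dictionaries lets a tokenizer use the index mapping on its own, while the counts stay available for frequency-based filtering.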