---
# BERT Chinese Text Classification Model

This is a BERT model trained for customer service at logistics companies.

### Data (noisy, since it comes from ASR transcripts)

- train: 10878 rows
- dev: 2720 rows
- total: 13598 rows
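The README does not say how the dev set was drawn. The counts are consistent with a shuffled 80/20 split (2720 / 13598 ≈ 0.20), so the sketch below reproduces those sizes under that assumption. The helper `split_rows`, the seed, and the round-to-nearest rule are hypothetical, not taken from the training code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], dev_fraction: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle rows and split them into (train, dev).

    Hypothetical helper: dev_fraction=0.2 and rounding to the nearest row
    are assumptions chosen because they reproduce the README's counts.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    dev_size = round(len(shuffled) * dev_fraction)  # 13598 * 0.2 = 2719.6 -> 2720
    return shuffled[dev_size:], shuffled[:dev_size]


# Placeholder rows standing in for the 13598 ASR transcripts.
rows = list(range(13598))
train, dev = split_rows(rows)
print(len(train), len(dev))  # 10878 2720
```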

### Parameters

- embed_dim: 128
- batch size: 64
- context size: 20
- n_head: 2
- epochs: 100
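A few quantities follow directly from these settings. The sketch below collects them in a hypothetical `TrainConfig` dataclass (its field names mirror the list above but are not taken from the training code). It assumes standard multi-head attention, where `embed_dim` is split evenly across heads, and assumes the last partial batch of each epoch is kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the README; the field names are hypothetical.
    embed_dim: int = 128
    batch_size: int = 64
    context_size: int = 20
    n_head: int = 2
    epochs: int = 100

    def head_dim(self) -> int:
        # Standard multi-head attention splits embed_dim evenly across heads.
        if self.embed_dim % self.n_head != 0:
            raise ValueError("embed_dim must be divisible by n_head")
        return self.embed_dim // self.n_head

    def steps_per_epoch(self, n_train_rows: int) -> int:
        # Assumes the final, smaller batch is kept (no drop_last).
        return math.ceil(n_train_rows / self.batch_size)


cfg = TrainConfig()
print(cfg.head_dim())                            # 64
print(cfg.steps_per_epoch(10878))                # 170
print(cfg.steps_per_epoch(10878) * cfg.epochs)   # 17000
```

Under these assumptions each attention head works on 64 dimensions, and the last batch of every epoch holds 10878 - 169 × 64 = 62 rows.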
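
## Word Labels (word, index, number of occurrences)

The first 20 entries, in descending order of frequency:

| Word | Index | Occurrences | Meaning |
|---|---|---|---|
| 我 | 1 | 18719 | I, me |
| 个 | 2 | 12236 | general measure word |
| 快 | 3 | 8152 | fast (as in 快递, express delivery) |
| 一 | 4 | 8097 | one |
| 递 | 5 | 7295 | to deliver (as in 快递) |
| 那 | 6 | 7118 | that |
| 了 | 7 | 6923 | aspect particle (completed action) |
| 的 | 8 | 6684 | possessive/attributive particle |
| 是 | 9 | 6632 | to be |
| 到 | 10 | 6434 | to arrive; to |
| 你 | 11 | 5144 | you |
| 没 | 12 | 4989 | not (as in 没有, not have) |
| 有 | 13 | 4664 | to have |
| 下 | 14 | 4433 | down; below |
| 这 | 15 | 4219 | this |
| 在 | 16 | 4219 | at; to be located |
| 么 | 17 | 4010 | suffix in question words (as in 什么, 怎么) |
| 查 | 18 | 3964 | to check, look up |
| 就 | 19 | 3570 | just; then |
| 好 | 20 | 3524 | good, OK |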
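The table reads like a character-level vocabulary sorted by frequency, with indices starting at 1. Below is a minimal sketch of how such a table could be built and used to encode an utterance to the fixed context size of 20. It assumes character-level tokens, index 0 reserved for padding, and one extra index for unseen characters; none of these choices is confirmed by the README, and `build_vocab` / `encode` are hypothetical names. Ties (这 and 在 both occur 4219 times) are broken here by first appearance, which may not match the original ordering.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PAD_ID = 0  # assumption: index 0 is reserved for padding


def build_vocab(texts: Iterable[str]) -> Tuple[Dict[str, int], Counter]:
    """Map each character to an index, most frequent first, starting at 1."""
    counts = Counter(ch for text in texts for ch in text if not ch.isspace())
    # most_common() sorts by count; equal counts keep first-seen order.
    vocab = {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}
    return vocab, counts


def encode(text: str, vocab: Dict[str, int], context_size: int = 20) -> List[int]:
    """Truncate or right-pad a character sequence to exactly context_size ids."""
    unk_id = len(vocab) + 1  # assumption: one extra id for unseen characters
    ids = [vocab.get(ch, unk_id) for ch in text if not ch.isspace()][:context_size]
    return ids + [PAD_ID] * (context_size - len(ids))


# Toy corpus standing in for the ASR transcripts.
corpus = ["我的快递到了没有", "我查一下快递", "你好"]
vocab, counts = build_vocab(corpus)
print(encode("我的快递", vocab))
```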
## Tokenizer