iioSnail committed on
Commit
125d238
1 Parent(s): 4242c1f

Update README.md

  1. README.md +54 -0
README.md CHANGED
@@ -1,3 +1,57 @@
  ---
  license: afl-3.0
+ language:
+ - zh
  ---
+
+ # Chinese Word Classification
+
+ This model performs multi-label classification of Chinese words. Each word is assigned to one or more of the following categories:
+
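+ ```
+ "1": "人文科学",        # Humanities
+ "2": "农林渔畜",        # Agriculture, forestry, fishery, and animal husbandry
+ "3": "医学",            # Medicine
+ "4": "城市信息大全",    # City information
+ "5": "娱乐",            # Entertainment
+ "6": "工程与应用科学",  # Engineering and applied sciences
+ "7": "生活",            # Daily life
+ "8": "电子游戏",        # Video games
+ "9": "社会科学",        # Social sciences
+ "10": "自然科学",       # Natural sciences
+ "11": "艺术",           # Arts
+ "12": "运动休闲"        # Sports and leisure
+ ```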
+
+ > The categories are taken from the [Sogou lexicon categories](https://pinyin.sogou.com/dict/cate/index/167).
+
+ # Usage Example
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, BertForSequenceClassification
+
+ model_path = "iioSnail/bert-base-chinese-word-classifier"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = BertForSequenceClassification.from_pretrained(model_path)
+
+ # Type 2 diabetes, Taikoo Li, KartRider, pufferfish
+ words = ["2型糖尿病", "太古里", "跑跑卡丁车", "河豚"]
+ inputs = tokenizer(words, return_tensors='pt', padding=True)
+ outputs = model(**inputs).logits
+ # Multi-label: apply a sigmoid per class and keep every class above 0.5
+ outputs = outputs.sigmoid()
+ preds = outputs > 0.5
+ for i, pred in enumerate(preds):
+     # Indices of the classes predicted for this word
+     pred = torch.argwhere(pred).view(-1)
+     labels = [model.config.id2label[int(idx)] for idx in pred]
+     print(words[i], ":", labels)
+ ```
+
+ Output:
+
+ ```
+ 2型糖尿病 : ['医学']
+ 太古里 : ['城市信息大全']
+ 跑跑卡丁车 : ['电子游戏']
+ 河豚 : ['人文科学', '娱乐', '电子游戏', '自然科学']
+ ```
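
The decoding step in the usage example (a per-class sigmoid followed by a 0.5 threshold) is what allows a single word such as 河豚 to receive several labels at once. The following is a minimal pure-Python sketch of that step, not part of the committed README. The `ID2LABEL` subset, its 0-based ids, the `decode_multilabel` helper, and the logits are all hypothetical and chosen only for illustration; the real mapping is whatever `model.config.id2label` contains.

```python
import math

# Hypothetical label map: a small subset of the category names listed in the README.
# The real ids come from model.config.id2label and may differ.
ID2LABEL = {0: "人文科学", 1: "医学", 2: "电子游戏", 3: "自然科学"}


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-class probability."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs above the threshold, highest probability first."""
    scored = [(id2label[i], sigmoid(z)) for i, z in enumerate(logits)]
    kept = [(label, p) for label, p in scored if p > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up logits for a single word; these are not real model outputs.
example_logits = [0.3, -2.0, 1.2, 2.5]
for label, prob in decode_multilabel(example_logits, ID2LABEL):
    print(f"{label}: {prob:.2f}")
```

Because each class is scored independently, any number of classes (including none) can clear the threshold, unlike an argmax over a softmax, which always selects exactly one. Raising `threshold` trades recall for precision.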