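---
license: afl-3.0
language:
- zh
---

# Chinese Word Classification

This model performs multi-label classification of Chinese words. Each word is assigned one or more of the following categories:

```
"1": "人文科学",
"2": "农林渔畜",
"3": "医学",
"4": "城市信息大全",
"5": "娱乐",
"6": "工程与应用科学",
"7": "生活",
"8": "电子游戏",
"9": "社会科学",
"10": "自然科学",
"11": "艺术",
"12": "运动休闲"
```

The model returns the labels in Chinese. English glosses, in order: humanities; agriculture, forestry, fishery and animal husbandry; medicine; city information; entertainment; engineering and applied science; daily life; video games; social science; natural science; art; sports and leisure.

> The categories are taken from the [Sogou dictionary categories](https://pinyin.sogou.com/dict/cate/index/167).

# Usage Example

```python
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

model_path = "iioSnail/bert-base-chinese-word-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)

words = ["2型糖尿病", "太古里", "跑跑卡丁车", "河豚"]

# Pad the batch so words of different lengths fit in one tensor.
inputs = tokenizer(words, return_tensors='pt', padding=True)

# Inference only, so gradients are not needed.
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: apply a sigmoid to each class and keep those above 0.5.
probs = logits.sigmoid()
preds = probs > 0.5

for i, pred in enumerate(preds):
    label_ids = torch.argwhere(pred).view(-1)
    labels = [model.config.id2label[int(idx)] for idx in label_ids]
    print(words[i], ":", labels)
```

Output:

```
2型糖尿病 : ['医学']
太古里 : ['城市信息大全']
跑跑卡丁车 : ['电子游戏']
河豚 : ['人文科学', '娱乐', '电子游戏', '自然科学']
```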
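
The decoding step in the usage example (sigmoid, then a 0.5 cutoff) can be tried without downloading the model. The following is a minimal sketch in plain Python: `decode_multilabel`, `ID2LABEL`, and the logit values are hypothetical and exist only for illustration. The three-entry label map and its zero-based indices are an assumption; in practice the mapping should be read from `model.config.id2label`.

```python
import math

# Illustrative subset of the label map. This is an assumption for the sketch;
# the real mapping should be read from model.config.id2label.
ID2LABEL = {0: "人文科学", 1: "农林渔畜", 2: "医学"}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, id2label, threshold=0.5):
    """Return every label whose sigmoid probability is above the threshold."""
    return [
        id2label[i]
        for i, logit in enumerate(logits)
        if sigmoid(logit) > threshold
    ]


# Made-up logits for a single word, one value per class.
example_logits = [-2.0, 0.3, 4.1]
print(decode_multilabel(example_logits, ID2LABEL))
print(decode_multilabel(example_logits, ID2LABEL, threshold=0.9))
```

Because each class is scored independently with a sigmoid rather than jointly with a softmax, a word can receive several labels or none at all. Raising `threshold` keeps only the more confident labels, which generally trades recall for precision.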