liam168 committed
Commit 62e45a0
1 Parent(s): a3b91b4

feat: 4-class classification model ("Female", "Sports", "Literature", "Campus").

README.md ADDED
@@ -0,0 +1,40 @@
+ ---
+ language: zh
+ tags:
+ - exbert
+ license: apache-2.0
+ widget:
+ - text: "女人做得越纯粹,皮肤和身材就越好"
+ - text: "我喜欢篮球"
+ ---
+
+ # liam168/c4-zh-distilbert-base-uncased
+
+ ## Model description
+
+ A classification model trained on four categories of data: "Female", "Sports", "Literature", and "Campus".
+
+ ## Overview
+
+ - **Language model**: DistilBERT
+ - **Model size**: 280M
+ - **Language**: Chinese
+
+ ## Example
+
+ ```python
+ >>> from transformers import DistilBertForSequenceClassification, AutoTokenizer, pipeline
+
+ >>> model_name = "liam168/c4-zh-distilbert-base-uncased"
+ >>> class_num = 4
+ >>> ts_texts = ["女人做得越纯粹,皮肤和身材就越好", "我喜欢篮球"]
+ >>> model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=class_num)
+ >>> tokenizer = AutoTokenizer.from_pretrained(model_name)
+ >>> classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)
+ >>> classifier(ts_texts[0])
+ [{'label': 'Female', 'score': 0.9137857556343079}]
+ >>> classifier(ts_texts[1])
+ [{'label': 'Sports', 'score': 0.8206522464752197}]
+ ```
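The `score` in the pipeline output above is the softmax probability of the top class. A minimal self-contained sketch of that post-processing step, using hypothetical logits rather than a real forward pass of this model:

```python
import math

def softmax(logits):
    # numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["Female", "Sports", "Literature", "Campus"]

# hypothetical logits for one input sentence (not real model output)
logits = [4.2, 0.5, -0.3, 0.1]
probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])
result = {"label": labels[best], "score": probs[best]}
print(result)
```

The pipeline performs the same argmax-plus-softmax step internally before emitting its `[{'label': ..., 'score': ...}]` output.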
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "_name_or_path": "distilbert-base-uncased",
+ "activation": "gelu",
+ "architectures": [
+ "DistilBertForSequenceClassification"
+ ],
+ "attention_dropout": 0.1,
+ "dim": 768,
+ "dropout": 0.1,
+ "hidden_dim": 3072,
+ "id2label": {
+ "0": "Female",
+ "1": "Sports",
+ "2": "Literature",
+ "3": "Campus"
+ },
+ "initializer_range": 0.02,
+ "label2id": {
+ "Female": 0,
+ "Sports": 1,
+ "Literature": 2,
+ "Campus": 3
+ },
+ "max_position_embeddings": 512,
+ "model_type": "distilbert",
+ "n_heads": 12,
+ "n_layers": 6,
+ "pad_token_id": 0,
+ "problem_type": "single_label_classification",
+ "qa_dropout": 0.1,
+ "seq_classif_dropout": 0.2,
+ "sinusoidal_pos_embds": false,
+ "tie_weights_": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.9.0.dev0",
+ "vocab_size": 30522
+ }
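The `id2label` map in this config is what turns the classifier's argmax index into the label strings seen in the pipeline output. A small sketch of that lookup (the logits below are hypothetical, not real model output):

```python
# id2label as declared in config.json above
id2label = {0: "Female", 1: "Sports", 2: "Literature", 3: "Campus"}

def decode(logits):
    # argmax over the 4 class logits, then map index -> label string
    idx = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[idx]

# hypothetical logits for a sports-related sentence
print(decode([-0.4, 3.1, 0.2, 0.5]))
```

The inverse `label2id` map is used at training time to convert label strings back to class indices.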
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d599efd3f01521d7c4e278d0c6e8939cd2e882c595275d4e310f07a3039b4857
+ size 267866225
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "distilbert-base-uncased", "tokenizer_class": "DistilBertTokenizer"}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff