Zhou Yucheng commited on
Commit
b44e3a3
β€’
1 Parent(s): b247ff3

readme.md update

Browse files
Files changed (1) hide show
  1. README.md +82 -55
README.md CHANGED
@@ -3,78 +3,105 @@ license: apache-2.0
3
  language: en
4
  tags:
5
  - generated_from_trainer
 
6
  metrics:
7
  - precision
8
  - recall
9
  - f1
10
  - accuracy
11
- model-index:
12
- - name: macbert-finetuned-tokenclassification-errorword
13
- results: []
14
  ---
15
 
16
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
17
- should probably proofread and complete it, then remove this comment. -->
18
-
19
- # macbert-finetuned-tokenclassification-errorword
20
-
21
- This model is a fine-tuned version of [shibing624/macbert4csc-base-chinese](https://huggingface.co/shibing624/macbert4csc-base-chinese) on an unknown dataset.
22
- It achieves the following results on the evaluation set:
23
- - Loss: 0.0040
24
- - Precision: 0.0
25
- - Recall: 0.0
26
- - F1: 0.0
27
- - Accuracy: 0.9994
28
-
29
  ## Model description
30
 
31
- This model fine-tuned on a large corpus of medical material which processed on purpose, we propose to sample words and use similar words to do replacement for masking purpose.
32
- As a result, this model can performed pretty well when applying on medical relatted downstream tasks.
33
 
34
  ## Intended uses & limitations
35
 
36
  You can use this model directly with a pipeline for token classification:
37
  ```python
38
- from transformers import (AutoModelForTokenClassification, AutoTokenizer
39
- from transformers import pipeline
40
-
41
- hub_model_id = "9pinus/macbert-base-chinese-medical-collation"
42
-
43
- model = AutoModelForTokenClassification.from_pretrained(hub_model_id)
44
- tokenizer = BertTokenizer.from_pretrained(hub_model_id)
45
- classifier = pipeline('ner', model=model, tokenizer=tokenizer)
46
- result = classifier("ε¦‚ζžœη—…ζƒ…θΎƒι‡οΌŒε―ι€‚ε½“ε£ζœη”²η‘ε”‘η‰‡γ€ηŽ―ι…―ηΊ’ιœ‰η΄ η‰‡γ€ε²ε“šηΎŽθΎ›η‰‡η­‰θ―η‰©θΏ›θ‘ŒζŠ—ζ„ŸζŸ“ι•‡η—›γ€‚εŒζ—Άεœ¨ζ—₯εΈΈη”Ÿζ΄»δΈ­θ¦ζ³¨ζ„η‰™ι½ΏζΈ…ζ΄ε«η”ŸοΌŒε…»ζˆεˆ·η‰™ηš„ε₯½δΉ ζƒ―。")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
47
 
48
- for item in result:
49
- print(item)
50
  ```
51
 
52
- ### Training hyperparameters
53
-
54
- The following hyperparameters were used during training:
55
- - learning_rate: 5e-05
56
- - train_batch_size: 16
57
- - eval_batch_size: 16
58
- - seed: 42
59
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
60
- - lr_scheduler_type: linear
61
- - num_epochs: 8.0
62
- - mixed_precision_training: Native AMP
63
-
64
- ### Training results
65
-
66
- | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
67
- |:-------------:|:-----:|:------:|:---------------:|:---------:|:------:|:---:|:--------:|
68
- | 0.0038 | 1.0 | 36875 | 0.0030 | 0.0 | 0.0 | 0.0 | 0.9991 |
69
- | 0.0026 | 2.0 | 73750 | 0.0028 | 0.0 | 0.0 | 0.0 | 0.9992 |
70
- | 0.0021 | 3.0 | 110625 | 0.0033 | 0.0 | 0.0 | 0.0 | 0.9992 |
71
- | 0.0014 | 4.0 | 147500 | 0.0033 | 0.0 | 0.0 | 0.0 | 0.9993 |
72
- | 0.0009 | 5.0 | 184375 | 0.0033 | 0.0 | 0.0 | 0.0 | 0.9993 |
73
- | 0.0006 | 6.0 | 221250 | 0.0035 | 0.0 | 0.0 | 0.0 | 0.9994 |
74
- | 0.0004 | 7.0 | 258125 | 0.0037 | 0.0 | 0.0 | 0.0 | 0.9994 |
75
- | 0.0002 | 8.0 | 295000 | 0.0040 | 0.0 | 0.0 | 0.0 | 0.9994 |
76
-
77
-
78
  ### Framework versions
79
 
80
  - Transformers 4.15.0
 
3
  language: en
4
  tags:
5
  - generated_from_trainer
6
+ - Token Classification
7
  metrics:
8
  - precision
9
  - recall
10
  - f1
11
  - accuracy
 
 
 
12
  ---
13
 
 
 
 
 
 
 
 
 
 
 
 
 
 
14
  ## Model description
15
 
16
+ This model is a fine-tuned version of macbert for the purpose of spell checking in medical apllication scenarious, and we fine-tuned on our own medical data which accumulated during past several years including 600,000 fine edited medical articals. When processing the dataset, we proposed to sample 30% of these articals then randomly select characters and replace these words with spelling errors which are either visally or phonologically resembled characters. Consequently, the model can achieve 90% accuracy on our test dataset.
 
17
 
18
  ## Intended uses & limitations
19
 
20
  You can use this model directly with a pipeline for token classification:
21
  ```python
22
+ >>> from transformers import (AutoModelForTokenClassification, BertTokenizer)
23
+ >>> from transformers import pipeline
24
+
25
+ >>> hub_model_id = "9pinus/macbert-base-chinese-medical-collation"
26
+
27
+ >>> model = AutoModelForTokenClassification.from_pretrained(hub_model_id)
28
+ >>> tokenizer = BertTokenizer.from_pretrained(hub_model_id)
29
+ >>> classifier = pipeline('ner', model=model, tokenizer=tokenizer)
30
+ >>> result = classifier("ε¦‚ζžœη—…ζƒ…θΎƒι‡οΌŒε―ι€‚ε½“ε£ζœη”²θ‚–ε”‘η‰‡γ€ηŽ―ι…―ηΊ’ιœ‰η΄ η‰‡γ€ε²ε“šηΎŽθΎ›η‰‡η­‰θ―η‰©θΏ›θ‘ŒζŠ—ζ„ŸζŸ“ι•‡η—›γ€‚εŒζ—Άεœ¨ζ—₯εΈΈη”Ÿζ΄»δΈ­θ¦ζ³¨ζ„η‰™ι½ΏζΈ…ζ΄ε«η”ŸοΌŒε…»ζˆεˆ·η‰™ηš„ε₯½δΉ ζƒ―。")
31
+
32
+ >>> for item in result:
33
+ >>> print(item)
34
+
35
+ {'entity': 0, 'score': 0.9999982, 'index': 1, 'word': '如', 'start': None, 'end': None}
36
+ {'entity': 0, 'score': 0.99999845, 'index': 2, 'word': '果', 'start': None, 'end': None}
37
+ {'entity': 0, 'score': 0.99999845, 'index': 3, 'word': 'η—…', 'start': None, 'end': None}
38
+ {'entity': 0, 'score': 0.99999857, 'index': 4, 'word': 'ζƒ…', 'start': None, 'end': None}
39
+ {'entity': 0, 'score': 0.99999845, 'index': 5, 'word': 'θΎƒ', 'start': None, 'end': None}
40
+ {'entity': 0, 'score': 0.99999845, 'index': 6, 'word': '重', 'start': None, 'end': None}
41
+ {'entity': 0, 'score': 0.99999833, 'index': 7, 'word': ',', 'start': None, 'end': None}
42
+ {'entity': 0, 'score': 0.99999845, 'index': 8, 'word': '可', 'start': None, 'end': None}
43
+ {'entity': 0, 'score': 0.99999845, 'index': 9, 'word': '适', 'start': None, 'end': None}
44
+ {'entity': 0, 'score': 0.99999845, 'index': 10, 'word': '当', 'start': None, 'end': None}
45
+ {'entity': 0, 'score': 0.99999845, 'index': 11, 'word': '口', 'start': None, 'end': None}
46
+ {'entity': 0, 'score': 0.99999845, 'index': 12, 'word': '服', 'start': None, 'end': None}
47
+ {'entity': 0, 'score': 0.9999982, 'index': 13, 'word': 'η”²', 'start': None, 'end': None}
48
+ {'entity': 1, 'score': 0.901703, 'index': 14, 'word': 'θ‚–', 'start': None, 'end': None}
49
+ {'entity': 0, 'score': 0.99999833, 'index': 15, 'word': 'ε”‘', 'start': None, 'end': None}
50
+ {'entity': 0, 'score': 0.99999845, 'index': 16, 'word': '片', 'start': None, 'end': None}
51
+ {'entity': 0, 'score': 0.99999845, 'index': 17, 'word': '、', 'start': None, 'end': None}
52
+ {'entity': 0, 'score': 0.99999845, 'index': 18, 'word': '环', 'start': None, 'end': None}
53
+ {'entity': 0, 'score': 0.99999845, 'index': 19, 'word': 'ι…―', 'start': None, 'end': None}
54
+ {'entity': 0, 'score': 0.99999845, 'index': 20, 'word': 'ηΊ’', 'start': None, 'end': None}
55
+ {'entity': 0, 'score': 0.99999845, 'index': 21, 'word': 'ιœ‰', 'start': None, 'end': None}
56
+ {'entity': 0, 'score': 0.99999845, 'index': 22, 'word': 'η΄ ', 'start': None, 'end': None}
57
+ {'entity': 0, 'score': 0.99999845, 'index': 23, 'word': '片', 'start': None, 'end': None}
58
+ {'entity': 0, 'score': 0.99999845, 'index': 24, 'word': '、', 'start': None, 'end': None}
59
+ {'entity': 0, 'score': 0.99999845, 'index': 25, 'word': '吲', 'start': None, 'end': None}
60
+ {'entity': 0, 'score': 0.99999833, 'index': 26, 'word': 'ε“š', 'start': None, 'end': None}
61
+ {'entity': 0, 'score': 0.999998, 'index': 27, 'word': '美', 'start': None, 'end': None}
62
+ {'entity': 0, 'score': 0.99999833, 'index': 28, 'word': 'θΎ›', 'start': None, 'end': None}
63
+ {'entity': 0, 'score': 0.99999845, 'index': 29, 'word': '片', 'start': None, 'end': None}
64
+ {'entity': 0, 'score': 0.99999833, 'index': 30, 'word': 'η­‰', 'start': None, 'end': None}
65
+ {'entity': 0, 'score': 0.99999845, 'index': 31, 'word': '药', 'start': None, 'end': None}
66
+ {'entity': 0, 'score': 0.99999845, 'index': 32, 'word': '物', 'start': None, 'end': None}
67
+ {'entity': 0, 'score': 0.99999833, 'index': 33, 'word': 'θΏ›', 'start': None, 'end': None}
68
+ {'entity': 0, 'score': 0.99999845, 'index': 34, 'word': '葌', 'start': None, 'end': None}
69
+ {'entity': 0, 'score': 0.99999845, 'index': 35, 'word': 'ζŠ—', 'start': None, 'end': None}
70
+ {'entity': 0, 'score': 0.99999845, 'index': 36, 'word': 'ζ„Ÿ', 'start': None, 'end': None}
71
+ {'entity': 0, 'score': 0.99999857, 'index': 37, 'word': 'ζŸ“', 'start': None, 'end': None}
72
+ {'entity': 0, 'score': 0.99999845, 'index': 38, 'word': '镇', 'start': None, 'end': None}
73
+ {'entity': 0, 'score': 0.99999857, 'index': 39, 'word': 'η—›', 'start': None, 'end': None}
74
+ {'entity': 0, 'score': 0.99999833, 'index': 40, 'word': '。', 'start': None, 'end': None}
75
+ {'entity': 0, 'score': 0.99999845, 'index': 41, 'word': '同', 'start': None, 'end': None}
76
+ {'entity': 0, 'score': 0.99999845, 'index': 42, 'word': 'ζ—Ά', 'start': None, 'end': None}
77
+ {'entity': 0, 'score': 0.99999833, 'index': 43, 'word': '在', 'start': None, 'end': None}
78
+ {'entity': 0, 'score': 0.99999845, 'index': 44, 'word': 'ζ—₯', 'start': None, 'end': None}
79
+ {'entity': 0, 'score': 0.99999857, 'index': 45, 'word': 'εΈΈ', 'start': None, 'end': None}
80
+ {'entity': 0, 'score': 0.99999845, 'index': 46, 'word': 'η”Ÿ', 'start': None, 'end': None}
81
+ {'entity': 0, 'score': 0.99999845, 'index': 47, 'word': 'ζ΄»', 'start': None, 'end': None}
82
+ {'entity': 0, 'score': 0.99999845, 'index': 48, 'word': 'δΈ­', 'start': None, 'end': None}
83
+ {'entity': 0, 'score': 0.99999845, 'index': 49, 'word': '要', 'start': None, 'end': None}
84
+ {'entity': 0, 'score': 0.99999845, 'index': 50, 'word': '注', 'start': None, 'end': None}
85
+ {'entity': 0, 'score': 0.99999857, 'index': 51, 'word': '意', 'start': None, 'end': None}
86
+ {'entity': 0, 'score': 0.99999845, 'index': 52, 'word': '牙', 'start': None, 'end': None}
87
+ {'entity': 0, 'score': 0.99999845, 'index': 53, 'word': 'ι½Ώ', 'start': None, 'end': None}
88
+ {'entity': 0, 'score': 0.99999857, 'index': 54, 'word': 'ζΈ…', 'start': None, 'end': None}
89
+ {'entity': 0, 'score': 0.99999857, 'index': 55, 'word': '洁', 'start': None, 'end': None}
90
+ {'entity': 0, 'score': 0.99999857, 'index': 56, 'word': '卫', 'start': None, 'end': None}
91
+ {'entity': 0, 'score': 0.99999857, 'index': 57, 'word': 'η”Ÿ', 'start': None, 'end': None}
92
+ {'entity': 0, 'score': 0.99999845, 'index': 58, 'word': ',', 'start': None, 'end': None}
93
+ {'entity': 0, 'score': 0.99999845, 'index': 59, 'word': 'ε…»', 'start': None, 'end': None}
94
+ {'entity': 0, 'score': 0.99999857, 'index': 60, 'word': '成', 'start': None, 'end': None}
95
+ {'entity': 0, 'score': 0.99999857, 'index': 61, 'word': '刷', 'start': None, 'end': None}
96
+ {'entity': 0, 'score': 0.99999857, 'index': 62, 'word': '牙', 'start': None, 'end': None}
97
+ {'entity': 0, 'score': 0.99999845, 'index': 63, 'word': 'ηš„', 'start': None, 'end': None}
98
+ {'entity': 0, 'score': 0.99999845, 'index': 64, 'word': 'ε₯½', 'start': None, 'end': None}
99
+ {'entity': 0, 'score': 0.99999845, 'index': 65, 'word': 'δΉ ', 'start': None, 'end': None}
100
+ {'entity': 0, 'score': 0.99999857, 'index': 66, 'word': 'ζƒ―', 'start': None, 'end': None}
101
+ {'entity': 0, 'score': 0.99999833, 'index': 67, 'word': '。', 'start': None, 'end': None}
102
 
 
 
103
  ```
104
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
105
  ### Framework versions
106
 
107
  - Transformers 4.15.0