chenpengfei committed on
Commit 6cddc41 • 1 Parent(s): b44e3a3

Update README.md


Improve the code sample

Files changed (1)
  1. README.md +8 -74
README.md CHANGED
@@ -1,8 +1,7 @@
 ---
 license: apache-2.0
-language: en
+language: zh
 tags:
-- generated_from_trainer
 - Token Classification
 metrics:
 - precision
@@ -13,92 +12,27 @@ metrics:
 
 ## Model description
 
-This model is a fine-tuned version of macbert for the purpose of spell checking in medical apllication scenarious, and we fine-tuned on our own medical data which accumulated during past several years including 600,000 fine edited medical articals. When processing the dataset, we proposed to sample 30% of these articals then randomly select characters and replace these words with spelling errors which are either visally or phonologically resembled characters. Consequently, the model can achieve 90% accuracy on our test dataset.
+This model is a fine-tuned version of MacBERT for spell checking in medical application scenarios. We fine-tuned the Chinese MacBERT base model on a 300M dataset built from more than 60,000 authorized medical articles. We randomly corrupted 30% of the sentences in these articles by replacing characters with visually or phonologically similar ones. The fine-tuned model achieves 96% accuracy on our test dataset.
 
 ## Intended uses & limitations
 
 You can use this model directly with a pipeline for token classification:
 ```python
->>> from transformers import (AutoModelForTokenClassification, BertTokenizer)
+>>> from transformers import (AutoModelForTokenClassification, AutoTokenizer)
 >>> from transformers import pipeline
 
 >>> hub_model_id = "9pinus/macbert-base-chinese-medical-collation"
 
 >>> model = AutoModelForTokenClassification.from_pretrained(hub_model_id)
->>> tokenizer = BertTokenizer.from_pretrained(hub_model_id)
+>>> tokenizer = AutoTokenizer.from_pretrained(hub_model_id)
 >>> classifier = pipeline('ner', model=model, tokenizer=tokenizer)
->>> result = classifier("ε¦‚ζžœη—…ζƒ…θΎƒι‡οΌŒε―ι€‚ε½“ε£ζœη”²θ‚–ε”‘η‰‡γ€ηŽ―ι…―ηΊ’ιœ‰η΄ η‰‡γ€ε²ε“šηΎŽθΎ›η‰‡η­‰θ―η‰©θΏ›θ‘ŒζŠ—ζ„ŸζŸ“ι•‡η—›γ€‚εŒζ—Άεœ¨ζ—₯εΈΈη”Ÿζ΄»δΈ­θ¦ζ³¨ζ„η‰™ι½ΏζΈ…ζ΄ε«η”ŸοΌŒε…»ζˆεˆ·η‰™ηš„ε₯½δΉ ζƒ―。")
+>>> result = classifier("ε¦‚ζžœη—…ζƒ…θΎƒι‡οΌŒε―ι€‚ε½“ε£ζœη”²θ‚–ε”‘η‰‡γ€ηŽ―ι…―ηΊ’ιœ‰η΄ η‰‡η­‰θ―η‰©θΏ›θ‘ŒζŠ—ζ„ŸζŸ“ι•‡η—›γ€‚")
 
 >>> for item in result:
->>> print(item)
+>>>     if item['entity'] == 1:
+>>>         print(item)
 
-{'entity': 0, 'score': 0.9999982, 'index': 1, 'word': '如', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 2, 'word': '果', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 3, 'word': 'η—…', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 4, 'word': 'ζƒ…', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 5, 'word': 'θΎƒ', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 6, 'word': '重', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 7, 'word': ',', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 8, 'word': '可', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 9, 'word': '适', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 10, 'word': '当', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 11, 'word': '口', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 12, 'word': '服', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.9999982, 'index': 13, 'word': 'η”²', 'start': None, 'end': None}
-{'entity': 1, 'score': 0.901703, 'index': 14, 'word': 'θ‚–', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 15, 'word': 'ε”‘', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 16, 'word': '片', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 17, 'word': '、', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 18, 'word': '环', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 19, 'word': 'ι…―', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 20, 'word': 'ηΊ’', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 21, 'word': 'ιœ‰', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 22, 'word': 'η΄ ', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 23, 'word': '片', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 24, 'word': '、', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 25, 'word': '吲', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 26, 'word': 'ε“š', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.999998, 'index': 27, 'word': '美', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 28, 'word': 'θΎ›', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 29, 'word': '片', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 30, 'word': 'η­‰', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 31, 'word': '药', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 32, 'word': '物', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 33, 'word': 'θΏ›', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 34, 'word': '葌', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 35, 'word': 'ζŠ—', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 36, 'word': 'ζ„Ÿ', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 37, 'word': 'ζŸ“', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 38, 'word': '镇', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 39, 'word': 'η—›', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 40, 'word': '。', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 41, 'word': '同', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 42, 'word': 'ζ—Ά', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 43, 'word': '在', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 44, 'word': 'ζ—₯', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 45, 'word': 'εΈΈ', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 46, 'word': 'η”Ÿ', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 47, 'word': 'ζ΄»', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 48, 'word': 'δΈ­', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 49, 'word': '要', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 50, 'word': '注', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 51, 'word': '意', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 52, 'word': '牙', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 53, 'word': 'ι½Ώ', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 54, 'word': 'ζΈ…', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 55, 'word': '洁', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 56, 'word': '卫', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 57, 'word': 'η”Ÿ', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 58, 'word': ',', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 59, 'word': 'ε…»', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 60, 'word': '成', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 61, 'word': '刷', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 62, 'word': '牙', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 63, 'word': 'ηš„', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 64, 'word': 'ε₯½', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999845, 'index': 65, 'word': 'δΉ ', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999857, 'index': 66, 'word': 'ζƒ―', 'start': None, 'end': None}
-{'entity': 0, 'score': 0.99999833, 'index': 67, 'word': '。', 'start': None, 'end': None}
+{'entity': 1, 'score': 0.58127016, 'index': 14, 'word': 'θ‚–', 'start': 13, 'end': 14}
 
 ```
 
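The data-corruption procedure described in the updated model card (corrupting about 30% of the sentences by swapping characters for visually or phonologically similar ones) can be sketched roughly as follows. This is a minimal illustration only, not the authors' actual pipeline: the confusion table, function names, and rates below are invented for the example, and a real setup would use a much larger confusion set built from glyph and pinyin similarity.

```python
import random

# Toy confusion table: each character maps to visually or phonologically
# similar characters. Purely illustrative (hypothetical entries), not the
# table used to build the actual training data.
CONFUSION = {
    "硝": ["θ‚–", "ζΆˆ"],  # e.g. η”²η‘ε”‘ -> η”²θ‚–ε”‘, the typo flagged in the card's example
    "ιœ‰": ["ζ‚”"],
    "η—›": ["ι€š"],
}

def corrupt_sentence(sentence, char_error_rate=0.1):
    """Replace some characters with confusable ones and return the noisy
    sentence plus per-character labels (1 = corrupted, 0 = unchanged)."""
    chars, labels = [], []
    for ch in sentence:
        if ch in CONFUSION and random.random() < char_error_rate:
            chars.append(random.choice(CONFUSION[ch]))
            labels.append(1)
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels

def build_training_examples(sentences, sentence_corruption_rate=0.3):
    """Corrupt roughly 30% of the sentences, as the model card describes;
    the remaining sentences are kept clean with all-zero labels."""
    examples = []
    for s in sentences:
        if random.random() < sentence_corruption_rate:
            examples.append(corrupt_sentence(s))
        else:
            examples.append((s, [0] * len(s)))
    return examples
```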
+ {'entity': 1, 'score': 0.58127016, 'index': 14, 'word': 'θ‚–', 'start': 13, 'end': 14}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
36
 
37
  ```
38
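For downstream use, the flagged tokens can be mapped back to character offsets in the input using the start/end fields shown in the new example output. The helper below is an illustrative sketch, not part of the model repository; it assumes the output format shown in the card, where entity is 1 for a suspected misspelled character and 0 otherwise.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

hub_model_id = "9pinus/macbert-base-chinese-medical-collation"
model = AutoModelForTokenClassification.from_pretrained(hub_model_id)
tokenizer = AutoTokenizer.from_pretrained(hub_model_id)
classifier = pipeline('ner', model=model, tokenizer=tokenizer)

def flag_suspects(text):
    """Return (character, start, end) for every position the model flags
    as a suspected spelling error (entity == 1)."""
    suspects = []
    for item in classifier(text):
        if item['entity'] == 1 and item['start'] is not None:
            suspects.append((text[item['start']:item['end']],
                             item['start'], item['end']))
    return suspects

text = "ε¦‚ζžœη—…ζƒ…θΎƒι‡οΌŒε―ι€‚ε½“ε£ζœη”²θ‚–ε”‘η‰‡γ€ηŽ―ι…―ηΊ’ιœ‰η΄ η‰‡η­‰θ―η‰©θΏ›θ‘ŒζŠ—ζ„ŸζŸ“ι•‡η—›γ€‚"
print(flag_suspects(text))
# For the example sentence above this should include ('θ‚–', 13, 14),
# matching the output shown in the model card.
```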