

Traditional Chinese Spelling Correction Model

Training data

Training techniques

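  • Input sentence lengths should follow a normal distribution, and each sentence should contain between 1 and 3 erroneous characters.
  • FocalLoss is introduced, treating misspelled-character detection as an object-detection problem.
  • The output EntropyLoss and the FocalLoss are weighted 7:3.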
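
The 7:3 weighting above can be sketched in plain Python. This is a minimal illustration rather than the training code used for this model. It assumes the EntropyLoss is a per-character cross-entropy over the vocabulary logits, the FocalLoss is a binary (sigmoid) focal loss over per-character error-detection logits, and the focusing parameter is `gamma=2.0`. The function names and the `gamma` value are hypothetical and do not come from the model card.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_focal_loss(detection_logits, error_labels, gamma=2.0):
    """Mean binary focal loss; label 1 marks a misspelled character.

    gamma=2.0 is an assumed default, not a value stated in the model card.
    """
    total = 0.0
    for z, y in zip(detection_logits, error_labels):
        p = _sigmoid(z)
        p_t = p if y == 1 else 1.0 - p  # probability given to the true class
        # (1 - p_t) ** gamma shrinks the loss of confidently correct positions
        total += -((1.0 - p_t) ** gamma) * math.log(max(p_t, 1e-12))
    return total / len(detection_logits)


def cross_entropy(correction_logits, target_ids):
    """Mean cross-entropy over positions; each item is a list of vocab logits."""
    total = 0.0
    for logits, target in zip(correction_logits, target_ids):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_z - logits[target]
    return total / len(target_ids)


def combined_loss(correction_logits, target_ids, detection_logits, error_labels,
                  ce_weight=0.7, focal_weight=0.3):
    """Weighted sum: 0.7 * cross-entropy + 0.3 * focal loss (the 7:3 ratio)."""
    ce = cross_entropy(correction_logits, target_ids)
    fl = binary_focal_loss(detection_logits, error_labels)
    return ce_weight * ce + focal_weight * fl


if __name__ == "__main__":
    # Toy batch: 2 character positions, vocabulary of 3 characters.
    loss = combined_loss(
        correction_logits=[[2.0, 0.1, -1.0], [0.0, 0.0, 3.0]],
        target_ids=[0, 2],
        detection_logits=[-3.0, 2.5],
        error_labels=[0, 1],
    )
    print(f"combined loss: {loss:.4f}")
```

Because misspelled characters are rare within a sentence, the focal term reduces the contribution of the many easy "no error" positions, which is the usual motivation for borrowing it from object detection.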

SIGHAN validation scores

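| Model | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| chinese-macbert-base | 0.88 | 0.09 | 0.31 | 0.14 |
| macbert4csc-base-chinese (output converted from Simplified to Traditional Chinese) | 0.99 | 0.79 | 0.95 | 0.86 |
| macbert4csc-traditional-chinese | 1 | 0.9 | 0.99 | 0.94 |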

NLG validation scores

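| Model | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| chinese-macbert-base | 0.85 | 0.08 | 0.31 | 0.13 |
| macbert4csc-base-chinese (output converted from Simplified to Traditional Chinese) | 0.98 | 0.7 | 0.95 | 0.81 |
| macbert4csc-traditional-chinese | 0.99 | 0.8 | 0.99 | 0.89 |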
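
For readers who want to check how the four reported columns relate, the sketch below computes accuracy, precision, recall, and F1 from confusion counts, with F1 taken as the harmonic mean of precision and recall. The model card does not say whether scoring is done per sentence or per character, so the counts here are generic and the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_scores(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion counts.

    What counts as one unit (a sentence or a character) is an assumption
    left to the caller; the model card does not specify it.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Recompute F1 from the published precision and recall of two SIGHAN rows.
    print(round(f1_score(0.79, 0.95), 2))  # 0.86
    print(round(f1_score(0.9, 0.99), 2))   # 0.94
```

Recomputing F1 from the published precision and recall can differ from the published F1 in the last digit, since those two inputs are already rounded to two decimals.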

Sincere thanks to the original author, XuMing, for open-sourcing the research this model builds on.

Model size: 102M params (Safetensors, tensor type F32)