Punctuator for Simplified Chinese

This model is a DistilBertForTokenClassification model fine-tuned to add punctuation to plain Simplified Chinese text. It is fine-tuned from a distilled version of bert-base-chinese.

Usage

from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

# Load the fine-tuned punctuation model and its matching tokenizer from the Hub.
model = DistilBertForTokenClassification.from_pretrained("Qishuai/distilbert_punctuator_zh")
tokenizer = DistilBertTokenizerFast.from_pretrained("Qishuai/distilbert_punctuator_zh")
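
The snippet above only loads the model. As a rough sketch of how the predictions might be turned back into punctuated text, the example below assumes that each character receives one label, that the label names match those in the metrics report (C_COMMA, C_DUNHAO, and so on), and that a predicted label means "insert this mark after the character". The "O" label, the LABEL_TO_PUNCT mapping, and both helper functions are hypothetical and are not part of the published model; check model.config.id2label for the real label set.

```python
# Hypothetical mapping from label names (taken from the metrics report) to
# punctuation marks. Any label not listed here is treated as "no punctuation".
LABEL_TO_PUNCT = {
    "C_COMMA": "，",
    "C_DUNHAO": "、",
    "C_EXLAMATIONMARK": "！",
    "C_PERIOD": "。",
    "C_QUESTIONMARK": "？",
}


def apply_labels(chars, labels):
    """Insert the punctuation mark for each label after its character."""
    pieces = []
    for char, label in zip(chars, labels):
        pieces.append(char)
        pieces.append(LABEL_TO_PUNCT.get(label, ""))
    return "".join(pieces)


def predict_labels(text, model, tokenizer):
    """Return (chars, labels) with one predicted label per input character.

    Assumes the model and tokenizer were loaded as in the Usage snippet.
    """
    import torch  # imported lazily so apply_labels works without torch

    chars = list(text)
    enc = tokenizer(chars, is_split_into_words=True,
                    return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits[0]
    pred_ids = logits.argmax(dim=-1).tolist()

    # Characters that produce no token (or are truncated) keep the "O" label.
    labels = ["O"] * len(chars)
    seen = set()
    for pos, word_id in enumerate(enc.word_ids(0)):
        if word_id is None or word_id in seen:
            continue  # skip special tokens and non-first sub-tokens
        seen.add(word_id)
        labels[word_id] = model.config.id2label[pred_ids[pos]]
    return chars, labels
```

With the model and tokenizer loaded, apply_labels(*predict_labels(text, model, tokenizer)) would return the punctuated string, provided the assumptions above hold for this checkpoint.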

Model Overview

Training data

A combination of the following three datasets:

  • News articles from People's Daily, 2014. Reference

Model Performance

  • Validated on the MSRA training dataset. Reference
  • Metrics Report:
                        precision  recall  f1-score  support
    C_COMMA                  0.67    0.59      0.63    91566
    C_DUNHAO                 0.50    0.37      0.42    21013
    C_EXLAMATIONMARK         0.23    0.06      0.09      399
    C_PERIOD                 0.84    0.99      0.91    44258
    C_QUESTIONMARK           0.00    1.00      0.00        0
    micro avg                0.71    0.67      0.69   157236
    macro avg                0.45    0.60      0.41   157236
    weighted avg             0.69    0.67      0.68   157236
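
To make the averaged rows easier to read, the sketch below recomputes the macro and weighted averages from the per-class rows of the report. It assumes the usual definitions (macro is the unweighted mean over classes, weighted is the support-weighted mean), which is how reports in this layout are normally produced; the card itself does not state which tool generated it. Working from the rounded values, it shows that the macro recall of 0.60 is pulled up by C_QUESTIONMARK, which has zero support but a reported recall of 1.00. The micro average cannot be recovered this way because it needs the raw prediction counts.

```python
# Per-class rows copied from the metrics report:
# label -> (precision, recall, f1-score, support)
ROWS = {
    "C_COMMA": (0.67, 0.59, 0.63, 91566),
    "C_DUNHAO": (0.50, 0.37, 0.42, 21013),
    "C_EXLAMATIONMARK": (0.23, 0.06, 0.09, 399),
    "C_PERIOD": (0.84, 0.99, 0.91, 44258),
    "C_QUESTIONMARK": (0.00, 1.00, 0.00, 0),
}
SUPPORT = 3  # column index of the support count


def macro_avg(rows, col):
    """Unweighted mean of one metric column over all classes."""
    return sum(row[col] for row in rows.values()) / len(rows)


def weighted_avg(rows, col):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(row[SUPPORT] for row in rows.values())
    return sum(row[col] * row[SUPPORT] for row in rows.values()) / total


total_support = sum(row[SUPPORT] for row in ROWS.values())
macro = [round(macro_avg(ROWS, col), 2) for col in range(3)]
weighted = [round(weighted_avg(ROWS, col), 2) for col in range(3)]

print("support     ", total_support)
print("macro avg   ", macro)
print("weighted avg", weighted)
```

The weighted average is the more informative summary here, since the zero-support class contributes nothing to it.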