tags:
  - spacy
  - token-classification
language:
  - ja
license: CC-BY-SA-4.0
model-index:
  - name: ja_core_news_lg
    results:
      - tasks:
          name: NER
          type: token-classification
          metrics:
            - name: Precision
              type: precision
              value: 0.760989011
            - name: Recall
              type: recall
              value: 0.7075351213
            - name: F Score
              type: f_score
              value: 0.7332892124
      - tasks:
          name: TAG
          type: token-classification
          metrics:
            - name: Accuracy
              type: accuracy
              value: 0.9721899386
      - tasks:
          name: SENTER
          type: token-classification
          metrics:
            - name: Precision
              type: precision
              value: 0.9860557769
            - name: Recall
              type: recall
              value: 0.9880239521
            - name: F Score
              type: f_score
              value: 0.9870388833
      - tasks:
          name: UNLABELED_DEPENDENCIES
          type: token-classification
          metrics:
            - name: Accuracy
              type: accuracy
              value: 0.9181002928
      - tasks:
          name: LABELED_DEPENDENCIES
          type: token-classification
          metrics:
            - name: Accuracy
              type: accuracy
              value: 0.8998

Details: https://spacy.io/models/ja#ja_core_news_lg

Japanese pipeline optimized for CPU. Components: tok2vec, parser, senter, ner, attribute_ruler.
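
As a minimal usage sketch (not taken from the original card), the components listed above can be exercised as follows. It assumes the package has been installed, for example with `python -m spacy download ja_core_news_lg`, and that the Japanese tokenizer dependencies (SudachiPy, e.g. via `pip install spacy[ja]`) are present. The helper names `token_rows`, `entity_rows`, and `analyze` are hypothetical.

```python
def token_rows(doc):
    """Return (text, coarse POS, dependency label, head text) per token."""
    return [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in doc]


def entity_rows(doc):
    """Return (text, label) for each named entity found by the ner component."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def analyze(text, model_name="ja_core_news_lg"):
    """Load the pipeline and run it on one text.

    Assumption: spaCy and the model package are installed. spaCy is
    imported here so the helpers above stay usable without it.
    """
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return token_rows(doc), entity_rows(doc)
```

Calling `analyze("東京でラーメンを食べた。")` would return the token rows and any entities the model predicts; the exact labels depend on the model and are not shown here.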

| Feature | Description |
| --- | --- |
| **Name** | `ja_core_news_lg` |
| **Version** | `3.1.0` |
| **spaCy** | `>=3.1.0,<3.2.0` |
| **Default Pipeline** | `tok2vec`, `parser`, `attribute_ruler`, `ner` |
| **Components** | `tok2vec`, `parser`, `senter`, `attribute_ruler`, `ner` |
| **Vectors** | 480443 keys, 480443 unique vectors (300 dimensions) |
| **Sources** | UD Japanese GSD v2.6 (Omura, Mai; Miyao, Yusuke; Kanayama, Hiroshi; Matsuda, Hiroshi; Wakasa, Aya; Yamashita, Kayo; Asahara, Masayuki; Tanaka, Takaaki; Murawaki, Yugo; Matsumoto, Yuji; Mori, Shinsuke; Uematsu, Sumire; McDonald, Ryan; Nivre, Joakim; Zeman, Daniel)<br />UD Japanese GSD v2.6 NER (Megagon Labs Tokyo)<br />chiVe: Japanese Word Embedding with Sudachi & NWJC (chive-1.1-mc90-500k) (Works Applications) |
| **License** | `CC BY-SA 4.0` |
| **Author** | Explosion |
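
The table lists `senter` among the components but not in the default pipeline, and it pins spaCy to `>=3.1.0,<3.2.0`. The sketch below shows one way to check that range and to load the pipeline with `senter` in place of the parser for sentence splitting only. The helpers `parse_version`, `is_compatible`, and `load_sentence_splitter` are hypothetical names; `spacy.load(..., exclude=...)` and `nlp.enable_pipe` are existing spaCy v3 calls, and the sketch assumes the model package is installed.

```python
import re

# Bounds transcribed from the "spaCy" row: >=3.1.0,<3.2.0
SPACY_RANGE = ((3, 1, 0), (3, 2, 0))


def parse_version(version):
    """Extract the leading major.minor.patch triple from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(version, bounds=SPACY_RANGE):
    """Return True if the version lies in the half-open range [low, high)."""
    low, high = bounds
    return low <= parse_version(version) < high


def load_sentence_splitter(model_name="ja_core_news_lg"):
    """Load the pipeline without the parser and with senter enabled.

    Assumption: spaCy and the model package are installed.
    """
    import spacy

    if not is_compatible(spacy.__version__):
        raise RuntimeError(f"spaCy {spacy.__version__} is outside >=3.1.0,<3.2.0")
    nlp = spacy.load(model_name, exclude=["parser"])
    nlp.enable_pipe("senter")  # senter ships with the package but is disabled by default
    return nlp
```

With the returned `nlp`, iterating over `nlp(text).sents` should yield sentence spans set by `senter`, which is typically faster than running the full parser when only sentence boundaries are needed.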

Label Scheme

47 labels for 3 components:

| Component | Labels |
| --- | --- |
| **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `cop`, `csubj`, `dep`, `det`, `dislocated`, `fixed`, `mark`, `nmod`, `nsubj`, `nummod`, `obj`, `obl`, `punct` |
| **`senter`** | `I`, `S` |
| **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `MOVEMENT`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PET_NAME`, `PHONE`, `PRODUCT`, `QUANTITY`, `TIME`, `TITLE_AFFIX`, `WORK_OF_ART` |
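
For downstream code that needs to validate labels, the scheme above can be kept as plain data. The following is a small sketch: the dictionary is transcribed from this card, while `count_labels` and `unknown_labels` are hypothetical helpers. With the pipeline loaded, spaCy's `nlp.pipe_labels` property should be treated as the authoritative source instead.

```python
# Label scheme transcribed from the model card (not read from the model itself).
LABEL_SCHEME = {
    "parser": [
        "ROOT", "acl", "advcl", "advmod", "amod", "aux", "case", "cc",
        "ccomp", "compound", "cop", "csubj", "dep", "det", "dislocated",
        "fixed", "mark", "nmod", "nsubj", "nummod", "obj", "obl", "punct",
    ],
    "senter": ["I", "S"],
    "ner": [
        "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE", "LAW", "LOC",
        "MONEY", "MOVEMENT", "NORP", "ORDINAL", "ORG", "PERCENT", "PERSON",
        "PET_NAME", "PHONE", "PRODUCT", "QUANTITY", "TIME", "TITLE_AFFIX",
        "WORK_OF_ART",
    ],
}


def count_labels(scheme):
    """Total number of labels across all components."""
    return sum(len(labels) for labels in scheme.values())


def unknown_labels(component, labels, scheme=LABEL_SCHEME):
    """Return the labels that the given component does not define, sorted."""
    known = set(scheme[component])
    return sorted(set(labels) - known)


print(count_labels(LABEL_SCHEME))                # 47, matching the card
print(unknown_labels("ner", ["GPE", "CITY"]))    # ['CITY']
```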

Accuracy

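| Type | Score |
| --- | --- |
| `TOKEN_ACC` | 99.69 |
| `TAG_ACC` | 97.22 |
| `POS_ACC` | 96.40 |
| `MORPH_ACC` | 0.00 |
| `DEP_UAS` | 91.81 |
| `DEP_LAS` | 89.98 |
| `ENTS_P` | 76.10 |
| `ENTS_R` | 70.75 |
| `ENTS_F` | 73.33 |
| `SENTS_P` | 98.61 |
| `SENTS_R` | 98.80 |
| `SENTS_F` | 98.70 |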
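
The F scores reported here are consistent with the standard F1 definition, the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes `ENTS_F` and `SENTS_F` from the unrounded precision and recall values given in the card's metadata. The function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded values from the model-index metadata of this card.
NER = (0.760989011, 0.7075351213)
SENTER = (0.9860557769, 0.9880239521)

ner_f = f_score(*NER)
sents_f = f_score(*SENTER)

print(f"ENTS_F  {100 * ner_f:.2f}")    # 73.33
print(f"SENTS_F {100 * sents_f:.2f}")  # 98.70
```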