sabarinathan

library_name: transformers
license: apache-2.0
language:
  - ja
base_model:
  - FacebookAI/xlm-roberta-base

Japanese Named Entity Recognition (NER)

This model is built on XLM-RoBERTa (xlm-roberta-base) and recognizes named entities in Japanese text, such as persons, organizations, locations, and other categories. It can be used for tasks that require entity extraction from Japanese documents or conversations.

Overview

Named Entity Recognition (NER) is the natural language processing (NLP) task of identifying and classifying entities in text. This model recognizes named entities in Japanese, which makes it suitable for applications such as document analysis, chatbots, and information retrieval.

NER Tags

The model identifies the following tags:

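| Class ID | Tag | Description |
|---|---|---|
| 0 | O | Outside any entity |
| 1 | PER | Person names |
| 2 | ORG | Organizations |
| 3 | ORG-P | Political organizations |
| 4 | ORG-O | Other organizations |
| 5 | LOC | Locations |
| 6 | INS | Institutions |
| 7 | PRD | Products |
| 8 | EVT | Events |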
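The tag table can be written down as a pair of lookup dictionaries, which is handy when decoding raw class IDs. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `decode` are illustrative, and it is an assumption that the checkpoint's own configuration uses exactly these IDs and tag strings.

```python
# Tag inventory copied from the table above; keys are the listed class IDs.
# Assumption: the checkpoint's config uses the same mapping.
ID2LABEL = {
    0: "O",
    1: "PER",
    2: "ORG",
    3: "ORG-P",
    4: "ORG-O",
    5: "LOC",
    6: "INS",
    7: "PRD",
    8: "EVT",
}

# Reverse lookup from tag string to class ID.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(class_ids):
    """Map a sequence of predicted class IDs to their tag strings."""
    return [ID2LABEL[i] for i in class_ids]


print(decode([5, 0, 3, 3]))  # ['LOC', 'O', 'ORG-P', 'ORG-P']
```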

Model Details

  • Base Model: xlm-roberta-base
  • Task: Token Classification (NER)
  • Languages: Japanese
  • Input: Japanese text
  • Output: Tokenized text with NER tags
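Since the card lists `transformers` as its library and token classification as its task, inference could look roughly like the sketch below. It rests on several assumptions: the model ID is not stated here, so `model_id` is a placeholder the caller must supply; the helper names `clean_piece`, `to_rows`, and `tag_text` are hypothetical; and the tag strings returned depend on the labels stored in the checkpoint's configuration.

```python
def clean_piece(piece: str) -> str:
    """Strip the SentencePiece word-boundary marker (U+2581) used by XLM-RoBERTa."""
    return piece.replace("\u2581", "")


def to_rows(predictions):
    """Turn pipeline output dicts into (token, tag) pairs, skipping empty pieces."""
    rows = []
    for pred in predictions:
        token = clean_piece(pred["word"])
        if token:
            rows.append((token, pred["entity"]))
    return rows


def tag_text(text: str, model_id: str):
    """Run token classification on `text`.

    `model_id` is a placeholder: pass the Hub ID or local path of this model.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    # ignore_labels=[] keeps tokens tagged "O", which the pipeline drops by default.
    ner = pipeline("token-classification", model=model_id, ignore_labels=[])
    return to_rows(ner(text))
```

Calling `tag_text("中国では、中国共産党による一党統治が続く。", model_id=...)` with the real model ID should return one `(token, tag)` pair per subword piece; the exact segmentation depends on the tokenizer.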

Sample Input and Output

Here’s an example input sentence and the expected NER output.

Input

中国では、中国共産党による一党統治が続く。

Output

| Token | Predicted Tag |
|---|---|
| 中国 | LOC |
| では | O |
| 、 | O |
| 中国 | ORG-P |
| 共産党 | ORG-P |
| による | O |
| 一党 | O |
| 統治 | O |
| が | O |
| 続く | O |
| 。 | O |

Visualization with Gradio and spaCy

The NER output is also visualized in a color-coded format for ease of interpretation:

Entities Output:

  • LOC (Location): China (中国)
  • ORG-P (Political Organization): Chinese Communist Party (中国共産党)
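Going from per-token tags to the entity list above means merging adjacent tokens that carry the same tag. The sketch below shows one way to do that; `merge_entities` is a hypothetical helper, not part of the model or of any library. Because the tag set has no B-/I- prefixes, two distinct entities of the same type that sit directly next to each other would be merged into one span.

```python
def merge_entities(tagged_tokens):
    """Join consecutive tokens that share a non-O tag into one entity span."""
    entities = []
    current_text, current_tag = "", None
    for token, tag in tagged_tokens:
        if tag == current_tag and tag != "O":
            # Same entity continues; Japanese text has no spaces between tokens.
            current_text += token
        else:
            # Tag changed: flush the previous span if it was an entity.
            if current_tag not in (None, "O"):
                entities.append((current_text, current_tag))
            current_text, current_tag = token, tag
    # Flush a trailing entity, if any.
    if current_tag not in (None, "O"):
        entities.append((current_text, current_tag))
    return entities


# Tokens and tags of the sample sentence 中国では、中国共産党による一党統治が続く。
sample = [
    ("中国", "LOC"), ("では", "O"), ("、", "O"),
    ("中国", "ORG-P"), ("共産党", "ORG-P"), ("による", "O"),
    ("一党", "O"), ("統治", "O"), ("が", "O"), ("続く", "O"), ("。", "O"),
]

print(merge_entities(sample))  # [('中国', 'LOC'), ('中国共産党', 'ORG-P')]
```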

Model Performance Metrics

The following performance metrics were achieved by the model during evaluation:

Overall Metrics:

  • Total Accuracy: 98.42%
  • Total F1-score: 99.33%

Class-wise Metrics:

| Class | Recall | Precision |
|---|---|---|
| O | 99.94% | 99.00% |
| PER | 97.53% | 98.80% |
| ORG | 99.22% | 96.23% |
| ORG-P | 95.30% | 99.71% |
| ORG-O | 97.80% | 98.26% |
| LOC | 99.03% | 96.71% |
| INS | 98.88% | 99.07% |
| PRD | 99.31% | 99.67% |
| EVT | 98.96% | 98.31% |

The model performs strongly overall: every class reaches at least 95% recall and 96% precision, and recall and precision stay within about five percentage points of each other for each class.
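The class-wise metrics give recall and precision but no per-class F1. If a per-class F1 is needed, it can be derived as the harmonic mean of the two, as sketched below. The derived values are arithmetic on the reported figures, not numbers published with the model, and they need not agree with the "Total F1-score" above, whose averaging method is not stated. The names `CLASS_METRICS` and `f1_score` are illustrative.

```python
# (recall %, precision %) per class, copied from the class-wise metrics above.
CLASS_METRICS = {
    "O": (99.94, 99.00),
    "PER": (97.53, 98.80),
    "ORG": (99.22, 96.23),
    "ORG-P": (95.30, 99.71),
    "ORG-O": (97.80, 98.26),
    "LOC": (99.03, 96.71),
    "INS": (98.88, 99.07),
    "PRD": (99.31, 99.67),
    "EVT": (98.96, 98.31),
}


def f1_score(recall, precision):
    """Harmonic mean of precision and recall, in the same unit as the inputs."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Derived per-class F1 (percent); not reported by the model author.
for tag, (recall, precision) in CLASS_METRICS.items():
    print(f"{tag}: {f1_score(recall, precision):.2f}%")
```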