---
library_name: transformers
license: apache-2.0
language:
- ja
base_model:
- FacebookAI/xlm-roberta-base
---
Japanese Named Entity Recognition (NER)
This model is built on XLM-RoBERTa (base) and recognizes named entities in Japanese text, including persons, organizations, locations, institutions, products, and events. It is intended for tasks that require extracting entities from Japanese documents or conversations.
Overview
Named Entity Recognition (NER) is a core natural language processing (NLP) task that identifies entities in text and classifies them into predefined categories. This model performs NER on Japanese text, which makes it suitable for applications such as document analysis, chatbots, and information retrieval in Japanese.
NER Tags
The model identifies the following tags:
Class ID | Tag | Description |
---|---|---|
0 | O | Outside any entity |
1 | PER | Person names |
2 | ORG | Organizations |
3 | ORG-P | Political organizations |
4 | ORG-O | Other organizations |
5 | LOC | Locations |
6 | INS | Institutions |
7 | PRD | Products |
8 | EVT | Events |
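In `transformers`, the mapping from class IDs to tag names is normally stored in the model's `config.id2label`. The sketch below reproduces that mapping by hand from the table above, assuming the IDs in the published config match it. `ID2LABEL`, `LABEL2ID`, and `decode_predictions` are illustrative names, not part of the model's API.

```python
# Illustrative mapping copied from the NER tag table; assumes it matches
# the model's config.id2label (check the published config to be sure).
ID2LABEL = {
    0: "O",
    1: "PER",
    2: "ORG",
    3: "ORG-P",
    4: "ORG-O",
    5: "LOC",
    6: "INS",
    7: "PRD",
    8: "EVT",
}

# Reverse lookup (tag name -> class ID), e.g. for encoding training labels.
LABEL2ID = {label: class_id for class_id, label in ID2LABEL.items()}


def decode_predictions(class_ids):
    """Convert a sequence of predicted class IDs into tag names."""
    return [ID2LABEL[class_id] for class_id in class_ids]


if __name__ == "__main__":
    # Under this mapping, IDs for the first five tokens of the sample sentence.
    print(decode_predictions([5, 0, 0, 3, 3]))
    # ['LOC', 'O', 'O', 'ORG-P', 'ORG-P']
```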
Model Details
- Base Model: FacebookAI/xlm-roberta-base
- Task: Token classification (NER)
- Language: Japanese
- Input: Japanese text
- Output: Tokens with predicted NER tags
Sample Input and Output
Here’s an example input sentence and the expected NER output.
Input
中国では、中国共産党による一党統治が続く。
("In China, one-party rule by the Chinese Communist Party continues.")
Output
Token | Predicted Tag |
---|---|
中国 | LOC |
では | O |
、 | O |
中国 | ORG-P |
共産党 | ORG-P |
による | O |
一党 | O |
統治 | O |
が | O |
続く | O |
。 | O |
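The output assigns one tag per token, and the tag set has no B-/I- prefixes, so a multi-token entity such as 中国共産党 appears as consecutive tokens sharing a tag. A minimal way to recover entity spans, assuming that adjacent tokens with the same non-O tag belong to one entity, is to merge such runs. `group_entities` is a hypothetical helper, not part of the model or `transformers`; under this assumption, two different entities of the same class written back to back would be merged into one.

```python
def group_entities(tokens, tags):
    """Merge runs of adjacent tokens that share the same non-O tag.

    Returns a list of (entity_text, tag) tuples. Tokens are joined with no
    separator, which suits unspaced Japanese text.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities = []
    current_text, current_tag = "", None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # An O token closes any open entity.
            if current_tag is not None:
                entities.append((current_text, current_tag))
            current_text, current_tag = "", None
        elif tag == current_tag:
            # Same tag as the previous token: extend the open entity.
            current_text += token
        else:
            # A different non-O tag: close the open entity, start a new one.
            if current_tag is not None:
                entities.append((current_text, current_tag))
            current_text, current_tag = token, tag
    if current_tag is not None:
        entities.append((current_text, current_tag))
    return entities


SAMPLE_TOKENS = ["中国", "では", "、", "中国", "共産党", "による",
                 "一党", "統治", "が", "続く", "。"]
SAMPLE_TAGS = ["LOC", "O", "O", "ORG-P", "ORG-P", "O",
               "O", "O", "O", "O", "O"]

if __name__ == "__main__":
    print(group_entities(SAMPLE_TOKENS, SAMPLE_TAGS))
    # [('中国', 'LOC'), ('中国共産党', 'ORG-P')]
```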
Visualization with Gradio and spaCy
The NER output is also rendered in a color-coded format for easier interpretation:
Entities Output:
- LOC (Location): China (中国)
- ORG-P (Political Organization): Chinese Communist Party (中国共産党)
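The visualization code itself is not included here. One way to get a similar color-coded view, offered as an assumption rather than the method used for this card, is spaCy's displaCy in manual mode, which accepts a dict with `text`, `ents` (each with character-offset `start`/`end` and a `label`), and `title`. The hypothetical helper `to_displacy` builds that dict from token/tag pairs, assuming the tokens concatenate back to the original text exactly (true for the unspaced Japanese sample) and that adjacent tokens with the same tag form one entity.

```python
def to_displacy(tokens, tags, title=None):
    """Build displaCy 'manual' ent data from token-level tags.

    Assumes "".join(tokens) reproduces the original text, so character
    offsets can be computed by accumulating token lengths.
    """
    text = "".join(tokens)
    ents = []
    offset = 0
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        offset = end
        if tag == "O":
            continue
        if ents and ents[-1]["label"] == tag and ents[-1]["end"] == start:
            # Directly follows an entity with the same tag: extend that span.
            ents[-1]["end"] = end
        else:
            ents.append({"start": start, "end": end, "label": tag})
    return {"text": text, "ents": ents, "title": title}


SAMPLE_TOKENS = ["中国", "では", "、", "中国", "共産党", "による",
                 "一党", "統治", "が", "続く", "。"]
SAMPLE_TAGS = ["LOC", "O", "O", "ORG-P", "ORG-P", "O",
               "O", "O", "O", "O", "O"]

if __name__ == "__main__":
    data = to_displacy(SAMPLE_TOKENS, SAMPLE_TAGS)
    for ent in data["ents"]:
        print(ent["label"], data["text"][ent["start"]:ent["end"]])
    # LOC 中国
    # ORG-P 中国共産党
```

The resulting dict can be passed to `spacy.displacy.render(data, style="ent", manual=True)`. Because these tags are custom, colors for labels such as `ORG-P` can be supplied through `options={"colors": {...}}`.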
Model Performance Metrics
The following performance metrics were achieved by the model during evaluation:
Overall Metrics:
- Total Accuracy: 98.42%
- Total F1-score: 99.33%
Class-wise Metrics:
Class | Recall | Precision |
---|---|---|
O | 99.94% | 99.00% |
PER | 97.53% | 98.80% |
ORG | 99.22% | 96.23% |
ORG-P | 95.30% | 99.71% |
ORG-O | 97.80% | 98.26% |
LOC | 99.03% | 96.71% |
INS | 98.88% | 99.07% |
PRD | 99.31% | 99.67% |
EVT | 98.96% | 98.31% |
The model performs well across all classes, with precision and recall above 95% for every tag. The lowest values are recall for ORG-P (95.30%) and precision for ORG (96.23%).
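The class-wise table reports recall and precision but not per-class F1. Because F1 is the harmonic mean of the two, F1 = 2PR / (P + R), the per-class values can be derived directly from the reported numbers, as sketched below. The card does not state how the overall F1 of 99.33% was averaged, so these per-class values are not meant to reproduce that figure. `CLASS_METRICS` and `f1_score` are illustrative names.

```python
# Per-class (recall, precision) in percent, copied from the table above.
CLASS_METRICS = {
    "O": (99.94, 99.00),
    "PER": (97.53, 98.80),
    "ORG": (99.22, 96.23),
    "ORG-P": (95.30, 99.71),
    "ORG-O": (97.80, 98.26),
    "LOC": (99.03, 96.71),
    "INS": (98.88, 99.07),
    "PRD": (99.31, 99.67),
    "EVT": (98.96, 98.31),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for label, (recall, precision) in CLASS_METRICS.items():
        print(f"{label:<6} F1 = {f1_score(precision, recall):.2f}%")
```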