---
library_name: transformers
license: apache-2.0
language:
- ja
base_model:
- FacebookAI/xlm-roberta-base
---

# Japanese Named Entity Recognition (NER)

This model is built using XLM-RoBERTa for Japanese text to recognize named entities such as persons, organizations, locations, and other categories. The model is designed specifically for Japanese text and can be used for a variety of tasks that require entity extraction from Japanese documents or conversations.

## Table of Contents

- [Overview](#overview)
- [NER Tags](#ner-tags)
- [Model Details](#model-details)
- [Sample Input and Output](#sample-input-and-output)

## Overview

Named Entity Recognition (NER) is a critical task in natural language processing (NLP) for identifying and classifying entities in text. This model recognizes named entities in Japanese, making it ideal for use in applications like document analysis, chatbots, or information retrieval in the Japanese language.

## NER Tags

The model identifies the following tags:

| Class ID | Tag   | Description          |
|----------|-------|----------------------|
| 0        | O     | Outside any entity   |
| 1        | PER   | Person names         |
| 2        | ORG   | Organizations        |
| 3        | ORG-P | Political orgs       |
| 4        | ORG-O | Other orgs           |
| 5        | LOC   | Locations            |
| 6        | INS   | Institutions         |
| 7        | PRD   | Products             |
| 8        | EVT   | Events               |

## Model Details

- **Base Model**: `xlm-roberta-base`
- **Task**: Token Classification (NER)
- **Languages**: Japanese
- **Input**: Japanese text
- **Output**: Tokenized text with NER tags

## Sample Input and Output

Here’s an example input sentence and the expected NER output.

### **Input**

```text
中国では、中国共産党による一党統治が続く。
```

English gloss: "In China, one-party rule by the Chinese Communist Party continues."

### **Output**

| Token   | Predicted Tag |
|---------|---------------|
| 中国    | LOC           |
| では    | O             |
| 、      | O             |
| 中国    | ORG-P         |
| 共産党  | ORG-P         |
| による  | O             |
| 一党    | O             |
| 統治    | O             |
| が      | O             |
| 続く    | O             |
| 。      | O             |

### Visualization with Gradio and spaCy

The NER output is also visualized in color-coded format for ease of interpretation:

**Entities Output:**

- `LOC` (Location): China (中国)
- `ORG-P` (Political Organization): Chinese Communist Party (中国共産党)

## Model Performance Metrics

The following performance metrics were achieved by the model during evaluation:

### Overall Metrics:

- **Total Accuracy**: 98.42%
- **Total F1-score**: 99.33%

### Class-wise Metrics:

| Class     | Recall | Precision |
|-----------|--------|-----------|
| **O**     | 99.94% | 99.00%    |
| **PER**   | 97.53% | 98.80%    |
| **ORG**   | 99.22% | 96.23%    |
| **ORG-P** | 95.30% | 99.71%    |
| **ORG-O** | 97.80% | 98.26%    |
| **LOC**   | 99.03% | 96.71%    |
| **INS**   | 98.88% | 99.07%    |
| **PRD**   | 99.31% | 99.67%    |
| **EVT**   | 98.96% | 98.31%    |

The model demonstrates strong overall performance: precision and recall both exceed 95% for every class, with the lowest recall on ORG-P (95.30%) and the lowest precision on ORG (96.23%).
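
The class-wise table reports recall and precision but not per-class F1. As a rough illustration, the sketch below derives a per-class F1 as the harmonic mean of the two values in the table. The names `CLASS_METRICS` and `f1_score` are hypothetical helpers introduced here, and the unweighted macro average it prints is an assumption for illustration only; it is not necessarily how the reported Total F1-score was computed.

```python
# Recall and precision (in %) copied from the class-wise metrics table.
# CLASS_METRICS and f1_score are illustrative names, not part of the model's API.
CLASS_METRICS = {
    "O": (99.94, 99.00),
    "PER": (97.53, 98.80),
    "ORG": (99.22, 96.23),
    "ORG-P": (95.30, 99.71),
    "ORG-O": (97.80, 98.26),
    "LOC": (99.03, 96.71),
    "INS": (98.88, 99.07),
    "PRD": (99.31, 99.67),
    "EVT": (98.96, 98.31),
}


def f1_score(recall: float, precision: float) -> float:
    """Return the harmonic mean of recall and precision (same unit as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Per-class F1 derived from the table values.
per_class_f1 = {
    tag: f1_score(recall, precision)
    for tag, (recall, precision) in CLASS_METRICS.items()
}

# Unweighted (macro) average over the nine classes; an assumption for illustration.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for tag, f1 in per_class_f1.items():
    print(f"{tag:<6} F1 = {f1:.2f}%")
print(f"macro  F1 = {macro_f1:.2f}%")
```

Because F1 is a harmonic mean, each per-class value falls between that class's recall and precision, so a class with a wide gap between the two (such as ORG-P) is pulled toward the lower figure.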