
Model Details

Model Description

The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019 and is a large multilingual language model trained on 2.5 TB of filtered CommonCrawl data. This model is XLM-RoBERTa-large fine-tuned for named entity recognition on the Japanese ner-wikipedia-dataset.

  • Developed by: See associated paper
  • Model type: Multilingual language model for named entity recognition (NER)
  • Language(s) (NLP): Japanese
  • License: [More Information Needed]
  • Finetuned from model: XLM-RoBERTa-large

Each token is labeled using the IO scheme with the following tags:

Label id  Tag    Tag in Widget  Description
0         O      (None)         others or nothing
1         PER    PER            person
2         ORG    ORG            general corporation organization
3         ORG-P  P              political organization
4         ORG-O  O              other organization
5         LOC    LOC            location
6         INS    INS            institution, facility
7         PRD    PRD            product
8         EVT    EVT            event
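
For reference, a minimal inference sketch using the transformers pipeline API (the model id is the one named at the end of this card; the example sentence is illustrative only):

```python
from transformers import pipeline

# Token-classification pipeline for the fine-tuned checkpoint.
ner = pipeline(
    "token-classification",
    model="rizkyfoxcale/xlm-roberta-large-ner-ja",
    # The label scheme above has no B-/I- prefixes, so "simple" aggregation
    # merges consecutive sub-word tokens that share the same tag.
    aggregation_strategy="simple",
)

text = "山田太郎は東京大学で働いています。"  # illustrative sentence
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```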

Training Details

See the following resources for training data and training procedure details:

Training procedure

The fine-tuning source code is heavily inspired by this repository, with minor modifications.
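
The repository itself is not linked in this card, so the snippet below is only a generic sketch of the usual preprocessing step for token classification with XLM-RoBERTa: sub-word tokenization and alignment of word-level tags. The tokens and ner_tags column names are assumptions, not the exact published code:

```python
from transformers import AutoTokenizer

# Base tokenizer of the model being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

def tokenize_and_align_labels(examples):
    """Tokenize pre-split words and align word-level tags with sub-word tokens."""
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, labels in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        # Special tokens get -100 so the loss ignores them; with the IO scheme,
        # every sub-word token simply inherits its word's label.
        all_labels.append([-100 if w is None else labels[w] for w in word_ids])
    tokenized["labels"] = all_labels
    return tokenized
```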

Training Hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05 (default)
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42 (default)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 (default)
  • lr_scheduler_type: linear (default)
  • weight_decay: 0.01
  • num_epochs: 5
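
A minimal sketch of how these settings map onto Hugging Face TrainingArguments (the output directory is arbitrary; the actual training script is not published here):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters with the standard
# transformers Trainer API.
training_args = TrainingArguments(
    output_dir="xlm-roberta-large-ner-ja",  # arbitrary choice
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    weight_decay=0.01,
    num_train_epochs=5,
    lr_scheduler_type="linear",
)
```

The Adam betas and epsilon listed above are the optimizer defaults, so they need no explicit arguments.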

Evaluation

Training Loss  Epoch  Validation Loss  Precision  Recall  F1      Accuracy
No log         1.0    0.1645           0.7581     0.8235  0.7894  0.9540
No log         2.0    0.1523           0.8153     0.8414  0.8281  0.9611
No log         3.0    0.1188           0.8416     0.8741  0.8575  0.9683
No log         4.0    0.1320           0.8621     0.8935  0.8775  0.9725
No log         5.0    0.1422           0.8796     0.9032  0.8913  0.9728

Testing Data, Factors & Metrics

Testing Data

The ner-wikipedia-dataset was split into training and evaluation sets with a 9:1 ratio.
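
A minimal sketch of such a split with the datasets library; the Hub identifier stockmark/ner-wikipedia-dataset and the seed are assumptions, since the card only names ner-wikipedia-dataset and does not publish the split procedure:

```python
from datasets import load_dataset

# Hub identifier assumed; the card only names "ner-wikipedia-dataset".
dataset = load_dataset("stockmark/ner-wikipedia-dataset", split="train")

# 9:1 train-eval split matching the ratio stated above (seed chosen arbitrarily).
split = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
print(len(train_ds), len(eval_ds))
```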

Model size: 559M parameters (F32, safetensors)

Dataset used to train rizkyfoxcale/xlm-roberta-large-ner-ja: ner-wikipedia-dataset