
transformers-ud-japanese-electra-ginza-510 (sudachitra-wordpiece, mC4 Japanese)

This is an ELECTRA model pretrained on approximately 200 million Japanese sentences extracted from mC4 and fine-tuned with spaCy v3 on UD_Japanese_BCCWJ r2.8.

The base pretrained model is megagonlabs/transformers-ud-japanese-electra-base-discriminator.
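If you want the base discriminator on its own rather than the full spaCy pipeline, it can be loaded from the Hugging Face Hub with the transformers library. A minimal sketch, assuming only the encoder weights are needed; the matching sudachitra WordPiece tokenizer is not shown here:

from transformers import AutoModel

# Load the ELECTRA discriminator weights from the Hub
model = AutoModel.from_pretrained(
    "megagonlabs/transformers-ud-japanese-electra-base-discriminator"
)
print(model.config.model_type)  # expected: "electra"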

The entire spaCy v3 model is distributed on PyPI as a Python package named ja_ginza_electra, along with GiNZA v5, which provides custom pipeline components for recognizing Japanese bunsetu-phrase structures. Try running it as below:

$ pip install ginza ja_ginza_electra
$ ginza
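The same pipeline can also be used from Python through spaCy. A minimal sketch; the sample sentence is arbitrary, and ginza.bunsetu_spans is the GiNZA helper for the bunsetu-phrase components mentioned above:

import spacy
import ginza

# Load the ja_ginza_electra pipeline installed above
nlp = spacy.load("ja_ginza_electra")

doc = nlp("銀座でランチをご一緒しましょう。")

# Token-level annotations from the UD-trained components
for token in doc:
    print(token.i, token.orth_, token.lemma_, token.pos_, token.dep_)

# Bunsetu-phrase spans recognized by GiNZA's custom components
for span in ginza.bunsetu_spans(doc):
    print(span.text)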

Licenses

The models are distributed under the terms of the MIT License.

Acknowledgments

Publication of this model under the MIT License is permitted by a joint research agreement between NINJAL (National Institute for Japanese Language and Linguistics) and Megagon Labs Tokyo.

Citations

Contains information from mC4, which is made available under the ODC Attribution License.

@article{2019t5,
    author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
    title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
    journal = {arXiv e-prints},
    year = {2019},
    archivePrefix = {arXiv},
    eprint = {1910.10683},
}
@inproceedings{asahara2018ud,
    author = {Asahara, M. and Kanayama, H. and Tanaka, T. and Miyao, Y. and Uematsu, S. and Mori, S. and Matsumoto, Y. and Omura, M. and Murawaki, Y.},
    title = {Universal Dependencies Version 2 for Japanese},
    booktitle = {Proceedings of LREC 2018},
    year = {2018},
}