Commit 5073cca
Parent(s): 1665f8c
Update README.md
README.md
CHANGED
@@ -9,7 +9,7 @@ datasets:
 
 This is an [ELECTRA](https://github.com/google-research/electra) model pretrained on approximately 200M Japanese sentences extracted from the [mC4](https://huggingface.co/datasets/mc4) and finetuned by [spaCy v3](https://spacy.io/usage/v3) on [UD\_Japanese\_BCCWJ r2.8](https://universaldependencies.org/treebanks/ja_bccwj/index.html).
 
-The base pretrain model is [megagonlabs/transformers-ud-japanese-electra-base-discrimininator](https://huggingface.co/megagonlabs/transformers-ud-japanese-electra-base-discriminator).
+The base pretrain model is [megagonlabs/transformers-ud-japanese-electra-base-discrimininator](https://huggingface.co/megagonlabs/transformers-ud-japanese-electra-base-discriminator), and requires [SudachiTra](https://github.com/WorksApplications/SudachiTra) for tokenization.
 
 The entire spaCy v3 model is distributed as a python package named [`ja_ginza_electra`](https://pypi.org/project/ja-ginza-electra/) from PyPI along with [`GiNZA v5`](https://github.com/megagonlabs/ginza) which provides some custom pipeline components to recognize the Japanese bunsetu-phrase structures.
 Try running it as below: