sonoisa/byt5-small-japanese

sonoisa committed on Aug 26, 2021

Commit b07675b (1 parent: 2edc403)

Add corpus size

Files changed (1)

README.md +1 -1

@@ -16,7 +16,7 @@ datasets:
 This is a [ByT5 (a tokenizer-free extension of the Text-to-Text Transfer Transformer)](https://github.com/google-research/byt5/) model pretrained on Japanese corpus.
-This is a ByT5 (a tokenizer-free extension of the Text-to-Text Transfer Transformer) model pretrained on the following Japanese corpora.
+This is a ByT5 (a tokenizer-free extension of the Text-to-Text Transfer Transformer) model pretrained on the following Japanese corpora (approximately 100 GB).
 * Japanese dump data from [Wikipedia](https://ja.wikipedia.org) (as of July 6, 2020)
 * Japanese corpus from [OSCAR](https://oscar-corpus.com)
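
The README describes ByT5 as "tokenizer-free", meaning the model consumes raw UTF-8 bytes rather than subword tokens. The sketch below illustrates that idea in plain Python without loading the model. It assumes the id layout commonly described for ByT5 (ids 0, 1, 2 reserved for pad, EOS, and unknown, with each byte value shifted by 3); the function names `encode_bytes` and `decode_bytes` are hypothetical helpers written for illustration and are not part of this repository.

```python
from typing import List

# Assumed layout: ids 0, 1, 2 are reserved (pad, EOS, unknown),
# so each UTF-8 byte value b is mapped to id b + 3.
SPECIAL_TOKEN_OFFSET = 3
EOS_ID = 1


def encode_bytes(text: str) -> List[int]:
    """Map text to shifted UTF-8 byte ids and append the EOS id."""
    return [b + SPECIAL_TOKEN_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode_bytes(ids: List[int]) -> str:
    """Drop ids outside the byte range, undo the shift, and decode as UTF-8."""
    raw = bytes(
        i - SPECIAL_TOKEN_OFFSET
        for i in ids
        if SPECIAL_TOKEN_OFFSET <= i < SPECIAL_TOKEN_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = encode_bytes("日本語")
    # Each of these three characters occupies 3 bytes in UTF-8, plus one EOS id.
    print(len(ids))
    print(decode_bytes(ids))
```

Because most Japanese characters take three bytes in UTF-8, byte-level sequences are roughly three times longer than the character count, which is worth keeping in mind when choosing maximum input lengths.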