---
language: ja
tags:
- t5
- text2text-generation
- seq2seq
license: apache-2.0
datasets:
- mc4
- wiki40b
---
# t5-base-japanese-web (with Byte-fallback)

## Description
megagonlabs/t5-base-japanese-web is a T5 (Text-to-Text Transfer Transformer) model pre-trained on Japanese web texts.
Training code is available on GitHub.
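The model can be loaded with the Hugging Face `transformers` library. The snippet below is a minimal sketch of the standard T5 loading pattern; the input sentence is only an illustrative example, and generation settings are left at their defaults.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = T5Tokenizer.from_pretrained("megagonlabs/t5-base-japanese-web")
model = T5ForConditionalGeneration.from_pretrained("megagonlabs/t5-base-japanese-web")

# Encode an example Japanese input and generate output text
inputs = tokenizer("日本語のテキスト", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this is a pre-trained checkpoint, not a fine-tuned one, so it should be fine-tuned on a downstream text-to-text task before the generated output is meaningful.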
## Corpus
- Japanese in mC4/3.0.1
- Japanese in wiki40b/1.3.0
## Tokenizer
SentencePiece trained on Japanese Wikipedia
- Vocabulary size: 32,000
- Byte-fallback: Enabled
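With byte-fallback enabled, characters missing from the 32,000-entry vocabulary are decomposed into UTF-8 byte tokens (`<0xXX>`) instead of collapsing to a single unknown token. The sketch below illustrates the idea with a toy character-level vocabulary; the function and vocabulary are hypothetical, not part of the model's API.

```python
def byte_fallback_tokenize(text, vocab):
    """Toy tokenizer: characters in `vocab` become tokens; any
    out-of-vocabulary character falls back to its UTF-8 byte tokens."""
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            # Decompose the character into <0xXX> tokens, one per UTF-8 byte
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens

# Example: the emoji is not in the toy vocabulary, so it is
# represented by its four UTF-8 bytes rather than an <unk> token.
vocab = {"日", "本", "語"}
print(byte_fallback_tokenize("日本語😀", vocab))
# → ['日', '本', '語', '<0xF0>', '<0x9F>', '<0x98>', '<0x80>']
```

Because any Unicode string can be spelled out in bytes this way, the tokenizer never loses information on rare characters, which matters for noisy web text.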
## Parameters
- T5 model: models/t5.1.1.base.gin
- Training steps: 1,000,000
## Related models
## License
Apache License 2.0