
Model card for model ID

This is a T5 v1.1 model, pre-trained on a Japanese corpus.

Model details

T5 is a Transformer-based encoder-decoder model. Version 1.1 includes the following improvements over the original T5:

  • GEGLU activation in the feed-forward hidden layer, rather than ReLU (see https://arxiv.org/abs/2002.05202).
  • Dropout was turned off during pre-training (a quality win). Dropout should be re-enabled during fine-tuning.
  • No parameter sharing between the embedding and classifier layers.
  • "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit different: larger d_model and smaller num_heads and d_ff.
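
To make the first point concrete, here is a minimal pure-Python sketch of a GEGLU feed-forward block next to a ReLU one, following the formula in the linked paper: FFN_GEGLU(x) = (GELU(xW) ⊗ xV) W2. The function names, the list-based matrix layout, and the use of the exact (erf-based) GELU are assumptions made for illustration. Actual implementations may use a tanh approximation of GELU, and this is not the code used to train this model.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(weights, x):
    """Multiply a (rows x cols) matrix, given as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu_ffn(x, w_in, w_out):
    """Original T5 style feed-forward block: ReLU(W_in x), then W_out (no biases)."""
    hidden = [max(0.0, h) for h in matvec(w_in, x)]
    return matvec(w_out, hidden)


def geglu_ffn(x, w_gate, w_up, w_out):
    """GEGLU feed-forward block: (GELU(W_gate x) * (W_up x)), then W_out (no biases).

    Compared with relu_ffn there are two input projections instead of one;
    the GELU-activated projection gates the linear one element-wise.
    """
    gate = [gelu(h) for h in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_out, hidden)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    sum_rows = [[1.0, 1.0]]  # projects the 2-d hidden vector to a single output
    x = [1.0, -2.0]
    print("ReLU FFN :", relu_ffn(x, identity, sum_rows))
    print("GEGLU FFN:", geglu_ffn(x, identity, identity, sum_rows))
```

Because the gated variant has two input projections, it holds more parameters than a ReLU block with the same d_ff.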

This model is based on T5 v1.1 and was pre-trained on a Japanese corpus consisting of Japanese Wikipedia and mC4/ja.

Model Description

  • Developed by: Retrieva, Inc.
  • Model type: T5 v1.1
  • Language(s) (NLP): Japanese
  • License: CC-BY-SA 4.0. Although commercial use is permitted, we kindly request that you contact us beforehand.

Training Details

We used T5X (https://github.com/google-research/t5x) to train this model, and the resulting checkpoint was converted to the Hugging Face Transformers format.

Training Data

The training data consists of:

  • The Japanese part of the multilingual C4 (mC4/ja).
  • Japanese Wikipedia (20220920).

Preprocessing

The following filters were applied:

  • Remove documents that do not contain even a single hiragana character. This removes English-only documents and documents in Chinese.
  • Whitelist-style filtering on the top-level domain of the URL, to remove affiliate sites.
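
The two filters above could be sketched roughly as follows. This is an illustration, not the preprocessing code actually used: the function names are hypothetical, the whitelist contents (`ALLOWED_TLDS`) are a placeholder because the real list is not given here, and applying both checks together assumes every document carries a source URL, as mC4 records do.

```python
import re
from urllib.parse import urlparse

# Hiragana letters (U+3041 to U+3096). Katakana and kanji are deliberately
# excluded, since kanji-only text could just as well be Chinese.
HIRAGANA = re.compile(r"[\u3041-\u3096]")

# Hypothetical whitelist; the actual set of allowed top-level domains is not
# stated in this model card.
ALLOWED_TLDS = {"jp"}


def contains_hiragana(text: str) -> bool:
    """Return True if the text contains at least one hiragana character."""
    return HIRAGANA.search(text) is not None


def has_allowed_tld(url: str, allowed=ALLOWED_TLDS) -> bool:
    """Return True if the URL's top-level domain is in the whitelist."""
    host = urlparse(url).hostname
    if not host:
        return False
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in allowed


def keep_document(text: str, url: str) -> bool:
    """Keep a document only if it passes both filters."""
    return contains_hiragana(text) and has_allowed_tld(url)


if __name__ == "__main__":
    samples = [
        ("これは日本語の文書です。", "https://www.example.co.jp/page"),
        ("English only document.", "https://example.jp/en"),
        ("这是中文文档", "https://example.jp/zh"),
        ("日本語ですが許可されていないドメインです。", "https://example.com/"),
    ]
    for text, url in samples:
        print(keep_document(text, url), url)
```

A whitelist keeps the rule simple: any domain not explicitly allowed is dropped, at the cost of discarding some legitimate pages hosted elsewhere.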

Training Hyperparameters

Speeds, Sizes, Times

The model was trained for 524,288 steps.

Technical Specifications

Model Architecture and Objective

Model architecture.

Compute Infrastructure

Google Cloud TPU v3-32.

Software

More Information

https://note.com/retrieva/n/n7b4186dc5ada (in Japanese)

Model Card Authors

Jiro Nishitoba

Model Card Contact

pr@retrieva.jp
