Japanese Corpus Queries

by ShortText - opened Jun 24, 2022

Jun 24, 2022

Is the Japanese corpus used for T5 or mT5 based on kanji-only text, or is it a mix of scripts (kanji, katakana, hiragana)?

Owner Jun 28, 2022

I used mixed data.
The data used were the Japanese dump data from Wikipedia, the Japanese corpus from OSCAR, and the Japanese corpus from CC-100.
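
For anyone who wants to check the script mix in a text sample themselves, here is a rough sketch. It is not part of the model's preprocessing; `script_of` and `script_counts` are hypothetical helper names, and the classification assumes the basic Unicode blocks only (Hiragana U+3041 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF), so rarer kanji in the extension blocks and half-width katakana would be counted as "other".

```python
from collections import Counter


def script_of(ch: str) -> str:
    """Classify one character by Unicode block (basic blocks only, an assumption)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:  # includes the long-vowel mark "ー" (U+30FC)
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs, i.e. common kanji
        return "kanji"
    return "other"


def script_counts(text: str) -> Counter:
    """Count characters per script, skipping whitespace."""
    return Counter(script_of(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    sample = "東京でラーメンを食べた"
    print(script_counts(sample))
```

Running this over a few lines from Wikipedia, OSCAR, or CC-100 should show all three scripts appearing together, which is what "mixed data" means here.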

sonoisa changed discussion status to closed Jul 3, 2022
