Japanese Corpus Queries

#1
by ShortText - opened

Is the Japanese corpus that this T5 (or mT5) model was trained on made up of Kanji-only text, or of mixed text (Kanji, Katakana, and Hiragana)?

I used mixed data.
The data consisted of the Japanese Wikipedia dump, the Japanese portion of OSCAR, and the Japanese portion of CC-100.
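To make "mixed" concrete, here is a minimal sketch that counts the characters of each script in a piece of text. It assumes the common Unicode blocks (Hiragana U+3040–U+309F, Katakana U+30A0–U+30FF, CJK Unified Ideographs U+4E00–U+9FFF for Kanji) and ignores rarer extension blocks. The helper names `script_of` and `script_counts` are hypothetical and are not part of any library or of the training pipeline.

```python
from collections import Counter

# Simplified Unicode block ranges (assumption: extension blocks are ignored).
RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),
}


def script_of(ch: str) -> str:
    """Return the script name for a single character, or 'other'."""
    code = ord(ch)
    for name, (lo, hi) in RANGES.items():
        if lo <= code <= hi:
            return name
    return "other"


def script_counts(text: str) -> Counter:
    """Count non-whitespace characters per script."""
    return Counter(script_of(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    # Illustrative sentence, not taken from the corpora.
    sample = "東京はカタカナとひらがなを使う"
    print(script_counts(sample))
```

Running this over a sample of ordinary Japanese text, such as a Wikipedia article, will typically show all three scripts, which is what "mixed data" means here.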

sonoisa changed discussion status to closed
