Dataset

#8
by kesarito - opened

Hi!

Great job with your model! We are trying to build a very similar project.

What dataset did you use? We are considering using the Korean Wikipedia.

Hi,
I'm glad to see a project like this!

I used corpora from multiple sources, including KcBERT (https://github.com/Beomi/KcBERT/releases/tag/v2022.3Q), Korean Wikipedia, AIHub text data (https://aihub.or.kr/aihubdata/data/list.do?currMenu=115&topMenu=100&srchDataRealmCode=REALM002), and others.

I hope these links help too:

beomi changed discussion status to closed

By any chance, do you use the validation split or test split of the AIHub text data for training?
