training data

#4
by simeneide - opened

Hi, you write in the documentation:

Training data
NB-GPT-J-6B was finetuned on NCC, the Norwegian Colossal Corpus, plus other Internet sources like Wikipedia, mC4, and OSCAR.

Are there any news sources in this "other" category, and do you have an approximate amount?

Nasjonalbiblioteket AI Lab org

Yes. The news sources are documented as the newspapers_online_nb and newspapers_ocr subsets in NCC.
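If it helps, here is a minimal sketch of how those subsets could be separated out by filtering on a per-record type field. The records below are hypothetical placeholders, assuming each NCC record carries a "doc_type" label naming its subset:

```python
# Hypothetical sample records mimicking NCC entries (assumption: each
# record has a "doc_type" field naming its subset, plus the raw "text").
records = [
    {"doc_type": "newspapers_online_nb", "text": "..."},
    {"doc_type": "newspapers_ocr", "text": "..."},
    {"doc_type": "wikipedia", "text": "..."},
]

# The two subsets mentioned above that hold the news material.
NEWS_TYPES = {"newspapers_online_nb", "newspapers_ocr"}

# Keep only the news records; counting them gives the approximate amount.
news = [r for r in records if r["doc_type"] in NEWS_TYPES]
print(len(news))
```

With the real corpus you would stream the dataset and count (or sum token lengths over) the matching records instead of using an in-memory list.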

versae changed discussion status to closed
