DeepL Datasets

#1
by Villekom - opened

Hi!

Will the DeepL translated Ultrachat, boolq, Capybara and ai2_arc datasets be made publicly available somewhere?

Finnish-NLP org

Hi Ville,
I created short descriptions for those datasets, and they should now be public.
Please let us know if you find something odd in those datasets.

Also, if you at TurkuNLP have any good new datasets to share, we would appreciate it :)
Are there any datasets on Hugging Face that you would like us to translate?
Also, how is Avoin Avustaja going? Are you planning any bigger marketing push?

Thanks for making the datasets public! :)

All of the instruction datasets we currently use are on the TurkuNLP GitHub.
We would appreciate having more high-quality preference data translated, such as UltraFeedback or https://huggingface.co/datasets/Anthropic/hh-rlhf.

Avoin Avustaja is growing at a modest pace. Right now there are about 138 users, and 303 messages have been provided. Any help with marketing the project is greatly appreciated!

RASMUS changed discussion status to closed
