Availability of training data

#2
by rpowalski - opened

Dear authors,

Thank you for your incredible work!
You mentioned in the abstract that you make datasets publicly available. Can you please post the details about how to obtain these?

AI4Chem org

Thanks for your comment! We will process the data and share it on the 🤗 Hugging Face Hub after this work is published.

AI4Chem org

Update: we have finished data cleaning for now and updated our model.
You can try the new model at https://chemllm.org/
We look forward to your feedback!

AI4Chem org

The ChemLLM datasets are all open source now!
https://huggingface.co/papers/2402.06852
ChemData700K, a 700K SFT dataset for chemistry LLMs!
https://huggingface.co/datasets/AI4Chem/ChemData700K
ChemPref-10K, a 10K DPO dataset, in both English and Chinese!
https://huggingface.co/datasets/AI4Chem/ChemPref-DPO-for-Chemistry-data-en
https://huggingface.co/datasets/AI4Chem/ChemPref-DPO-for-Chemistry-data-cn
ChemBench-4K, a benchmark of 4,100 high-quality single-choice questions covering nine core chemistry tasks!
https://huggingface.co/datasets/AI4Chem/ChemBench4K
C-MHChem, 600 real test questions, written and checked manually, from 25 years of Chinese national middle school chemistry tests!
https://huggingface.co/datasets/AI4Chem/C-MHChem-Benchmark-Chinese-Middle-high-school-Chemistry-Test
All hail the open-source community! 🤗
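
For anyone who wants to pull these from code, here is a minimal sketch, not an official recipe from the authors. It assumes the `datasets` library is installed (`pip install datasets`) and that each repository loads with its default configuration. The thread does not state split names or column layouts, so some datasets may need extra arguments. The helper names `repo_id_from_url` and `load_chem_dataset` are made up for illustration.

```python
from urllib.parse import urlparse

# Dataset URLs exactly as posted in this thread.
DATASET_URLS = [
    "https://huggingface.co/datasets/AI4Chem/ChemData700K",
    "https://huggingface.co/datasets/AI4Chem/ChemPref-DPO-for-Chemistry-data-en",
    "https://huggingface.co/datasets/AI4Chem/ChemPref-DPO-for-Chemistry-data-cn",
    "https://huggingface.co/datasets/AI4Chem/ChemBench4K",
    "https://huggingface.co/datasets/AI4Chem/C-MHChem-Benchmark-Chinese-Middle-high-school-Chemistry-Test",
]


def repo_id_from_url(url: str) -> str:
    """Turn a Hub dataset URL into the 'namespace/name' repo id (hypothetical helper)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hub dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_chem_dataset(url: str):
    """Download one dataset from the Hub; needs network access (hypothetical helper)."""
    # Imported lazily so the URL helper above works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the default configuration is enough for this repository.
    return load_dataset(repo_id_from_url(url))


if __name__ == "__main__":
    # Print the repo ids only; call load_chem_dataset(url) to actually download.
    for url in DATASET_URLS:
        print(repo_id_from_url(url))
```

Calling `load_chem_dataset(DATASET_URLS[0])` would then fetch ChemData700K. Printing the returned object is a quick way to see which splits and columns are actually available.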
