Data-Juicer is a one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs!
News
- [2024-02-20] We actively maintain an awesome list of LLM-Data. Visits and contributions are welcome!
- [2024-02-05] Our paper has been accepted by SIGMOD'24 industrial track!
- [2024-01-10] Discover new horizons in "Data Mixture": our second data-centric LLM competition has kicked off! Please visit the competition's official website for more information.
- [2024-01-05] We have released Data-Juicer v0.1.3!
This version supports more Python versions (3.7-3.10) and multimodal dataset conversion and processing (including text, images, and audio; more modalities will be supported in the future).
In addition, our paper has been updated to v3.