Data-Juicer is a one-stop data processing system that makes data higher-quality, juicier, and more digestible for LLMs!

News

  • [2024-02-20] We actively maintain an awesome list of LLM-Data; you are welcome to visit and contribute!
  • [2024-02-05] Our paper has been accepted by SIGMOD'24 industrial track!
  • [2024-01-10] Discover new horizons in "Data Mixture": our second data-centric LLM competition has kicked off! Please visit the competition's official website for more information.
  • [2024-01-05] We have released Data-Juicer v0.1.3! This version supports more Python versions (3.7-3.10) and adds multimodal dataset conversion and processing (text, images, and audio, with more modalities to be supported in the future). Our paper has also been updated to v3.