Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development

Project description

The emergence of large-scale multimodal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging because model-centric and data-centric development have historically followed isolated paths, leading to suboptimal outcomes and inefficient resource utilization. In response, we present a novel sandbox suite tailored for integrated data-model co-development. This sandbox provides a comprehensive experimental platform, enabling rapid iteration and insight-driven refinement of both data and models. Our proposed "Probe-Analyze-Refine" workflow, validated through applications on T2V-Turbo, achieves a new state of the art on the VBench leaderboard, with a 1.52% improvement over T2V-Turbo, based on our previous state-of-the-art model Data-Juicer (T2V, 147k). Our experiment code and dataset are released at Data-Juicer Sandbox.

Model description 🚀

This repository includes the unet_lora.pt file, which can transform VideoCrafter2 into our Data-Juicer-T2V. Please refer to the code in T2V-Turbo to utilize our model effectively.
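
The file name suggests that unet_lora.pt holds LoRA (low-rank adaptation) weights for the UNet. As a rough illustration of what applying such weights means, the sketch below merges a low-rank update into a tiny base weight matrix using plain Python lists. It is not the loading code from T2V-Turbo: the function names, the `alpha` and `rank` parameters, and the matrix shapes are assumptions chosen for illustration, and the real key layout and scaling are defined by the T2V-Turbo code.

```python
# Illustrative only: a generic LoRA merge on a toy weight matrix.
# All names here are hypothetical and do not come from T2V-Turbo.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base_weight, lora_down, lora_up, alpha, rank):
    """Return base_weight + (alpha / rank) * (lora_up @ lora_down)."""
    scale = alpha / rank
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]


# A 2x2 base weight with a rank-1 update.
base = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0, 2.0]]    # shape (rank, in_features) = (1, 2)
up = [[0.5], [0.25]]   # shape (out_features, rank) = (2, 1)

merged = merge_lora(base, down, up, alpha=1.0, rank=1)
print(merged)
```

In a real model the same update would be applied per adapted layer to tensors rather than nested lists, so the loading code in T2V-Turbo remains the reference for actual use.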

Guidelines on Inappropriate and Prohibited Use 🚫

This model is intended solely for research and educational purposes.

  • Generating content that could be considered insulting or detrimental to individuals or their surroundings, including their culture, religion, etc., is strictly forbidden.
  • The creation of content that is pornographic, violent, or graphically disturbing is not allowed.
  • Users must not use the model to produce incorrect or misleading information.