---
license: openrail
---

# This repo contains the VPLM Dataset and pretrained checkpoints for RACCooN

See also: https://github.com/jaehong31/RACCooN


RACCooN is a versatile, user-friendly video-to-paragraph-to-video generative framework that supports multiple video editing capabilities, such as removal, addition, and modification, through a unified pipeline. It consists of two principal stages: Video-to-Paragraph (V2P) and Paragraph-to-Video (P2V). In the V2P stage, RACCooN proposes a multi-granular spatiotemporal pooling strategy to generate well-structured video descriptions that capture both the broad context and object details without requiring complex human annotations, which makes precise text-based video editing simpler for users. In the P2V stage, the video generative model incorporates auto-generated narratives or instructions to enhance the quality and accuracy of the generated content. It supports video object addition, inpainting, and attribute modification within a unified framework, and outperforms existing methods on video editing and inpainting benchmarks.



# Description of VPLM Dataset

Multi-Objects Description
- Train: RACCooN/VPLM/gt_train.json
- Test: RACCooN/VPLM/gt_test.json

Single-Object Layout Prediction
- Train: RACCooN/VPLM/gt_train_layouts.json
- Test: RACCooN/VPLM/gt_test_layouts.json

# Description of Model Checkpoints

## V2P

Multi-Objects Description
- RACCooN/mllm_finetuned/multi_obj_projector.bin

Single-Object Description
- RACCooN/mllm_finetuned/single_obj_projector.bin

Single-Object Layout Prediction
- RACCooN/mllm_finetuned/layout_pred_projector.bin

## P2V
- RACCooN/unet_finetuned/diffusion_pytorch_model.safetensors
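
After downloading, it can help to confirm that every file listed above is present before running anything. The following is a minimal sketch, assuming the files were saved under a local directory with the same relative paths as listed here. The names `EXPECTED_FILES` and `find_missing`, and the dictionary keys, are hypothetical labels chosen for illustration and are not part of RACCooN.

```python
from pathlib import Path

# Relative paths copied from the dataset and checkpoint lists in this README.
# The dictionary keys are illustrative labels only (an assumption).
EXPECTED_FILES = {
    "vplm_multi_obj_train": "RACCooN/VPLM/gt_train.json",
    "vplm_multi_obj_test": "RACCooN/VPLM/gt_test.json",
    "vplm_layout_train": "RACCooN/VPLM/gt_train_layouts.json",
    "vplm_layout_test": "RACCooN/VPLM/gt_test_layouts.json",
    "v2p_multi_obj": "RACCooN/mllm_finetuned/multi_obj_projector.bin",
    "v2p_single_obj": "RACCooN/mllm_finetuned/single_obj_projector.bin",
    "v2p_layout_pred": "RACCooN/mllm_finetuned/layout_pred_projector.bin",
    "p2v_unet": "RACCooN/unet_finetuned/diffusion_pytorch_model.safetensors",
}


def find_missing(root):
    """Return the sorted labels of expected files that are absent under `root`."""
    root = Path(root)
    return sorted(
        label
        for label, rel_path in EXPECTED_FILES.items()
        if not (root / rel_path).is_file()
    )


if __name__ == "__main__":
    # Assumes the current directory is the download root.
    missing = find_missing(".")
    if missing:
        print("Missing files:")
        for label in missing:
            print(f"  {label}: {EXPECTED_FILES[label]}")
    else:
        print("All expected files are present.")
```

The check only looks at file presence; it does not validate the JSON contents or the checkpoint weights.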