This repo contains the VPLM dataset and pretrained checkpoints for RACCooN.

See also: https://github.com/jaehong31/RACCooN

RACCooN is a versatile and user-friendly video-to-paragraph-to-video generative framework that supports multiple video editing capabilities (removal, addition, and attribute modification) through a unified pipeline. RACCooN consists of two principal stages: Video-to-Paragraph (V2P) and Paragraph-to-Video (P2V).

RACCooN introduces a multi-granular spatiotemporal pooling strategy that generates well-structured video descriptions, capturing both the broad context and object details without requiring complex human annotations, which makes precise text-based video editing simpler for users. The video generative model incorporates these auto-generated narratives or instructions to improve the quality and accuracy of the generated content. It supports video object addition, inpainting, and attribute modification within a unified framework, outperforming existing methods on video editing and inpainting benchmarks.

Description of VPLM Dataset

Multi-Object Description

  • Train: RACCooN/VPLM/gt_train.json
  • Test: RACCooN/VPLM/gt_test.json

Single-Object Layout Prediction

  • Train: RACCooN/VPLM/gt_train_layouts.json
  • Test: RACCooN/VPLM/gt_test_layouts.json
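Assuming the annotation files above are plain JSON (this card does not document their schema, so field names are not shown), they can be loaded with the Python standard library after downloading the repo:

```python
import json
from pathlib import Path

def load_annotations(path):
    """Load one VPLM annotation JSON file into a Python object."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Assumed local checkout location; adjust to wherever you downloaded the repo.
vplm_root = Path("RACCooN/VPLM")

# Uncomment once the files are downloaded:
# train = load_annotations(vplm_root / "gt_train.json")
# test = load_annotations(vplm_root / "gt_test.json")
```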

Description of Model Checkpoints

V2P

Multi-Object Description

  • RACCooN/mllm_finetuned/multi_obj_projector.bin

Single-Object Description

  • RACCooN/mllm_finetuned/single_obj_projector.bin

Single-Object Layout Prediction

  • RACCooN/mllm_finetuned/layout_pred_projector.bin

P2V

  • RACCooN/unet_finetuned/diffusion_pytorch_model.safetensors
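For convenience, the checkpoint paths above can be collected into a small lookup table (a sketch; the local root directory and the stage names are assumptions, not part of this card):

```python
from pathlib import Path

# Assumed local checkout location; adjust to wherever you downloaded the repo.
ROOT = Path("RACCooN")

# Checkpoint paths as listed on this card, keyed by stage.
CHECKPOINTS = {
    "v2p_multi_obj_description": ROOT / "mllm_finetuned/multi_obj_projector.bin",
    "v2p_single_obj_description": ROOT / "mllm_finetuned/single_obj_projector.bin",
    "v2p_layout_prediction": ROOT / "mllm_finetuned/layout_pred_projector.bin",
    "p2v_unet": ROOT / "unet_finetuned/diffusion_pytorch_model.safetensors",
}

def checkpoint_path(stage: str) -> Path:
    """Return the checkpoint path for a stage, verifying it was downloaded."""
    path = CHECKPOINTS[stage]
    if not path.exists():
        raise FileNotFoundError(f"Checkpoint missing: {path}; download the repo first.")
    return path
```

The projector weights are PyTorch `.bin` files and the UNet is stored in safetensors format, so the appropriate loader (`torch.load` vs. `safetensors.torch.load_file`) depends on which checkpoint you fetch.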