How to generate video from text
by Peterkkk
I see that the demo and the example code show how to generate an image from text.
Does the current model and library interface support generating video from text? If so, how should I call the interface? Thanks!
Also, could you give a code example for calling Emu2-Gen with multiple GPUs?
This is my demo running Emu2-Gen on 2 GPUs:
import torch
from accelerate import infer_auto_device_map, init_empty_weights, load_checkpoint_and_dispatch
from diffusers import DiffusionPipeline

# Build the multimodal encoder on the meta device first, exactly as in the
# author's code, so that load_checkpoint_and_dispatch() below can stream
# the weights directly onto the two GPUs.
with init_empty_weights():
    model = ...  # constructed the same way as in the author's example

# Split the encoder across both GPUs; 'Block' and 'LlamaDecoderLayer'
# modules are kept whole on a single device.
device_map = infer_auto_device_map(
    model,
    max_memory={0: '30GiB', 1: '80GiB'},
    no_split_module_classes=['Block', 'LlamaDecoderLayer'],
)
# Pin the LM head to GPU 0 so its output lands on the same device as the
# diffusion components below.
device_map["model.decoder.lm.lm_head"] = 0

model = load_checkpoint_and_dispatch(
    model,
    f'{path}/multimodal_encoder',
    device_map=device_map,
).eval()

pipe = DiffusionPipeline.from_pretrained(
    path,
    custom_pipeline="pipeline_emu2_gen",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="bf16",
    multimodal_encoder=model,
    tokenizer=tokenizer,
)

# The diffusion components fit comfortably on one GPU.
pipe.safety_checker.to("cuda:0")
pipe.unet.to("cuda:0")
pipe.vae.to("cuda:0")
The rest is the same as the code provided by the author.
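For completeness, here is a minimal sketch of a generation call on the assembled pipeline. It assumes the custom pipeline behaves as in the author's README example, i.e. that __call__ accepts a text prompt and returns an output whose image attribute is a PIL image:

# Sketch of a text-to-image call; the prompt and output handling follow
# the author's README example, and .image is assumed to be a PIL.Image.
prompt = "impressionist painting of an astronaut in a jungle"
ret = pipe(prompt)
ret.image.save("astronaut.png")

Since the LM head and the diffusion components are all pinned to cuda:0, the intermediate tensors should already be on the right device for this call.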