Create images from various types of annotations
Transform video frames using text instructions
Generate audio from text