Annotate images with automatic detection and segmentation
Transcribe audio from microphone, files, or YouTube