Transcribe text from video URLs
Generate a depth map from a single image
Transcribe and align audio to text
Transcribe audio files into text
Transcribe audio to text with speaker diarization