Experiment with and compare different tokenizers
Generate and convert speech using text and audio inputs
Tag images with labels