Process audio and generate text output based on instructions
Calculate memory usage from model configurations
Compare different tokenizers at the character level and the byte level
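
As a rough illustration of the last item, the difference between character-level and byte-level views of the same text can be sketched with the Python standard library alone. This is a minimal sketch, not how any particular tokenizer works: the function names `char_tokens`, `byte_tokens`, and `compare` are hypothetical, and UTF-8 is assumed as the byte encoding.

```python
def char_tokens(text: str) -> list[str]:
    """Character-level view: one token per Unicode code point."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """Byte-level view: one token per UTF-8 byte (values 0-255).

    Assumption: UTF-8 encoding; other encodings give different counts.
    """
    return list(text.encode("utf-8"))


def compare(text: str) -> dict[str, int]:
    """Return the sequence length under each view (hypothetical helper)."""
    return {"chars": len(char_tokens(text)), "bytes": len(byte_tokens(text))}


if __name__ == "__main__":
    # "\u00e9" is a single code point that UTF-8 encodes as two bytes,
    # so the byte-level sequence is longer than the character-level one.
    for sample in ["hello", "h\u00e9llo"]:
        print(repr(sample), compare(sample))
```

For pure ASCII input the two counts match; they diverge as soon as the text contains code points that need more than one UTF-8 byte.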