given text, return some image based on CLIP embedding
Image Classifier of various vehicles from fine-tuned model